python - Calculating active dates based on gap length using pandas DataFrames
I'm relatively new to pandas and am trying to figure out the best way of calculating this information; any help is appreciated. I have a dataframe that looks like so:

id  activity_date
1   2015-01-01
1   2015-01-02
1   2015-01-03
2   2015-01-02
2   2015-01-05
3   2015-01-10

I want to calculate the following information: "How many days has each account been active?" I understand how to get a plain count, but I want to apply the following restriction: "if there are n days between activity dates, only count the days before the gap".

For example, with n = 5 the following should return a count of active days of 4, not 6:

id  activity_date
1   2015-01-01
1   2015-01-02
1   2015-01-04
1   2015-01-06
1   2015-01-14
1   2015-01-15
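
To make the example reproducible, here is a minimal sketch that builds this frame and prints the gap, in days, between consecutive activity dates. The variable name `df` is an assumption chosen to match the code further down, and `activity_date` is converted with `pd.to_datetime` so that date arithmetic works.

```python
import pandas as pd

# Example data from the question (a single account, id 1).
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1],
    "activity_date": pd.to_datetime([
        "2015-01-01", "2015-01-02", "2015-01-04",
        "2015-01-06", "2015-01-14", "2015-01-15",
    ]),
})

# Gap in whole days between each row and the previous one.
# The first row has no predecessor, so its NaT/NaN entry is dropped.
gaps = df["activity_date"].diff().dt.days.dropna().astype(int).tolist()
print(gaps)
```

The only gap longer than 5 days is the one between 2015-01-06 and 2015-01-14, which is why only the first four rows should be counted.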
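
After understanding what you want, this is simpler: calculate whether the difference between the current and previous rows is larger than 5 days, which gives a boolean Series. Use that to filter the df, then use the index value of the first matching row to perform the slicing. Note that slicing with iloc by an index value only works here because the df has the default integer index, so labels and positions coincide: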

In [57]:
inactive = df[df['activity_date'].diff() > pd.Timedelta(5, 'D')]
inactive

Out[57]:
   id activity_date
4   1    2015-01-14

In [18]:
inactive.index

Out[18]:
Int64Index([4], dtype='int64')

In [58]:
df.iloc[:inactive.index[0]]

Out[58]:
   id activity_date
0   1    2015-01-01
1   1    2015-01-02
2   1    2015-01-04
3   1    2015-01-06
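
The session above handles a single id. One possible way to extend the same idea to every account at once is sketched below. The helper name `active_days` and its parameter `n` are hypothetical, not part of the original answer, and the sketch assumes each id has at most one row per date, so that counting rows is the same as counting active days.

```python
import pandas as pd

def active_days(df, n=5):
    """Per id, count the rows that come before the first gap longer than n days."""
    df = df.sort_values(["id", "activity_date"])
    # Difference to the previous activity date within each id
    # (NaT on the first row of every id, which compares as False below).
    gap = df.groupby("id")["activity_date"].diff()
    too_long = (gap > pd.Timedelta(days=n)).astype(int)
    # Running total per id: stays 0 until the first too-long gap is reached.
    before_gap = (too_long.groupby(df["id"]).cumsum() == 0).astype(int)
    return before_gap.groupby(df["id"]).sum()

# Id 1 uses the gap example from the question; ids 2 and 3 come from the first table.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1, 2, 2, 3],
    "activity_date": pd.to_datetime([
        "2015-01-01", "2015-01-02", "2015-01-04", "2015-01-06",
        "2015-01-14", "2015-01-15",
        "2015-01-02", "2015-01-05",
        "2015-01-10",
    ]),
})

result = active_days(df, n=5)
print(result)
```

With this data the counts should come out as 4 for id 1, 2 for id 2 and 1 for id 3. Using a cumulative sum avoids positional slicing, so the sketch does not depend on the frame having a default integer index.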