Working with datetime in Python
I have a file that has the following format:
20150426010203 name1
20150426010303 name2
20150426010307 name3
20150426010409 name1
20150426010503 name4
20150426010510 name1
I am interested in finding the time differences between appearances of name1 in the list, and then calculating the frequency of such differences (for example, delta time = 1s appeared 20 times, delta time = 30s appeared 1 time, etc.). The second problem is how to find the number of events per minute/hour/day.
I found the time differences by using
pd.to_datetime(pd.Series([time]))
to convert each string to datetime format, and placed the values in a list named 'times'. Then I iterated through the list:
new = [x - times[i - 1] for i, x in enumerate(times)][1:]
and the resulting list is this:
dtype: timedelta64[ns], 0   00:00:50
dtype: timedelta64[ns], 0   00:00:10
dtype: timedelta64[ns], 0   00:00:51
dtype: timedelta64[ns], 0   00:00:09
dtype: timedelta64[ns], 0   00:00:50
dtype: timedelta64[ns], 0   00:00:11
Any further attempt to calculate the frequency results in a "TypeError: 'Series' objects are mutable, thus they cannot be hashed" error. Also, I am not sure how to calculate the number of events per minute or any other time unit.
Obviously, I don't have a lot of experience with datetime in Python, so any pointers would be appreciated.
Use resample and sum the number of events per time period - examples below.
I gather you want the intervals for individuals (name1: the interval from the 1st to the 2nd event, and from his/her 2nd to 3rd event). You would need to group by name and then difference the times within each group. In your dataset, only name1 has more than one event, and two events are necessary for a person-centric interval.
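On the TypeError in the question: pd.to_datetime(pd.Series([time])) returns a one-element Series for each string, so every difference in the list is itself a Series, and a Series cannot be hashed. Here is a minimal sketch of one way around that, assuming the name1 timestamps have already been collected into a list of strings (the names raw, deltas and delta_counts are only for illustration):

```python
from collections import Counter

import pandas as pd

# Timestamps for name1 only, copied from the sample data in the question.
raw = ["20150426010203", "20150426010409", "20150426010510"]

# Parse the whole list in one call. The result is a DatetimeIndex whose
# elements are scalar Timestamps (hashable), not one-element Series.
times = pd.to_datetime(raw, format="%Y%m%d%H%M%S")

# Differences between consecutive appearances, as scalar Timedeltas.
deltas = [later - earlier for earlier, later in zip(times[:-1], times[1:])]

# Count how often each interval occurs.
delta_counts = Counter(deltas)
```

Because the differences are now plain Timedelta values, Counter (or a set, or dict keys) should work without the hashing error.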
Quick and dirty ...
import pandas as pd
from io import StringIO  # Python 3 (on Python 2: from StringIO import StringIO)

# --- data for a DataFrame we can play with ...
# First, put the data in a multi-line string (I would read from a file
# if I had it in a file - for our purposes a string will do).
data = """
time name
20150426010203 name1
20150426010303 name2
20150426010307 name3
20150426010409 name1
20150426010503 name4
20150426010510 name1"""

# Second, use StringIO and pandas.read_csv to pretend we are
# reading from a file.
df = pd.read_csv(StringIO(data), header=0, index_col=0, sep=r'\s+')

# Third, because pandas did not recognise the date-time format
# of the column I made the index, I force the string to be
# converted to pandas Timestamps, to come up with a DatetimeIndex.
df.index = pd.to_datetime(df.index.astype(str), format='%Y%m%d%H%M%S')

# number of events per minute
df['event'] = 1  # we will sum the events per time period
dfepm = df['event'].resample('1min').sum()

# number of events per hour
dfeph = df['event'].resample('1h').sum()

# time differences by name
del df['event']  # we don't need this anymore
df['time'] = df.index
df['time_diff_by_name'] = df.groupby('name')['time'].diff()
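To get the frequency of each interval (the "delta time = 1s appeared 20 times" part of the question), value_counts on the per-name differences should do it. A rough, self-contained sketch on the same sample data follows. The names name1_diffs, interval_counts and events_per_day are mine, and reading the time column as a string before parsing is just one way to do it:

```python
import io

import pandas as pd

data = """time name
20150426010203 name1
20150426010303 name2
20150426010307 name3
20150426010409 name1
20150426010503 name4
20150426010510 name1
"""

# Read the time column as text so the format string applies directly.
df = pd.read_csv(io.StringIO(data), sep=r"\s+", dtype={"time": str})
df["time"] = pd.to_datetime(df["time"], format="%Y%m%d%H%M%S")

# Interval since the same name's previous event (NaT for a first appearance).
df["time_diff_by_name"] = df.groupby("name")["time"].diff()

# Frequency of each interval for one name, ignoring the leading NaT.
name1_diffs = df.loc[df["name"] == "name1", "time_diff_by_name"].dropna()
interval_counts = name1_diffs.value_counts()

# Events per day: count the rows that fall in each daily bin.
events_per_day = df.set_index("time").resample("1D").size()
```

With only three name1 events every interval shows up once, but on a longer file interval_counts gives the histogram of delta times directly. The same resample(...).size() call with '1min' or '1h' gives per-minute or per-hour counts.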