python - A better way to do it in Pandas, I have 2 loops
I am new to pandas; is there a better way?
    import pandas as pd
    import numpy as np
    from io import StringIO

    devices = StringIO("""name;date;cpu;freq;voltage
    rpi;201501;arm;700mhz;5v
    galileo;201501;intel;400mhz;3.3v
    uno;201502;atmel;16mhz;5v
    """)
    d = pd.read_csv(devices, sep=';', index_col=None)

    comments = StringIO("""comment;t1;t2;t3
    cool;arm;;
    great!;atmel;;
    great!;intel;5v;
    fun;atmel;16mhz;
    fun;700mhz;atmel;
    """)
    c = pd.read_csv(comments, sep=';', index_col=None)

    n = d.copy()
    n['cool'], n['great!'], n['fun'] = 0, 0, 0

    for i, row in n.iterrows():
        for j, com in c.iterrows():
            # set the flag when every non-null tag of the comment appears in the device row
            if np.all(np.isin(np.array(com[['t1', 't2', 't3']].dropna()), np.array(row))):
                n.loc[i, c.loc[j, 'comment']] = 1
At the end, I build a new DataFrame n, which looks like this:
          name    date    cpu    freq voltage  cool  great!  fun
    0      rpi  201501    arm  700mhz      5v     1       0    0
    1  galileo  201501  intel  400mhz    3.3v     0       0    0
    2      uno  201502  atmel   16mhz      5v     0       1    1
The other DataFrames, d and c, look like this:
          name    date    cpu    freq voltage
    0      rpi  201501    arm  700mhz      5v
    1  galileo  201501  intel  400mhz    3.3v
    2      uno  201502  atmel   16mhz      5v

      comment      t1     t2   t3
    0    cool     arm    NaN  NaN
    1  great!   atmel    NaN  NaN
    2  great!   intel     5v  NaN
    3     fun   atmel  16mhz  NaN
    4     fun  700mhz  atmel  NaN
I have to use 2 loops to do it. It breaks the dream of pandas! Is there a better way? I must be missing something.
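    # helper column so each (t1, comment) pair can be counted
    c['val'] = 1
    # one row per t1 value, one 0/1 column per comment
    comments = pd.pivot_table(c, index='t1', columns='comment',
                              values='val', aggfunc='sum').fillna(0).astype(int)
    # attach the comment flags to each device by matching cpu against t1
    df = pd.merge(d, comments, left_on='cpu', right_index=True, how='left')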
comments:
    comment  cool  fun  great!
    t1
    700mhz      0    1       0
    arm         1    0       0
    atmel       0    1       1
    intel       0    0       1
df:
          name    date    cpu    freq voltage  cool  fun  great!
    0      rpi  201501    arm  700mhz      5v     1    0       0
    1  galileo  201501  intel  400mhz    3.3v     0    0       1
    2      uno  201502  atmel   16mhz      5v     0    1       1
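Note that the merge above only compares t1 with cpu, so galileo ends up with great! = 1, whereas the two-loop version in the question gives 0 (the rule "intel + 5v" needs both tags, and galileo runs at 3.3v). If the "all tags must be present" rule matters, one possible loop-free sketch is to reshape both frames to long form with melt, merge on the tag value, and keep only the (device, rule) pairs where the number of hits equals the number of tags the rule requires. The names rules, cells, needed, hits, matched and flags are made up for this illustration, and casting every cell to a string is an assumption that is fine here because all tags are strings.

```python
import pandas as pd
from io import StringIO

d = pd.read_csv(StringIO(
    "name;date;cpu;freq;voltage\n"
    "rpi;201501;arm;700mhz;5v\n"
    "galileo;201501;intel;400mhz;3.3v\n"
    "uno;201502;atmel;16mhz;5v\n"), sep=";")
c = pd.read_csv(StringIO(
    "comment;t1;t2;t3\n"
    "cool;arm;;\n"
    "great!;atmel;;\n"
    "great!;intel;5v;\n"
    "fun;atmel;16mhz;\n"
    "fun;700mhz;atmel;\n"), sep=";")

# Long form of the rules: one row per (rule, required tag), empty tags dropped.
rules = (c.rename_axis("rule").reset_index()
           .melt(id_vars=["rule", "comment"], value_name="tag")
           .dropna(subset=["tag"])
           .astype({"tag": str})
           .drop_duplicates(["rule", "tag"])[["rule", "comment", "tag"]])

# Long form of the devices: one row per (device, distinct cell value).
cells = (d.astype(str).rename_axis("device").reset_index()
           .melt(id_vars="device", value_name="tag")
           .drop_duplicates(["device", "tag"])[["device", "tag"]])

# Number of tags each rule requires.
needed = rules.groupby("rule").size()

# Count how many required tags each device satisfies for each rule.
hits = rules.merge(cells, on="tag")
matched = (hits.groupby(["device", "rule", "comment"]).size()
               .reset_index(name="n_hit"))

# Keep only the pairs where every required tag was found.
matched = matched[matched["n_hit"] == matched["rule"].map(needed)]

# One 0/1 column per comment, with zeros for devices that matched nothing.
flags = (pd.crosstab(matched["device"], matched["comment"])
           .clip(upper=1)
           .reindex(index=d.index, columns=c["comment"].unique(), fill_value=0))

n = d.join(flags)
print(n)
```

On this sample data the flags should agree with the loop version from the question rather than with the single-column merge. Whether it is actually faster than the loops depends on the sizes of d and c, since the merge builds one row per matching (rule tag, device cell) pair.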