pandas - reshaping csv by eliminating redundant row keys and merging fields using python -

August 15, 2010

i have csv file in form of:

'userid','metric name (1-10)','value'

the column 'metric name' has upwards of 10 different metrics same userid have multiple rows associated it. accomplish this:

'userid1', 'metric name 1'='value1', 'metric name 2'='value2', 'metric name 3'='value3'... 'metric name 10' = 'value10'

a single row each userid metrics , values associated user in k/v pairs

i started playing around pivot function doesn't need to...

import pandas pd data=pd.read_csv('bps.csv') data.pivot('entityname', 'metricname', 'value').stack()

i thinking need iterator through dataset user , grab metrics associated user , build metric k/v pairs during each iteration before going on new user. did pretty thorough job of searching internet didn't find looking for. please let me know if there simple library use.

here come solution using standard python, not framework.

starting following data file :

id1,name,foo id1,age,10 id2,name,bar id2,class,example id1,aim,demonstrate

you can execute following code :

separator = "," useridkey = "userid" defaultvalue = "no data"  data = {} #collect data open("data.csv", 'r') datafile:     line in datafile:         #remove end of line character         line = line.replace("\n", "")          userid, fieldname, value = line.split(separator)          if not userid in data.keys():             data[userid] = {useridkey:userid}          data[userid][fieldname] = value  #retrieve columns header in use columnsheaders = set() key in data:     dataset = data[key]     datasetkey in dataset :         columnsheaders.add(datasetkey)  columnsheaders.remove(useridkey) columnsheaders = list(columnsheaders) columnsheaders.sort()  def getvalue(key, dic):     if key in dic.keys():         return dic[key]     else:         return defaultvalue  #then export result open("output.csv", 'w') outputfile:     #export first line of header     outputfile.write(useridkey)     header in columnsheaders:         outputfile.write(", {0}".format(header))     outputfile.write("\n")     #and export each line     key in data:         dataset = data[key]         outputfile.write(dataset[useridkey])         header in columnsheaders:             outputfile.write(", {0}".format(getvalue(header, dataset)))         outputfile.write("\n")

and following result :

userid, age, aim, class, name id1, 'age'='10', 'aim'='demonstrate', 'class'='no data', 'name'='foo' id2, 'age'='no data', 'aim'='no data', 'class'='example', 'name'='bar'

i think code can modified match objectives if required.

hope helps. arthur.

Search This Blog

UV code

pandas - reshaping csv by eliminating redundant row keys and merging fields using python -

Comments

Post a Comment

Popular posts from this blog

jquery - How do you format the date used in the popover widget title of FullCalendar? -

Bubble Sort Manually a Linked List in Java -

asp.net mvc - SSO between MVCForum and Umbraco7 -