"flatten" data based on match from two data frames in R -


i have 2 data frames follows

the first survey table tells when person's survey taken

id = c('1000021','1000021') surveydate = c('2014-05-30','2013-05-01') dfsurvey = data.frame(id,surveydate) > dfsurvey               id  surveydate 1        1000021  2014-05-30 2        1000021  2013-05-01 

the second hobbies table tells person's hobbies recorded on day. on different days, hobbies different.

id = c('1000021','1000021','1000021','1000021','1000021','1000021','1000021') hobbyname = c('running','volleyball','pingpong','badminton','swimming','running','pingpong') surveydate = c('2014-05-30','2014-05-30','2014-05-30','2014-05-30','2014-05-30','2013-05-01','2013-05-01') dfhobby = data.frame(id,hobbyname,surveydate) > dfhobby    id                                      hobbyname  surveydate 1        1000021                             running  2014-05-30 2        1000021                          volleyball  2014-05-30 3        1000021                            pingpong  2014-05-30 4        1000021                           badminton  2014-05-30 5        1000021                            swimming  2014-05-30 6        1000021                             running  2013-05-01 7        1000021                            pingpong  2013-05-01 

to survey table has 2 rows, add expanded list of hobbies, each hobby getting it's own column, call "flattening". this,

#expected final output - add columns dfsurvey > dfsurvey id     surveydate                        hobby_running     hobby_volleyball     hobby_pingpong    hobby_badminton hobby_swimming 1        1000021                                 1                    1                  1                  1              1  2        1000021                                 1                    0                  1                  0              0 

this code first construct column names, , use nested loop mark 1 against hobby. however, very slow, around 1 second 1 iteration of nested loop

#making columns , setting them 0 default hobbyvalues = unique(dfhobby$hobbyname) for(i in 1:length(hobbyvalues)) {     print(i)     dfsurvey[paste("hobby_",hobbyvalues[i],sep="")] = 0 }  #flattening iterative for(i in 1:nrow(dfsurvey)) {     print(i)      listofhobbies = dfhobby[which(dfhobby$id == dfsurvey[i,"id"] & dfhobby$surveydate == dfsurvey[i,"surveydate"]),"hobbyname"]      if(length(listofhobbies) > 0)     {         for(l in 1:length(listofhobbies))         {             dfsurvey[i,paste("hobby_",listofhobbies[l],sep="")] = 1         }     } } 

i have tried foreach package , domc package , able write code in parallel. however, slow well.

is there better way or library in r can me this? thanks.

> library(reshape2) > dcast(dfhobby,id*surveydate~hobbyname,fill=0,length)         id surveydate badminton pingpong running swimming volleyball 1 1000021 2013-05-01         0        1       1        0          0 2 1000021 2014-05-30         1        1       1        1          1   > dcast(dfhobby,surveydate~hobbyname,fill=0,length)    surveydate badminton pingpong running swimming volleyball 1 2013-05-01         0        1       1        0          0 2 2014-05-30         1        1       1        1          1 

Comments

Popular posts from this blog

asp.net mvc - SSO between MVCForum and Umbraco7 -

Python Tkinter keyboard using bind -

ubuntu - Selenium Node Not Connecting to Hub, Not Opening Port -