"flatten" data based on match from two data frames in R -
i have 2 data frames follows
the first survey table tells when person's survey taken
id = c('1000021','1000021') surveydate = c('2014-05-30','2013-05-01') dfsurvey = data.frame(id,surveydate) > dfsurvey id surveydate 1 1000021 2014-05-30 2 1000021 2013-05-01
the second hobbies table tells person's hobbies recorded on day. on different days, hobbies different.
id = c('1000021','1000021','1000021','1000021','1000021','1000021','1000021') hobbyname = c('running','volleyball','pingpong','badminton','swimming','running','pingpong') surveydate = c('2014-05-30','2014-05-30','2014-05-30','2014-05-30','2014-05-30','2013-05-01','2013-05-01') dfhobby = data.frame(id,hobbyname,surveydate) > dfhobby id hobbyname surveydate 1 1000021 running 2014-05-30 2 1000021 volleyball 2014-05-30 3 1000021 pingpong 2014-05-30 4 1000021 badminton 2014-05-30 5 1000021 swimming 2014-05-30 6 1000021 running 2013-05-01 7 1000021 pingpong 2013-05-01
to survey table has 2 rows, add expanded list of hobbies, each hobby getting it's own column, call "flattening". this,
#expected final output - add columns dfsurvey > dfsurvey id surveydate hobby_running hobby_volleyball hobby_pingpong hobby_badminton hobby_swimming 1 1000021 1 1 1 1 1 2 1000021 1 0 1 0 0
this code first construct column names, , use nested loop mark 1 against hobby. however, very slow, around 1 second 1 iteration of nested loop
#making columns , setting them 0 default hobbyvalues = unique(dfhobby$hobbyname) for(i in 1:length(hobbyvalues)) { print(i) dfsurvey[paste("hobby_",hobbyvalues[i],sep="")] = 0 } #flattening iterative for(i in 1:nrow(dfsurvey)) { print(i) listofhobbies = dfhobby[which(dfhobby$id == dfsurvey[i,"id"] & dfhobby$surveydate == dfsurvey[i,"surveydate"]),"hobbyname"] if(length(listofhobbies) > 0) { for(l in 1:length(listofhobbies)) { dfsurvey[i,paste("hobby_",listofhobbies[l],sep="")] = 1 } } }
i have tried foreach package , domc package , able write code in parallel. however, slow well.
is there better way or library in r can me this? thanks.
> library(reshape2) > dcast(dfhobby,id*surveydate~hobbyname,fill=0,length) id surveydate badminton pingpong running swimming volleyball 1 1000021 2013-05-01 0 1 1 0 0 2 1000021 2014-05-30 1 1 1 1 1 > dcast(dfhobby,surveydate~hobbyname,fill=0,length) surveydate badminton pingpong running swimming volleyball 1 2013-05-01 0 1 1 0 0 2 2014-05-30 1 1 1 1 1
Comments
Post a Comment