r - "fuzzy key matching" for a data.table merge -

August 15, 2012

i'm trying match workers year year using name strings , measure of experience. experience can increase @ 1 year year, i'd use matching when other metrics fail.

for example:

dt1<-data.table(name=c("jane doe","jane doe",                        "john doe","jane smith"),                 exp=c(0.,5,1,2),id=1:4,key="name") dt2<-data.table(name=c("jane doe","jane doe",                        "john doe","jane smith"),                 exp=c(0,30,1.5,2),key="name")

i want match first "jane doe" in dt1 first "jane doe" in dt2. latter "jane doe"s don't match, because they're different people (based on vastly different experience levels).

i'd add flags know matched these people in way later on down line. here's first pass:

dt2[dt1,`:=`(id=ifelse(exp<=i.exp+1,i.id,na),              flag=ifelse(exp<=i.exp+1,i.id,na))]

but not working--here's gives me:

> dt2          name  exp id flag 1:   jane doe  0.0  2    2 2:   jane doe 30.0 na   na 3: jane smith  2.0  4    4 4:   john doe  1.5  3    3

it seems have missed matching latter "jane doe", appears have matched first "jane doe" wrong prior "jane doe". i'm not quite sure why is; anyway, seems preferable have way incorporate matching on exp before instead of after joining--this clean ifelse mess in defining new variables. suggestions?

for clarity, here's desired output:

> dt2          name  exp id flag 1:   jane doe  1.0  1    1 2:   jane doe 30.0 na   na 3: jane smith  2.0  4    1 4:   john doe  1.5  3    1

in case join isn't "fuzzy". trying join name exp while allowing 1 one year distance per match. use rolling join -1l specification.

first correctly key data sets

setkey(dt1, name, exp)  setkey(dt2, name, exp)

then, perform rolling join while passing -1l value

dt2[dt1, `:=`(id = i.id, flag = 1l), roll = -1l] df2 #          name  exp id flag # 1:   jane doe  0.0  1    1 # 2:   jane doe 30.0 na   na # 3: jane smith  2.0  4    1 # 4:   john doe  1.5  3    1

in future, if you''ll need conduct interval join such c(1l, -1l) can take here examples of foverlaps function.

Search This Blog

UV code

r - "fuzzy key matching" for a data.table merge -

Comments

Post a Comment

Popular posts from this blog

jquery - How do you format the date used in the popover widget title of FullCalendar? -

Bubble Sort Manually a Linked List in Java -

asp.net mvc - SSO between MVCForum and Umbraco7 -