r - "fuzzy key matching" for a data.table merge -


i'm trying match workers year year using name strings , measure of experience. experience can increase @ 1 year year, i'd use matching when other metrics fail.

for example:

dt1<-data.table(name=c("jane doe","jane doe",                        "john doe","jane smith"),                 exp=c(0.,5,1,2),id=1:4,key="name") dt2<-data.table(name=c("jane doe","jane doe",                        "john doe","jane smith"),                 exp=c(0,30,1.5,2),key="name") 

i want match first "jane doe" in dt1 first "jane doe" in dt2. latter "jane doe"s don't match, because they're different people (based on vastly different experience levels).

i'd add flags know matched these people in way later on down line. here's first pass:

dt2[dt1,`:=`(id=ifelse(exp<=i.exp+1,i.id,na),              flag=ifelse(exp<=i.exp+1,i.id,na))] 

but not working--here's gives me:

> dt2          name  exp id flag 1:   jane doe  0.0  2    2 2:   jane doe 30.0 na   na 3: jane smith  2.0  4    4 4:   john doe  1.5  3    3 

it seems have missed matching latter "jane doe", appears have matched first "jane doe" wrong prior "jane doe". i'm not quite sure why is; anyway, seems preferable have way incorporate matching on exp before instead of after joining--this clean ifelse mess in defining new variables. suggestions?


for clarity, here's desired output:

> dt2          name  exp id flag 1:   jane doe  1.0  1    1 2:   jane doe 30.0 na   na 3: jane smith  2.0  4    1 4:   john doe  1.5  3    1 

in case join isn't "fuzzy". trying join name exp while allowing 1 one year distance per match. use rolling join -1l specification.

first correctly key data sets

setkey(dt1, name, exp)  setkey(dt2, name, exp)  

then, perform rolling join while passing -1l value

dt2[dt1, `:=`(id = i.id, flag = 1l), roll = -1l] df2 #          name  exp id flag # 1:   jane doe  0.0  1    1 # 2:   jane doe 30.0 na   na # 3: jane smith  2.0  4    1 # 4:   john doe  1.5  3    1 

in future, if you''ll need conduct interval join such c(1l, -1l) can take here examples of foverlaps function.


Comments

Popular posts from this blog

asp.net mvc - SSO between MVCForum and Umbraco7 -

Python Tkinter keyboard using bind -

ubuntu - Selenium Node Not Connecting to Hub, Not Opening Port -