r - How to merge two data.table by different column names? -
i have 2 data.table x , y.
columns in x: area, id, value
columns in y: id, price, sales
create 2 data.tables:
x = data.table(area=c('us', 'uk', 'eu'), id=c('c001', 'c002', 'c003'), value=c(100, 200, 300) ) y = data.table(id=c('c001', 'c002', 'c003'), price=c(500, 200, 400), sales=c(20, 30, 15) )
and set keys x , y:
setkey(x, id) setkey(y, id)
now try join x , y id
in x , id
in y:
merge(x, y) merge(x, y, by=c('id', 'id')) merge(x, y, by.x='id', by.y='id')
all raised error saying column names in by
argument invalid.
i referred manual of data.table , found merge
function not supporting by.x
, by.y
arguments.
how join 2 data.tables different column names without changing column names?
append:
managed join 2 tables x[y]
, why merge
function fails in data.table?
use operation:
x[y] # area id value price sales # 1: c001 100 500 20 # 2: uk c002 200 200 30 # 3: eu c003 300 400 15
or operation:
y[x] # id price sales area value # 1: c001 500 20 100 # 2: c002 200 30 uk 200 # 3: c003 400 15 eu 300
edit after edited question, read section 1.12 of faq: "what didifference between x[y] , merge(x,y)?", led me checkout ?merge
, discovered there 2 different merge functions depending upon package using. default merge.data.frame
data.table uses merge.data.table
. compare
merge(x, y, by.x = "id", by.y = "id") # merge.data.table # error in merge.data.table(x, y, by.x = "id", by.y = "id") : # non-empty vector of column names `by` required.
with
merge.data.frame(x, y, by.x = "id", by.y = "id") # id area value price sales # 1 c001 100 500 20 # 2 c002 uk 200 200 30 # 3 c003 eu 300 400 15
edit completeness based upon comment @michael bernsteiner, looks data.table
team planning on implementing by.x
, by.y
merge.data.table
function, hasn't done yet.
Comments
Post a Comment