hadoop - Filter CSV columns using Pig
Hi Stack Overflow community,
I'm new to Pig. I have a CSV file that contains 5 columns with headers. It looks like this:
column1 | column2 | column3 | column4 | column5
test1012 | test2045 | test3250 | test4865 | test5110
test1245 | test2047 | test3456 | test4234 | test5221 ..........
I want to sort on columns 1, 3, and 4, but I don't know how to filter by column header.
If you could please point me to the right functions to accomplish what I want to do, that would be great. Thanks!
Let's assume you loaded the file as below (assuming you are using a comma delimiter); then you can use the order functionality.
myinput = load 'myfile.csv' using PigStorage(',') as (c1:chararray, c2:chararray, c3:chararray, c4:chararray, c5:chararray);
mysortedinput = order myinput by c1 asc, c3, c4 asc;
dump mysortedinput;
If you wanted to filter out columns after the load, you can do the following.
myinputwithlesscols = foreach myinput generate c1, c3, c4;
If I totally misunderstood and you are wanting to filter out the header row, you can do the following after the load statement.
myinputwithoutheaders = filter myinput by c1 != 'column1' and c2 != 'column2' and c3 != 'column3' and c4 != 'column4' and c5 != 'column5';
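For reference, the three steps can be chained into one script. This is only a sketch under the same assumptions as above: a comma-delimited file named 'myfile.csv' and a header row whose first field is literally 'column1'. The aliases noheader, threecols, and sorted are made up for illustration.

```pig
-- Load the CSV; the field names c1..c5 are the ones assumed above.
myinput = load 'myfile.csv' using PigStorage(',')
    as (c1:chararray, c2:chararray, c3:chararray, c4:chararray, c5:chararray);

-- Drop the header row (assumes no data row has 'column1' in its first field).
noheader = filter myinput by c1 != 'column1';

-- Keep only columns 1, 3, and 4.
threecols = foreach noheader generate c1, c3, c4;

-- Sort by c1, then c3, then c4 (ascending).
sorted = order threecols by c1 asc, c3 asc, c4 asc;

dump sorted;
```

If the file is really pipe-delimited, as the sample in the question might suggest, the PigStorage argument would need to be '|' instead of ','.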