hadoop - Filter CSV columns using Pig


Hi Stack Overflow community,

I'm new to Pig and have a CSV file that contains 5 columns with headers. It looks like this:

column1 | column2 | column3 | column4 | column5

test1012 | test2045 | test3250 | test4865 | test5110
test1245 | test2047 | test3456 | test4234 | test5221 ..........

I want to sort on columns 1, 3, and 4, but I don't know how to filter out the column header.

If you could please point me to the right functions to accomplish what I want to do, that would be great. Thanks!

Let's assume you have loaded the file as below (assuming you are using a comma delimiter). You can then use the ORDER functionality.

myinput = LOAD 'myfile.csv' USING PigStorage(',') AS (c1:chararray, c2:chararray, c3:chararray, c4:chararray, c5:chararray);
mysortedinput = ORDER myinput BY c1 ASC, c3 ASC, c4 ASC;
DUMP mysortedinput;

If you wanted to filter out columns, do the following after the load.

myinputwithlesscols = FOREACH myinput GENERATE c1, c3, c4;

If I have totally misunderstood and you want to filter out the header row, do the following after the load statement.

myinputwithoutheaders = FILTER myinput BY c1 != 'column1' AND c2 != 'column2' AND c3 != 'column3' AND c4 != 'column4' AND c5 != 'column5';
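
The three steps above can also be chained into one script. This is only a minimal sketch: it assumes the file is comma-delimited and named 'myfile.csv', that the first header cell is the literal string 'column1', and the alias names (raw_rows, data_rows, picked_cols, sorted_rows) are made up for illustration. The header row is dropped before the sort so that it is not ordered in among the data rows.

```pig
-- Load all five columns as strings (assumes a comma delimiter).
raw_rows = LOAD 'myfile.csv' USING PigStorage(',')
           AS (c1:chararray, c2:chararray, c3:chararray, c4:chararray, c5:chararray);

-- Drop the header row by matching its first cell (assumed to be 'column1').
data_rows = FILTER raw_rows BY c1 != 'column1';

-- Keep only the columns of interest.
picked_cols = FOREACH data_rows GENERATE c1, c3, c4;

-- Sort by column 1, then column 3, then column 4.
sorted_rows = ORDER picked_cols BY c1 ASC, c3 ASC, c4 ASC;

DUMP sorted_rows;
```

Checking only c1 keeps the filter short. If a data row could legitimately contain 'column1' in its first field, compare the other columns as well.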
