Hadoop - Java MapReduce - How to Output Top 10 from an IntWritable Sum in Reducer Class
I'm having difficulty writing reducer code that outputs only the top 10 (key, value) pairs.
My current output is formatted as ((year, market), total amount). I'm looking for the top 10 total amounts for each year, but the current code outputs the total amount for every market in every year.
Any recommendations are appreciated!
Mapper:

    public class FundingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private Text year = new Text();
        private Text market = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            CSVReader reader = new CSVReader(new StringReader(line));
            String[] array = reader.readNext();
            reader.close();
            year.set(array[14]);
            market.set(array[3]);
            // Strip thousands separators etc., e.g. "4,742,648" -> "4742648"
            String amountString = array[15].replaceAll("[^0-9]", "");
            int amount = 0;
            try {
                amount = Integer.parseInt(amountString);
            } catch (NumberFormatException nfe) {
                return; // skip rows without a numeric amount
            }
            IntWritable intw = new IntWritable(amount);
            // Composite key "year market "
            String s = new StringBuilder().append(year + " ").append(market + " ").toString();
            context.write(new Text(s), intw);
        }
    }

Reducer:

    public class FundingReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Total amount for one (year, market) key
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

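One thing that may bite once the totals are ranked: the reducer accumulates into an `int`, and a year's total for a large market can exceed `Integer.MAX_VALUE` (2,147,483,647). When that happens the sum silently wraps to a negative number and would drop out of any top-10 list. The sketch below is a hypothetical plain-Java demo (the class `SumOverflowDemo` and its amounts are made up for illustration) that runs the same loop into an `int` and into a `long`. In Hadoop terms the fix would be to declare the reducer as `Reducer<Text, IntWritable, Text, LongWritable>`, sum into a `long`, and tell the driver with `job.setOutputValueClass(LongWritable.class)` plus `job.setMapOutputValueClass(IntWritable.class)`, since the map output type then differs from the job output type.

```java
// Hypothetical demo class (not part of the question's code): shows why summing
// funding amounts into an int can silently overflow, and how a long avoids it.
public class SumOverflowDemo {

    // Same loop as the reducer, accumulating into an int.
    static int sumAsInt(int[] amounts) {
        int sum = 0;
        for (int a : amounts) {
            sum += a;
        }
        return sum;
    }

    // Same loop, accumulating into a long instead.
    static long sumAsLong(int[] amounts) {
        long sum = 0;
        for (int a : amounts) {
            sum += a;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical amounts: three rounds of 1,000,000,000 each.
        int[] amounts = {1_000_000_000, 1_000_000_000, 1_000_000_000};
        System.out.println(sumAsInt(amounts));  // -1294967296 (wrapped around)
        System.out.println(sumAsLong(amounts)); // 3000000000
    }
}
```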
Data sample:

    /organization/contravir-pharmaceuticals contravir pharmaceuticals |biotechnology| biotechnology usa ny new york city new york /funding-round/9a7cc724deba554585e2b79c14605866 post_ipo_equity 8/22/14 2014-08 2014-q3 2014 4,742,648
    /organization/contravir-pharmaceuticals contravir pharmaceuticals |biotechnology| biotechnology usa ny new york city new york /funding-round/04a7ec54417a0f9a6c99cf8db2eac819 venture 10/15/14 2014-10 2014-q4 2014 9,000,000
    /organization/contravir-pharmaceuticals contravir pharmaceuticals |biotechnology| biotechnology usa ny new york city new york /funding-round/328384053df3a992ca6d5da55ca0420e venture 2/14/14 2014-02 2014-q1 2014 3,225,000
    /organization/contrib-com contrib.com |entrepreneur|technology|domains|education|social media| social media usa fl palm beaches delray beach /funding-round/fea112ed22657c1456820aa26af3ab17 seed 6/17/14 2014-06 2014-q2 2014 300,000

Output sample:

    2014 biotechnology 16967648
    2014 social media 300000

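If you would rather keep the existing job unchanged, another option (not from the answer below, just a common pattern) is to chain a second job over its output: a mapper that re-keys each line by year alone, followed by a reducer that keeps the 10 largest totals. The sketch below simulates that second pass in plain Java so the parsing and ranking can be checked without a cluster. It assumes the first job writes with the default `TextOutputFormat`, so each line is the key, a tab, then the sum, and the key keeps the trailing space added by `append(market + " ")`. `SecondPassTopN`, `parse` and `topNPerYear` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java simulation of a second pass over the first job's output.
// Assumed input line format: "<year> <market> \t<total>" (default tab separator).
public class SecondPassTopN {

    // Splits one output line into {year, market, total}.
    static String[] parse(String line) {
        String[] keyAndTotal = line.split("\t", 2);
        String key = keyAndTotal[0].trim();            // drop the trailing space, e.g. "2014 social media"
        int firstSpace = key.indexOf(' ');
        String year = key.substring(0, firstSpace);
        String market = key.substring(firstSpace + 1); // market names may contain spaces
        return new String[] {year, market, keyAndTotal[1].trim()};
    }

    // Re-keys by year (what a second mapper would do), then keeps the n largest totals per year.
    static Map<String, List<String>> topNPerYear(List<String> lines, int n) {
        Map<String, List<String[]>> byYear = new TreeMap<>();
        for (String line : lines) {
            String[] f = parse(line);
            byYear.computeIfAbsent(f[0], y -> new ArrayList<>()).add(f);
        }
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<String[]>> e : byYear.entrySet()) {
            List<String[]> rows = e.getValue();
            // Largest total first
            rows.sort(Comparator.comparingLong((String[] r) -> Long.parseLong(r[2])).reversed());
            List<String> top = new ArrayList<>();
            for (int i = 0; i < Math.min(n, rows.size()); i++) {
                top.add(rows.get(i)[1] + "\t" + rows.get(i)[2]);
            }
            result.put(e.getKey(), top);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2014 biotechnology \t16967648",
                "2014 social media \t300000");
        System.out.println(topNPerYear(lines, 10));
    }
}
```

The trade-off is an extra job and an extra pass over the data, in exchange for leaving the original (year, market) aggregation untouched.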
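You need to make the year alone the key of the map output. That ensures all the (amount, market) values for a given year reach the reducer in a single reduce() call, where you can total them per market, rank the totals, and write out only the top 10. Check below.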
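
    // Mapper: declare as Mapper<LongWritable, Text, Text, Text>; key the output by year only.
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        CSVReader reader = new CSVReader(new StringReader(line));
        String[] array = reader.readNext();
        reader.close();
        year.set(array[14]);
        market.set(array[3]);
        String amountString = array[15].replaceAll("[^0-9]", "");
        int amount;
        try {
            amount = Integer.parseInt(amountString);
        } catch (NumberFormatException nfe) {
            return;
        }
        // Value is "amount<TAB>market"; a tab, because markets like "social media" contain spaces
        context.write(year, new Text(amount + "\t" + market));
    }

    // Reducer: declare as Reducer<Text, Text, Text, Text> (needs java.util.*);
    // in the driver, call job.setMapOutputValueClass(Text.class) and job.setOutputValueClass(Text.class).
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Sum the amounts per market within this year
        Map<String, Long> totals = new HashMap<>();
        for (Text value : values) {
            String[] parts = value.toString().split("\t", 2);
            totals.merge(parts[1], Long.parseLong(parts[0]), Long::sum);
        }
        // Values arrive in no particular order, so sort by total (largest first) before taking 10
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(totals.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        for (int i = 0; i < Math.min(10, sorted.size()); i++) {
            Map.Entry<String, Long> e = sorted.get(i);
            context.write(key, new Text(e.getKey() + "\t" + e.getValue()));
        }
    }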
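Sorting every market's total is fine when a year has a modest number of markets. If you only ever need the top n, a common alternative is a min-heap capped at n entries: each new total is offered, and whenever the heap grows past n the smallest is evicted, so the heap never holds more than n + 1 entries at a time. The sketch below is a hypothetical helper (`TopN.topN` is an illustrative name, not a Hadoop API) written in plain Java against `java.util.PriorityQueue`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: keeps the n entries with the largest values using a
// min-heap capped at n, instead of sorting every entry.
public class TopN {

    static List<Map.Entry<String, Long>> topN(Map<String, Long> totals, int n) {
        // Min-heap ordered by total: the smallest of the current top n sits at the head.
        PriorityQueue<Map.Entry<String, Long>> heap =
                new PriorityQueue<Map.Entry<String, Long>>(Map.Entry.<String, Long>comparingByValue());
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // evict the smallest so at most n remain
            }
        }
        // Heap iteration order is arbitrary; sort the survivors largest first for output.
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder(Map.Entry.<String, Long>comparingByValue()));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> totals = new HashMap<>();
        totals.put("biotechnology", 16_967_648L);
        totals.put("social media", 300_000L);
        totals.put("software", 5_000_000L); // hypothetical extra market
        System.out.println(topN(totals, 2)); // [biotechnology=16967648, software=5000000]
    }
}
```

Inside the reducer above, the sort-and-loop at the end could then become a loop over `topN(totals, 10)`.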