hadoop - Java MapReduce - How to Output Top 10 from an IntWritable Sum in Reducer Class


I'm having difficulty writing reducer code that outputs the top 10 (key, value) pairs.

My current output is formatted as ((year, market), total amount). I'm looking for the top 10 total amounts for each year, but my current code outputs the total amount for every market in every year.

Any recommendations are appreciated!
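Whatever the job layout, the core of "top 10 per year" is a bounded selection: keep only the 10 largest totals seen so far. The following is a minimal sketch in plain Java (no Hadoop types). It assumes the per-market totals for one year are already collected in a `Map<String, Long>`. The `TopN` class and `topN` method are hypothetical names. A min-heap capped at n entries holds the current top n, so each candidate costs O(log n) and memory stays O(n) no matter how many markets a year has.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: keeps the n entries with the largest values.
class TopN {

    // Returns the n largest entries of totals, largest first.
    static List<Map.Entry<String, Long>> topN(Map<String, Long> totals, int n) {
        // Min-heap ordered by value: the smallest of the current top n sits at the head.
        PriorityQueue<Map.Entry<String, Long>> heap =
                new PriorityQueue<>(Map.Entry.<String, Long>comparingByValue());
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // evict the current smallest total
            }
        }
        // The heap is not sorted internally, so sort the survivors largest first.
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> totals2014 = new HashMap<>();
        totals2014.put("biotechnology", 16967648L);
        totals2014.put("social media", 300000L);
        totals2014.put("software", 5000000L); // made-up total, for illustration only
        for (Map.Entry<String, Long> e : topN(totals2014, 2)) {
            System.out.println("2014 " + e.getKey() + " " + e.getValue());
        }
    }
}
```

For a few hundred markets per year, sorting the full list and taking the first 10 is just as practical. The heap becomes useful when the number of candidates is large.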

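Mapper:

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import com.opencsv.CSVReader; // assumes opencsv 3.x/4.x, where readNext() throws only IOException

public class FundingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Text year = new Text();
    private Text market = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        String line = value.toString();
        CSVReader reader = new CSVReader(new StringReader(line));

        String[] array = reader.readNext();
        reader.close();

        year.set(array[14]);
        market.set(array[3]);

        // strip thousands separators etc.: "4,742,648" -> "4742648"
        String amountString = array[15].replaceAll("[^0-9]", "");
        int amount = 0;

        try {
            amount = Integer.parseInt(amountString);
        } catch (NumberFormatException nfe) {
            return; // skip rows without a usable amount
        }

        IntWritable intw = new IntWritable(amount);

        // composite key "year market "
        String s = new StringBuilder().append(year + " ").append(market + " ").toString();

        context.write(new Text(s), intw);
    }
}
```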
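One caveat with the amount handling: `Integer.parseInt` only accepts values up to 2,147,483,647. A larger amount, such as a hypothetical 3,000,000,000 round, throws `NumberFormatException`, and the `catch` then silently drops that row. The sketch below shows the same clean-up with a `long` instead, assuming the value type would change to `LongWritable` in the job. `AmountParser` and `parseAmount` are hypothetical names, and returning -1 for "no amount" is an illustrative convention.

```java
// Hypothetical helper mirroring the mapper's amount clean-up, but parsing to long.
class AmountParser {

    // Returns the amount, or -1 if the field has no usable digits (caller skips the row).
    static long parseAmount(String raw) {
        // Same regex as the mapper: "4,742,648" -> "4742648".
        // Note: it also strips '.', so a decimal like "1.5" would become 15.
        String digits = raw.replaceAll("[^0-9]", "");
        try {
            return Long.parseLong(digits);
        } catch (NumberFormatException nfe) {
            return -1; // "" (no digits at all) or too many digits even for a long
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAmount("4,742,648"));     // 4742648
        System.out.println(parseAmount("3,000,000,000")); // 3000000000, too big for int
        System.out.println(parseAmount("-"));             // -1
    }
}
```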

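Reducer:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FundingReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {

        int sum = 0;

        // total amount for one "year market" key
        for (IntWritable value : values) {
            sum += value.get();
        }

        context.write(key, new IntWritable(sum));
    }
}
```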
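The reducer accumulates into an `int`. Java `int` arithmetic wraps around silently, so if any (year, market) total passes 2,147,483,647, the written sum is wrong and can even be negative. The sketch below demonstrates this in plain Java with made-up round sizes. `SumOverflowDemo` is a hypothetical name. A `long` accumulator, written out as a `LongWritable`, avoids the problem for any realistic total.

```java
// Shows why an int accumulator is risky for yearly totals.
class SumOverflowDemo {

    static int sumAsInt(int[] amounts) {
        int sum = 0;
        for (int a : amounts) {
            sum += a; // wraps past Integer.MAX_VALUE without any error
        }
        return sum;
    }

    static long sumAsLong(int[] amounts) {
        long sum = 0;
        for (int a : amounts) {
            sum += a; // each int is widened to long before adding
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up round sizes: two 1.5 billion rounds in one (year, market) group.
        int[] amounts = {1_500_000_000, 1_500_000_000};
        System.out.println(sumAsInt(amounts));  // -1294967296
        System.out.println(sumAsLong(amounts)); // 3000000000
    }
}
```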

Data sample:

```
/organization/contravir-pharmaceuticals contravir pharmaceuticals   |biotechnology| biotechnology   usa ny  new york city   new york    /funding-round/9a7cc724deba554585e2b79c14605866 post_ipo_equity     8/22/14 2014-08      2014-q3    2014    4,742,648
/organization/contravir-pharmaceuticals contravir pharmaceuticals   |biotechnology| biotechnology   usa ny  new york city   new york    /funding-round/04a7ec54417a0f9a6c99cf8db2eac819 venture   10/15/14    2014-10  2014-q4    2014    9,000,000
/organization/contravir-pharmaceuticals contravir pharmaceuticals   |biotechnology| biotechnology   usa ny  new york city   new york    /funding-round/328384053df3a992ca6d5da55ca0420e venture     2/14/14 2014-02  2014-q1    2014    3,225,000
/organization/contrib-com   contrib.com |entrepreneur|technology|domains|education|social media|    social media    usa fl  palm beaches    delray beach    /funding-round/fea112ed22657c1456820aa26af3ab17 seed        6/17/14 2014-06  2014-q2    2014    300,000
```

Output sample:

```
2014  biotechnology  16967648
2014  social media  300000
```
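The output sample is consistent with the data sample. The three biotechnology rounds add up to 4,742,648 + 9,000,000 + 3,225,000 = 16,967,648. The sketch below is a minimal local re-creation of the same group-and-sum step in plain Java. The (year, market, amount) fields were copied by hand from the four sample rows, and `SampleTotals` is a hypothetical name. A `TreeMap` is used only to get a stable print order.

```java
import java.util.Map;
import java.util.TreeMap;

// Re-creates the job's "year market" -> total step on the four sample rows.
class SampleTotals {

    static Map<String, Long> totals(String[][] rows) {
        Map<String, Long> totals = new TreeMap<>();
        for (String[] row : rows) {
            String key = row[0] + " " + row[1];                             // like the mapper's key
            long amount = Long.parseLong(row[2].replaceAll("[^0-9]", "")); // "4,742,648" -> 4742648
            totals.merge(key, amount, Long::sum);                            // like the reducer's loop
        }
        return totals;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"2014", "biotechnology", "4,742,648"},
            {"2014", "biotechnology", "9,000,000"},
            {"2014", "biotechnology", "3,225,000"},
            {"2014", "social media", "300,000"},
        };
        totals(rows).forEach((k, v) -> System.out.println(k + "  " + v));
    }
}
```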

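You need to have the year as the key in the map output. That way all the values for a given year reach the reducer together, in a single `reduce()` call. The reducer can then sum the amounts per market and filter the result down to the 10 largest totals before writing output. Check below.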

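```java
// Assumes opencsv 3.x/4.x; each public class goes in its own file.
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

import com.opencsv.CSVReader;

// Key = year, value = "amount market", so every market of one year reaches the same reduce() call.
public class FundingMapper extends Mapper<LongWritable, Text, Text, Text> {

    private Text year = new Text();
    private Text market = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        CSVReader reader = new CSVReader(new StringReader(line));
        String[] array = reader.readNext();
        reader.close();

        year.set(array[14]);
        market.set(array[3]);

        String amountString = array[15].replaceAll("[^0-9]", "");
        long amount;
        try {
            amount = Long.parseLong(amountString); // long: totals can exceed Integer.MAX_VALUE
        } catch (NumberFormatException nfe) {
            return; // no usable amount in this row
        }

        context.write(year, new Text(amount + " " + market));
    }
}

// The driver must call job.setMapOutputValueClass(Text.class) and
// job.setOutputValueClass(LongWritable.class); this class cannot double as a combiner.
public class FundingReducer extends Reducer<Text, Text, Text, LongWritable> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // 1. Sum the amounts per market for this year.
        Map<String, Long> totals = new HashMap<>();
        for (Text value : values) {
            String[] parts = value.toString().split(" ", 2); // market names may contain spaces
            totals.merge(parts[1], Long.parseLong(parts[0]), Long::sum);
        }

        // 2. Sort markets by total, largest first, and write only the first 10.
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(totals.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed());

        int count = 0;
        for (Map.Entry<String, Long> entry : sorted) {
            if (count == 10) {
                break;
            }
            context.write(new Text(key + " " + entry.getKey()), new LongWritable(entry.getValue()));
            count++;
        }
    }
}
```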
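Packing the amount and the market into one `Text` value only works if the reducer can split them apart again reliably. Market names such as `social media` contain spaces, so `split(" ")[1]` returns only `social`. Passing a limit of 2 to `String.split` keeps everything after the first space as the market name. The sketch below illustrates this in plain Java. `ValueCodec`, `encode`, and `decode` are hypothetical names.

```java
// Encodes/decodes an "amount market" value such as "300000 social media".
class ValueCodec {

    static String encode(long amount, String market) {
        return amount + " " + market;
    }

    // limit 2: at most two pieces, so the market name keeps its internal spaces
    static String[] decode(String value) {
        return value.split(" ", 2);
    }

    public static void main(String[] args) {
        String v = encode(300000, "social media");
        System.out.println(v.split(" ")[1]); // social  (market name truncated)
        System.out.println(decode(v)[1]);    // social media
    }
}
```

A tab (`"\t"`) would work as a separator too, as long as market names never contain one.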
