matrix - Get words and values Between Parentheses in Scala-Spark -


Here is my data:

doc1: (does,1) (just,-1) (what,0) (was,1) (needed,1) (to,0) (charge,1) (the,0) (macbook,1)
doc2: (pro,1) (g4,-1) (13inch,0) (laptop,1)
doc3: (only,1) (beef,0) (was,1) (it,0) (no,-1) (longer,0) (lights,-1) (up,0) (the,-1)
etc.

I want to extract the words and the values and store them in two separate matrices: matrix_1 (docid, words) and matrix_2 (docid, values).

input.txt
=========
doc1: (does,1) (just,-1) (what,0) (was,1) (needed,1) (to,0) (charge,1) (the,0) (macbook,1)
doc2: (pro,1) (g4,-1) (13inch,0) (laptop,1)
doc3: (only,1) (beef,0) (was,1) (it,0) (no,-1) (longer,0) (lights,-1) (up,0) (the,-1)

val inputText = sc.textFile("input.txt")

// Split each line into the document id and its "(word,value)" tokens,
// then strip the parentheses and split each token on the comma.
val digested = inputText.map(line => line.split(":"))
  .map(row => row(0) -> row(1).trim.split(" "))
  .map(row => row._1 -> row._2.map(_.stripPrefix("(").stripSuffix(")").trim.split(",")))

// matrix_1 holds (docid, words); matrix_2 holds (docid, values).
val matrix_1 = digested.map(row => row._1 -> row._2.map(a => a(0)))
val matrix_2 = digested.map(row => row._1 -> row._2.map(a => a(1)))

gives:

list(
  (doc1 -> does,just,what,was,needed,to,charge,the,macbook),
  (doc2 -> pro,g4,13inch,laptop),
  (doc3 -> only,beef,was,it,no,longer,lights,up,the)
)

list(
  (doc1 -> 1,-1,0,1,1,0,1,0,1),
  (doc2 -> 1,-1,0,1),
  (doc3 -> 1,0,1,0,-1,0,-1,0,-1)
)
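
The same split can also be done with a regular expression instead of stripping the parentheses by hand. The following is only a minimal sketch, assuming one document per line and integer values. The names ParenPairs, PairPattern and parseLine are hypothetical, and plain Scala collections stand in for the RDD so that it runs without a Spark context.

```scala
// Hypothetical helper; a plain Seq stands in for the RDD in this sketch.
object ParenPairs {
  // Matches "(word,value)" and captures the word and the signed integer value.
  private val PairPattern = """\(([^,()]+),\s*(-?\d+)\)""".r

  // Turns "docN: (w,v) (w,v) ..." into (docId, words, values).
  // Assumes the line contains a ':' separating the doc id from the pairs.
  def parseLine(line: String): (String, Seq[String], Seq[Int]) = {
    val Array(docId, rest) = line.split(":", 2)
    val pairs = PairPattern
      .findAllMatchIn(rest)
      .map(m => (m.group(1).trim, m.group(2).toInt))
      .toList
    (docId.trim, pairs.map(_._1), pairs.map(_._2))
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "doc2: (pro,1) (g4,-1) (13inch,0) (laptop,1)",
      "doc3: (only,1) (beef,0) (was,1)"
    )
    val parsed = lines.map(parseLine)
    // matrix1: (docid, words); matrix2: (docid, values)
    val matrix1 = parsed.map { case (id, words, _) => id -> words }
    val matrix2 = parsed.map { case (id, _, values) => id -> values }
    matrix1.foreach(println)
    matrix2.foreach(println)
  }
}
```

With Spark, the same function could presumably be passed to the RDD directly, for example sc.textFile("input.txt").map(ParenPairs.parseLine), since it depends only on the line it is given.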
