matrix - Get words and values between parentheses in Scala-Spark
Here is my data:
doc1: (does,1) (just,-1) (what,0) (was,1) (needed,1) (to,0) (charge,1) (the,0) (macbook,1)
doc2: (pro,1) (g4,-1) (13inch,0) (laptop,1)
doc3: (only,1) (beef,0) (was,1) (it,0) (no,-1) (longer,0) (lights,-1) (up,0) (the,-1)
etc.
I want to extract the words and the values and store them in two separate matrices: matrix_1 (docid, words) and matrix_2 (docid, values).
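Because each word/value pair sits between parentheses, a regular expression is one alternative to splitting on spaces and commas. The following is a minimal plain-Scala sketch (no Spark), assuming each document is on its own line as `docN: (word,value) ...`, that words contain no commas, parentheses or whitespace, and that values are integers. `PairExtractor` and `parse` are hypothetical names.

```scala
// Hypothetical helper: regex-based extraction of (docId, words, values)
// from one line shaped like "doc1: (does,1) (just,-1) (what,0)".
object PairExtractor {
  // One pair: "(" + word (no commas, parentheses or whitespace) + "," + signed integer + ")"
  private val Pair = """\(([^,()\s]+),(-?\d+)\)""".r

  def parse(line: String): (String, Array[String], Array[Int]) = {
    // Split only on the first ':' into the document id and the pair list
    val parts = line.split(":", 2)
    val docId = parts(0).trim
    val rest = if (parts.length > 1) parts(1) else ""
    // Collect every (word,value) match, converting the value to Int
    val pairs = Pair.findAllMatchIn(rest)
      .map(m => (m.group(1), m.group(2).toInt))
      .toArray
    (docId, pairs.map(_._1), pairs.map(_._2))
  }
}
```

In Spark, the same function could presumably be applied per line, e.g. `sc.textFile("input.txt").map(PairExtractor.parse)`, with the two matrices then taken via `.map { case (id, words, _) => id -> words }` and `.map { case (id, _, values) => id -> values }`.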
input.txt
=========
doc1: (does,1) (just,-1) (what,0) (was,1) (needed,1) (to,0) (charge,1) (the,0) (macbook,1)
doc2: (pro,1) (g4,-1) (13inch,0) (laptop,1)
doc3: (only,1) (beef,0) (was,1) (it,0) (no,-1) (longer,0) (lights,-1) (up,0) (the,-1)
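    // sc is the SparkContext (predefined in spark-shell); one document per line
    val inputtext = sc.textFile("input.txt")

    val digested = inputtext
      .map(line => line.split(":"))
      // (docId, Array("(does,1)", "(just,-1)", ...))
      .map(row => row(0) -> row(1).trim.split(" "))
      // drop the parentheses and split each pair into Array(word, value)
      .map(row => row._1 -> row._2.map(_.stripPrefix("(").stripSuffix(")").trim.split(",")))

    val matrix_1 = digested.map(row => row._1 -> row._2.map(a => a(0))) // (docId, words)
    val matrix_2 = digested.map(row => row._1 -> row._2.map(a => a(1))) // (docId, values)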
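To check the split/strip parsing without a cluster, the same steps can be run on an ordinary Scala `Seq` instead of an RDD. This is a local sketch only; `LocalDigest` and `matrices` are hypothetical names, and the input lines are shortened samples of the data above.

```scala
// Local (non-Spark) check of the split/strip parsing steps, applied to a Seq
// instead of an RDD. LocalDigest and matrices are hypothetical names.
object LocalDigest {
  type Matrix = Seq[(String, Array[String])]

  def matrices(lines: Seq[String]): (Matrix, Matrix) = {
    val digested = lines
      .map(line => line.split(":"))
      .map(row => row(0) -> row(1).trim.split(" "))
      .map(row => row._1 -> row._2.map(_.stripPrefix("(").stripSuffix(")").trim.split(",")))
    val matrix1 = digested.map(row => row._1 -> row._2.map(a => a(0))) // (docId, words)
    val matrix2 = digested.map(row => row._1 -> row._2.map(a => a(1))) // (docId, values)
    (matrix1, matrix2)
  }

  def main(args: Array[String]): Unit = {
    val (m1, m2) = matrices(Seq(
      "doc1: (does,1) (just,-1) (what,0)",
      "doc2: (pro,1) (g4,-1) (13inch,0) (laptop,1)"
    ))
    // Print each row as "docId -> item,item,..."
    m1.foreach { case (doc, words) => println(s"$doc -> ${words.mkString(",")}") }
    m2.foreach { case (doc, values) => println(s"$doc -> ${values.mkString(",")}") }
  }
}
```

Note that `split(" ")` assumes exactly one space between pairs; repeated spaces would yield empty entries.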
This gives:
    matrix_1:
    List(
      (doc1 -> does,just,what,was,needed,to,charge,the,macbook),
      (doc2 -> pro,g4,13inch,laptop),
      (doc3 -> only,beef,was,it,no,longer,lights,up,the)
    )
    matrix_2:
    List(
      (doc1 -> 1,-1,0,1,1,0,1,0,1),
      (doc2 -> 1,-1,0,1),
      (doc3 -> 1,0,1,0,-1,0,-1,0,-1)
    )
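The values in matrix_2 are still strings such as "-1". If numeric values are needed, a small conversion step can follow. The plain-Scala sketch below assumes every value parses as an integer; `ValuesAsInts` and `toIntRows` are hypothetical names.

```scala
// Hypothetical follow-up: convert the string values of matrix_2 to Ints
// and index the rows by document id for lookup.
object ValuesAsInts {
  def toIntRows(matrix2: Seq[(String, Array[String])]): Map[String, Array[Int]] =
    matrix2.map { case (doc, values) => doc -> values.map(_.trim.toInt) }.toMap
}
```

On the RDD itself, the equivalent would presumably be `matrix_2.mapValues(_.map(_.toInt))`, and `collectAsMap()` can bring a small result back to the driver as a `Map`.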