Scala - How to format the print output in Spark
I have an RDD[(String, Array[String])] and 2 functions, which are meant to find movie names given a movie id.
def find_name1(n: String, m: Map[String, String]) = {
  print(n + ":")
  println(m.get(n).mkString)
}

def find_name2(n: Array[String], m: Map[String, String]) = {
  print("here recommend:")
  for (i <- 0 until n.length) {
    print(n(i) + ":" + m.get(n(i)).mkString + ",")
  }
  println()
}
Then I do the printing this way:
data.foreach { x =>
  find_name1(x._1, m.toMap)
  find_name2(x._2, m.toMap)
}
The result should have 4 parts, but the first one looks weird. It is:
1375:1694:apostle, (1997) star trek iii: search spock (1984) here recommend:here recommend:1373:star trek v: final frontier (1989),1372:star trek vi: undiscovered country (1991),994:big night (1996),329:star trek: generations (1994),1810:primary colors (1998),2641:superman ii (1980),1120:people vs. larry flynt, (1996),1371:star trek: motion picture (1979),281:nobody's fool (1994),1635:ice storm, (1997),
Why are the first 2 ids not printed in the format id:name? It looks like the printing is not done line by line over the RDD data. Can anyone tell me why this happens, and how I can format the output the way I want?
One way to produce such a report is to transform the RDD data until you have all the items required and then, in a final transformation, produce the actual report.
Also, side-effecting functions are discouraged because they produce non-deterministic results in a distributed environment. In the case of println, if you were using a cluster you wouldn't even see the output on the driver, since it is written to the executors' stdout. It only works in local mode, so it's not scalable.
I'd suggest the following changes (*):
def findMovieTitle(id: String): String = ???

val recommendationReportRDD = movieDataRDD.map { case (id, recommendations) =>
  // Formats a single movie as "id: title"
  val formatId: String => String = movieId => s"$movieId: ${findMovieTitle(movieId)}"
  val recomStr = recommendations.map(formatId).mkString(",")
  s"${formatId(id)}. recommend: $recomStr"
}
Now you can flexibly decide what to do with the report.
You can print it:
recommendationReportRDD.collect.foreach(println)
But you can also save it to a file:
recommendationReportRDD.saveAsTextFile("path/to/report.txt")
(*) The code is provided for illustration purposes. It has not been compiled or tested.
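To make the idea concrete, here is a minimal sketch of the formatting step written against plain Scala collections, so it runs without a Spark cluster. The `titles` map, the `reportLine` helper and the object name are assumptions made for illustration (the sample titles are taken from the output shown in the question); the title lookup that the suggestion leaves as `???` is filled in here with a simple `Map` lookup, which may not match how the real lookup is done.

```scala
object RecommendationReport {
  // Hypothetical lookup table standing in for the question's map `m`.
  val titles: Map[String, String] = Map(
    "281" -> "Nobody's Fool (1994)",
    "994" -> "Big Night (1996)",
    "329" -> "Star Trek: Generations (1994)"
  )

  // Assumed implementation: return the title, or a placeholder for unknown ids.
  def findMovieTitle(id: String): String = titles.getOrElse(id, "<unknown>")

  // Formats one movie as "id:title".
  def formatId(id: String): String = s"$id:${findMovieTitle(id)}"

  // Pure function: one record in, one complete report line out.
  // Because the whole line is built before anything is printed,
  // output from different records cannot interleave inside a line.
  def reportLine(id: String, recommendations: Array[String]): String =
    s"${formatId(id)} here recommend: ${recommendations.map(formatId).mkString(",")}"

  def main(args: Array[String]): Unit = {
    val data = Seq(("281", Array("994", "329")))
    data.map { case (id, recs) => reportLine(id, recs) }.foreach(println)
    // prints: 281:Nobody's Fool (1994) here recommend: 994:Big Night (1996),329:Star Trek: Generations (1994)
  }
}
```

With Spark, the same `reportLine` function could presumably be passed to `map` on the RDD, followed by `collect` or `saveAsTextFile` as shown above. Note that a map captured in the closure is shipped to the executors with each task; for a large lookup table a broadcast variable may be a better fit.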