Scala - how to format the print output in Spark


I have an RDD[(String, Array[String])] and 2 functions, which are meant to find movie names given a movie id.

    def find_name1(n: String, m: Map[String, String]) = {
      print(n + ":")
      println(m.get(n).mkString)
    }

    def find_name2(n: Array[String], m: Map[String, String]) = {
      print("here recommend:")
      for (i <- 0 until n.length) {
        print(n(i) + ":" + m.get(n(i)).mkString + ",")
      }
      println()
    }

Then I print them this way:

    data.foreach { x =>
      find_name1(x._1, m.toMap)
      find_name2(x._2, m.toMap)
    }

The result should have 4 parts, but the first part looks weird. It is:

1375:1694:apostle, (1997) star trek iii: search spock (1984) here recommend:here recommend:1373:star trek v: final frontier (1989),1372:star trek vi: undiscovered country (1991),994:big night (1996),329:star trek: generations (1994),1810:primary colors (1998),2641:superman ii (1980),1120:people vs. larry flynt, (1996),1371:star trek: motion picture (1979),281:nobody's fool (1994),1635:ice storm, (1997), 

Why are the first 2 ids not printed in the format id:name? It looks like the printing is not done line by line over the RDD data. Can anyone tell me why this happens, and how I can format the output the way I want?

A better way to produce such a report is to transform the RDD data until you have the items required and then, in a final transformation, produce the actual report.
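As a minimal sketch of that idea applied to the question's own functions, find_name1 and find_name2 can return Strings instead of printing, so each record becomes one complete line before anything is output. This assumes plain Scala collections stand in for the RDD so it runs without Spark; the names formatMovie, formatRecommendations and reportLine and the sample titles are hypothetical.

```scala
// Plain Scala collections stand in for the RDD so this sketch runs without Spark.
object ReportSketch {
  // Hypothetical sample lookup table (movie id -> title)
  val titles: Map[String, String] = Map(
    "1" -> "Movie A (1990)",
    "2" -> "Movie B (1991)",
    "3" -> "Movie C (1992)"
  )

  // Like find_name1, but returns "id:name" instead of printing it.
  // getOrElse(id, "") matches the original m.get(n).mkString for missing ids.
  def formatMovie(id: String, m: Map[String, String]): String =
    s"$id:${m.getOrElse(id, "")}"

  // Like find_name2, but returns the whole recommendation part as one String
  def formatRecommendations(ids: Array[String], m: Map[String, String]): String =
    ids.map(formatMovie(_, m)).mkString("here recommend:", ",", "")

  // One complete line per record, built before anything is printed
  def reportLine(record: (String, Array[String]), m: Map[String, String]): String =
    formatMovie(record._1, m) + " " + formatRecommendations(record._2, m)

  def main(args: Array[String]): Unit = {
    // Hypothetical records in the question's shape: (movieId, recommendedIds)
    val data = Seq("1" -> Array("2", "3"), "2" -> Array("1"))
    // With an RDD this would be data.map(...) and then collect() on the driver
    data.map(reportLine(_, titles)).foreach(println)
  }
}
```

Because every line is a finished value, nothing from one record can be split by output from another; printing or saving becomes a separate, final step.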

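Also, side-effecting functions are discouraged because they produce non-deterministic results in a distributed environment. foreach runs inside tasks, one per partition, and those tasks run in parallel, so the separate print calls from different records can interleave; that is why the first lines of your output are mixed together. In the case of println, if you are using a cluster you wouldn't even see the output on the driver, because it goes to the executors' stdout. It only works in local mode, therefore it's not scalable.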

I'd suggest the following changes (*):

    def findMovieTitle(id: String): String = ???

    val recommendationReportRDD = movieDataRDD.map { case (id, recommendations) =>
      // render a movie id as "id: title"
      val formatId: String => String = movieId => s"$movieId: ${findMovieTitle(movieId)}"
      val recomStr = recommendations.map(formatId).mkString(",")
      s"${formatId(id)}. recommend: $recomStr"
    }
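The `???` above is left for you to fill in. One possible body, sketched here under the assumption that titles come from an in-memory Map (the table, the "<unknown>" fallback and the object name are hypothetical), with a Seq standing in for the RDD so it runs without Spark:

```scala
object RecommendationReport {
  // Hypothetical lookup table; on a cluster a Map like this would usually be broadcast
  val movieTitles: Map[String, String] = Map(
    "10" -> "Title Ten",
    "20" -> "Title Twenty",
    "30" -> "Title Thirty"
  )

  // Possible body for findMovieTitle: look the id up, fall back to a marker
  def findMovieTitle(id: String): String = movieTitles.getOrElse(id, "<unknown>")

  // Same logic as the body of the map above, applied to one record
  def toReportLine(record: (String, Seq[String])): String = record match {
    case (id, recommendations) =>
      val formatId: String => String = movieId => s"$movieId: ${findMovieTitle(movieId)}"
      val recomStr = recommendations.map(formatId).mkString(",")
      s"${formatId(id)}. recommend: $recomStr"
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical records: (movieId, recommendedIds)
    val movieData = Seq("10" -> Seq("20", "30"), "20" -> Seq("99"))
    movieData.map(toReportLine).foreach(println)
  }
}
```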

Now you can flexibly decide what to do with the report.

You can print it out (collect brings all the lines to the driver, where they are printed one after another):

    recommendationReportRDD.collect.foreach(println)

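But you can also save it to a file (note that saveAsTextFile writes a directory of part files at the given path, not a single file):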

    recommendationReportRDD.saveAsTextFile("path/to/report.txt")

(*) Code provided for illustration purposes only. Not compiled or tested.

