Scala - standalone Spark: worker didn't show up
I have two questions I want to ask.

This is my code:
import org.apache.spark.{SparkConf, SparkContext}

object Hi {
  def main(args: Array[String]): Unit = {
    println("sucess")
    val conf = new SparkConf().setAppName("Hi").setMaster("local")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile("src/main/scala/source.txt")
    val rows = textFile.map { line =>
      val fields = line.split("::")
      (fields(0), fields(1).toInt)
    }
    val x = rows.map { case (range, ratedNum) => range }.collect.mkString("::")
    val y = rows.map { case (range, ratedNum) => ratedNum }.collect.mkString("::")
    println(x)
    println(y)
    println("sucess2")
  }
}
Here is the result:
15/04/26 16:49:57 INFO Utils: Started service 'SparkUI' on port 4040.
15/04/26 16:49:57 INFO SparkUI: Started SparkUI at http://192.168.1.105:4040
15/04/26 16:49:57 INFO Executor: Starting executor ID <driver> on host localhost
15/04/26 16:49:57 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@192.168.1.105:64952/user/HeartbeatReceiver
15/04/26 16:49:57 INFO NettyBlockTransferService: Server created on 64954
15/04/26 16:49:57 INFO BlockManagerMaster: Trying to register BlockManager
15/04/26 16:49:57 INFO BlockManagerMasterActor: Registering block manager localhost:64954 with 983.1 MB RAM, BlockManagerId(<driver>, localhost, 64954)
.....
15/04/26 16:49:59 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:839
15/04/26 16:49:59 INFO DAGScheduler: Submitting 1 missing tasks from Stage 1 (MapPartitionsRDD[4] at map at hi.scala:25)
15/04/26 16:49:59 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/04/26 16:49:59 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1331 bytes)
15/04/26 16:49:59 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
15/04/26 16:49:59 INFO HadoopRDD: Input split: file:/Users/winsome/IdeaProjects/untitled/src/main/scala/source.txt:0+23
15/04/26 16:49:59 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1787 bytes result sent to driver
15/04/26 16:49:59 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 13 ms on localhost (1/1)
15/04/26 16:49:59 INFO DAGScheduler: Stage 1 (collect at hi.scala:25) finished in 0.013 s
15/04/26 16:49:59 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/04/26 16:49:59 INFO DAGScheduler: Job 1 finished: collect at hi.scala:25, took 0.027784 s
1~1::2~2::3~3
10::20::30
sucess2
My first question: when I check http://localhost:8080/, there is no worker, and I can't open http://192.168.1.105:4040 either.
Is this because I'm using Spark standalone? How can I fix it?
(My environment: Mac, with the IntelliJ IDE.)
My second question is about this code:
val x = rows.map { case (range, ratedNum) => range }.collect.mkString("::")
val y = rows.map { case (range, ratedNum) => ratedNum }.collect.mkString("::")
println(x)
println(y)
I think there must be a simpler way to get x and y (something like rows[range] and rows[ratedNum]), but I'm not familiar with Scala. Can you give me some advice?
I'm not sure about the first question, but reading the log I see that the task lasted only 13 ms, which may be the reason why you haven't seen anything. Run a longer job and you may see the workers.
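For what it's worth: with setMaster("local") everything runs inside a single JVM, so the standalone master UI at http://localhost:8080 has no workers to show, and the application UI on port 4040 disappears as soon as the job finishes. Here is a minimal sketch of connecting to a standalone cluster instead, assuming a master and a worker have already been started with the sbin/ scripts; the master URL below is a hypothetical example:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical master URL; replace it with the URL shown at http://localhost:8080.
val conf = new SparkConf()
  .setAppName("Hi")
  .setMaster("spark://192.168.1.105:7077") // standalone master instead of "local"
val sc = new SparkContext(conf)

With that in place, the application should appear under "Running Applications" on the master UI while it runs.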
About the second question: yes, there is a simpler way to write it:
val x = rows.map { tuple => tuple._1 }.collect.mkString("::")
This works because the RDD is made of Tuple2 Scala objects, which have two fields that you can access as _1 and _2 respectively.
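As a sketch of both styles, assuming rows is the RDD[(String, Int)] built in your question (keys and values are Spark's pair-RDD shortcuts that do the same thing):

val x = rows.map(_._1).collect.mkString("::") // first field of each tuple
val y = rows.map(_._2).collect.mkString("::") // second field of each tuple

// Since rows is an RDD of pairs, these shortcuts are equivalent:
val x2 = rows.keys.collect.mkString("::")
val y2 = rows.values.collect.mkString("::")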