apache spark - MLlib missing values handling


I'm using the basic corr interface from MLlib's Statistics:

    import org.apache.spark.mllib.stat.Statistics
    import org.apache.spark.rdd.RDD

    val a: RDD[Double] = sc.makeRDD(Seq(1.0, 1.0, 0.0))
    val b: RDD[Double] = sc.makeRDD(Seq(1.0, -1.0, 0.0))
    val r = Statistics.corr(a, b)
    println(r)

Is it possible to have casewise or pairwise removal of NaN and Infinity values?

By default, MLlib's corr returns NaN when the input contains Infinity or NaN values.
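To illustrate why a single bad value is enough to turn the whole result into NaN, here is a minimal sketch in plain Scala, without Spark. The pearson helper is hypothetical, a textbook formula written for this example and not MLlib's actual implementation; it only shows how NaN and Infinity propagate through the sums.

```scala
object NaNPropagation {
  // Hypothetical helper: textbook Pearson correlation on plain sequences.
  // This is NOT MLlib's implementation, only an illustration.
  def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.nonEmpty && xs.length == ys.length, "inputs must be non-empty and of equal length")
    val n = xs.length
    val mx = xs.sum / n
    val my = ys.sum / n
    // Any NaN in the inputs makes these sums NaN; an Infinity makes the mean
    // infinite, and Infinity - Infinity is NaN as well.
    val cov = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
    val sx = math.sqrt(xs.map(x => (x - mx) * (x - mx)).sum)
    val sy = math.sqrt(ys.map(y => (y - my) * (y - my)).sum)
    cov / (sx * sy)
  }

  def main(args: Array[String]): Unit = {
    val ys = Seq(2.0, 4.0, 6.0)
    val clean = pearson(Seq(1.0, 2.0, 3.0), ys)
    val withNaN = pearson(Seq(1.0, Double.NaN, 3.0), ys)
    val withInf = pearson(Seq(1.0, Double.PositiveInfinity, 3.0), ys)
    println(s"clean=$clean withNaN.isNaN=${withNaN.isNaN} withInf.isNaN=${withInf.isNaN}")
  }
}
```

The clean input gives a value close to 1.0, while both contaminated inputs give NaN, so the bad values have to be removed before the correlation is computed.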

To my knowledge, there is no built-in function for this, so you need to filter the values out on your own. One approach is to use the functionality of java.lang.Double (http://docs.oracle.com/javase/7/docs/api/java/lang/Double.html):

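    import java.lang.Double.{isInfinite, isNaN}
    import org.apache.spark.mllib.stat.Statistics

    // Keep only ordinary finite values in each RDD.
    // Note: each RDD is filtered independently, and corr needs both inputs to
    // have the same number of partitions and elements per partition, so this
    // only works when the dropped values sit at the same positions in both.
    val filtered1 = data1.filter(x => !isNaN(x) && !isInfinite(x))
    val filtered2 = data2.filter(x => !isNaN(x) && !isInfinite(x))
    val r = Statistics.corr(filtered1, filtered2)
    println(r)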
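Filtering each RDD separately can leave the two series misaligned or of different lengths. A casewise variant pairs the values up first and drops a pair when either side is NaN or infinite. The following is only a sketch under some assumptions: it runs Spark in local mode, the sample data is made up, and the names CasewiseCorr, isFinite and finitePairs are hypothetical helpers, not part of MLlib. For just two series, casewise and pairwise removal amount to the same thing.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD

object CasewiseCorr {
  // Hypothetical helper: true only for ordinary finite numbers.
  def isFinite(x: Double): Boolean = !x.isNaN && !x.isInfinity

  // Hypothetical helper: keep only the positions where both values are finite.
  // zip requires the same number of partitions and elements per partition.
  def finitePairs(a: RDD[Double], b: RDD[Double]): RDD[(Double, Double)] =
    a.zip(b).filter { case (x, y) => isFinite(x) && isFinite(y) }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads, for demonstration only.
    val conf = new SparkConf().setAppName("casewise-corr").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data with one NaN and one Infinity.
      val a = sc.makeRDD(Seq(1.0, Double.NaN, 2.0, 5.0, 3.0), 2)
      val b = sc.makeRDD(Seq(2.0, -1.0, 4.0, Double.PositiveInfinity, 6.0), 2)

      val pairs = finitePairs(a, b).cache()
      // Both columns come from the same filtered RDD, so they stay aligned.
      val r = Statistics.corr(pairs.map(_._1), pairs.map(_._2))
      println(r)
    } finally {
      sc.stop()
    }
  }
}
```

With this sample data the surviving pairs are (1, 2), (2, 4) and (3, 6), so the printed value should be close to 1.0.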
