spark - How can I retrieve item pairs after calculating similarity using RowMatrix?


I have encountered the "all-pairs similarity" problem in my recommendation system. Judging from this Databricks blog post, it seems RowMatrix may come to help.

However, RowMatrix is a matrix type without meaningful row indices, so I don't know how to retrieve the similarity result for specific items i and j after invoking columnSimilarities(threshold).

Below are the details of what I am doing:

1) The data file comes in MovieLens format, like this:

user::item::rating 

2) I build a RowMatrix in which each sparse vector i represents the ratings of all users for item i:

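    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix
    import org.apache.spark.mllib.recommendation.Rating
    import org.apache.spark.rdd.RDD

    val dataPath = ...
    val ratings: RDD[Rating] = sc.textFile(dataPath).map(_.split("::") match {
      case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
    })
    // One row per item; userAmount is the number of users (user IDs start at 1)
    val rows = ratings.map(rating => (rating.product, (rating.user, rating.rating)))
      .groupByKey()
      .map(p => Vectors.sparse(userAmount, p._2.map(r => (r._1 - 1, r._2)).toSeq))
    val mat = new RowMatrix(rows)
    val similarities = mat.columnSimilarities(0.5)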
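The parsing step above matches `case Array(user, item, rate)`, so any line that does not split into exactly three fields (a blank line, or a file that also carries a timestamp, as the MovieLens 1M `ratings.dat` does) fails with a `MatchError`. Below is a minimal, Spark-free sketch of a more tolerant parser. `RatingLine` and `Parsed` are hypothetical names, and ignoring extra trailing fields is an assumption about the data.

```scala
import scala.util.Try

// Hypothetical helper: parses one "user::item::rating" line.
// Returns None for malformed lines instead of throwing a MatchError.
// Assumption: extra trailing fields (e.g. a timestamp) are ignored.
object RatingLine {
  final case class Parsed(user: Int, item: Int, rating: Double)

  def parse(line: String): Option[Parsed] = {
    val fields = line.trim.split("::")
    if (fields.length < 3) None
    else Try(Parsed(fields(0).toInt, fields(1).toInt, fields(2).toDouble)).toOption
  }
}
```

In the question's pipeline it could replace the `match`, for example `sc.textFile(dataPath).flatMap(line => RatingLine.parse(line).map(p => Rating(p.user, p.item, p.rating)))`, which silently skips lines that do not parse.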

Now similarities is a CoordinateMatrix. How can I get the similarity of specific items i and j? Although similarities.entries can be used to retrieve an RDD[MatrixEntry], I am not sure whether row i and column j correspond to item i and item j.
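To make the index question concrete, here is a small, Spark-free sketch of the exact cosine similarity between columns, which is what `columnSimilarities()` computes (the threshold variant estimates it by sampling). `ColumnCosine` is a hypothetical helper; like Spark's result, it only keeps pairs with i < j, and it additionally drops pairs whose similarity is exactly 0. With one row per item, as in the code above, the columns are users, so the keys are user indices rather than item indices.

```scala
// Hypothetical helper: exact cosine similarity between matrix columns.
object ColumnCosine {
  // Returns ((i, j) -> cosine) for every column pair with i < j whose
  // similarity is nonzero. Assumes a non-empty, rectangular matrix.
  def columnSimilarities(rows: Seq[Array[Double]]): Map[(Int, Int), Double] = {
    val nCols = rows.head.length
    // Euclidean (L2) norm of each column.
    val norms = Array.tabulate(nCols)(c => math.sqrt(rows.map(r => r(c) * r(c)).sum))
    val pairs = for {
      i <- 0 until nCols
      j <- (i + 1) until nCols
      if norms(i) > 0.0 && norms(j) > 0.0
    } yield ((i, j), rows.map(r => r(i) * r(j)).sum / (norms(i) * norms(j)))
    pairs.filter { case (_, sim) => sim != 0.0 }.toMap
  }
}
```

With the tiny item-per-row matrix `Seq(Array(5.0, 3.0), Array(5.0, 3.0), Array(0.0, 1.0))` (three items, two users), the result has the single key `(0, 1)`, which is the pair of users. Transposing it to one row per user, `Seq(Array(5.0, 5.0, 0.0), Array(3.0, 3.0, 1.0))`, yields the keys `(0, 1)`, `(0, 2)` and `(1, 2)`, one per item pair.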

I have encountered the same problem and solved it as follows.

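  1. Note that columnSimilarities() computes the similarity between column vectors. However, our "rows" hold one row vector per item, so the columns correspond to users, not items. You should take the transpose of "rows" (let's call it "tran_rows", with one row per user and one column per item) and then calculate tran_rows.columnSimilarities().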

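  2. Then things are easy. In the result of columnSimilarities(), the indices i and j correspond to item i and item j, as long as each item's column index in "tran_rows" is its item ID (or a fixed offset of it, such as ID - 1).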
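The approach above can be sketched with Spark MLlib roughly as follows. Instead of transposing an existing RowMatrix, this sketch builds the transposed layout directly: each rating becomes a `MatrixEntry(user, item, rating)` in a `CoordinateMatrix`, and `toRowMatrix()` gives one row per user with the item ID as the column index. `ItemSimilarity`, `itemSimilarities` and `similarityOf` are hypothetical names. The MLlib documentation describes the result of `columnSimilarities` as upper-triangular, so the lookup orders the pair before filtering.

```scala
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.rdd.RDD

object ItemSimilarity {
  // Builds the transposed layout: one row per user, one column per item.
  // The item ID is used directly as the column index, so entry (i, j) of
  // the result is the similarity between item i and item j.
  def itemSimilarities(ratings: RDD[Rating], threshold: Double): CoordinateMatrix = {
    val entries = ratings.map(r => MatrixEntry(r.user.toLong, r.product.toLong, r.rating))
    // toRowMatrix() drops the (user) row indices, which is fine here:
    // only the column (item) indices matter for columnSimilarities().
    val userByItem = new CoordinateMatrix(entries).toRowMatrix()
    userByItem.columnSimilarities(threshold)
  }

  // The result only stores pairs with i < j, so normalize the order first.
  // Returns None for i == j and for pairs that were not emitted
  // (e.g. items with no common users).
  def similarityOf(sims: CoordinateMatrix, i: Long, j: Long): Option[Double] = {
    val (lo, hi) = if (i < j) (i, j) else (j, i)
    sims.entries
      .filter(e => e.i == lo && e.j == hi)
      .map(_.value)
      .collect()
      .headOption
  }
}
```

A threshold above 0 makes Spark sample (DIMSUM), so pairs below the threshold may be missing and the values are estimates; `columnSimilarities()` with no argument is exact. Each `similarityOf` call runs a Spark job, so for many lookups it is likely cheaper to collect or join `sims.entries` once.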

