something wrong with random walk in node2vec_spark? #41

rainbow2301 · 2018-05-25T03:31:12Z

val edge2attr = graph.triplets.map { edgeTriplet =>
(s"${edgeTriplet.srcId}${edgeTriplet.dstId}", edgeTriplet.attr)
}.repartition(200).cache

(s"${prevNodeId}${currentNodeId}", (srcNodeId, pathBuffer))
}.join(edge2attr).map { case (edge, ((srcNodeId, pathBuffer), attr)) =>

In the code, the join key is generated by s"${edgeTriplet.srcId}${edgeTriplet.dstId}".
Do we need a separator between the two elements?

wl142857 · 2019-03-05T10:22:23Z

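Actually, yes. You should use s"${edgeTriplet.srcId}\t${edgeTriplet.dstId}" instead, and build the lookup key the same way, s"${prevNodeId}\t${currentNodeId}", so that both sides of the join still match.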
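A minimal sketch of that fix in plain Scala, with no Spark dependency. The `edgeKey` helper and the `Map` standing in for `edge2attr` are hypothetical names introduced here for illustration; they are not part of node2vec_spark. The point is only that both sides of the join build the key through one function, so the separator cannot be forgotten on one side.

```scala
object EdgeKeyFix {
  // Hypothetical helper: join key with a tab between the ids.
  // A tab cannot occur inside a numeric id, so the key is unambiguous.
  def edgeKey(srcId: Long, dstId: Long): String = s"$srcId\t$dstId"

  def main(args: Array[String]): Unit = {
    // Stand-in for edge2attr: a plain Map instead of an RDD of (key, attr).
    val edge2attr = Map(
      edgeKey(1L, 1111L) -> "attr of 1 -> 1111",
      edgeKey(11L, 111L) -> "attr of 11 -> 111"
    )

    // The walk side builds its key with the same helper.
    val (prevNodeId, currentNodeId) = (11L, 111L)
    println(edge2attr(edgeKey(prevNodeId, currentNodeId)))
  }
}
```

Another option, assuming the rest of the pipeline does not need a string key, is to join on the tuple `(srcId, dstId)` directly, which avoids the separator question altogether.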

liliangjie91 · 2019-12-16T07:21:43Z

Yes.
If you don't add a separator, the edge between node #1 and node #1111 gets the same key as the edge between node #11 and node #111: both become '11111'. With a separator such as \t, the keys are 1\t1111 and 11\t111, which differ.

And I think that's why you got bad results when your data is very big: the bigger the data, the higher the chance of matching the wrong edges.
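To see how the collisions depend on the range of node ids, here is a small sketch in plain Scala. It assumes a toy graph that contains every ordered pair of ids from 1 to `maxId`; `rawKey`, `tabKey`, and `collidingEdges` are hypothetical names used only for this illustration.

```scala
object KeyCollisionDemo {
  // Key as built in the snippet from the issue: ids concatenated, no separator.
  def rawKey(src: Long, dst: Long): String = s"$src$dst"

  // Key with a tab separator, as suggested in this thread.
  def tabKey(src: Long, dst: Long): String = s"$src\t$dst"

  // Counts the edges whose key is shared with at least one other edge,
  // over all ordered pairs (s, d) with 1 <= s, d <= maxId.
  def collidingEdges(maxId: Long, key: (Long, Long) => String): Int = {
    val edges = for (s <- 1L to maxId; d <- 1L to maxId) yield (s, d)
    edges
      .groupBy { case (s, d) => key(s, d) }
      .values
      .filter(_.size > 1)
      .map(_.size)
      .sum
  }

  def main(args: Array[String]): Unit = {
    for (maxId <- Seq(9L, 99L, 300L)) {
      val raw = collidingEdges(maxId, rawKey)
      val tab = collidingEdges(maxId, tabKey)
      println(s"maxId=$maxId collisions without separator=$raw, with tab=$tab")
    }
  }
}
```

With ids from 1 to 9 every unseparated key has exactly two digits, so nothing collides. Collisions start once ids with different digit counts are mixed, for example (1, 11) and (11, 1) both giving "111", which fits the observation that the problem shows up on larger graphs.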
