Is there any difference between node2vec-c and the reference implementation of Node2Vec? #4

Open
dkorkinof opened this issue Jan 14, 2022 · 3 comments

Comments

@dkorkinof

I was wondering whether there is any algorithmic difference between your C++ implementation of Node2Vec and the original reference implementation from Stanford. For instance, I noticed something about subsampling frequent nodes in the C++ code.
I'm asking because I tried a few other Node2Vec implementations, including the one in pytorch_geometric, and had trouble replicating the good performance of your C++ code, so I was wondering whether those implementations are missing something.

@xgfs
Owner

xgfs commented Jan 15, 2022

Can you explicitly link the original implementation you are referring to? I reimplemented the official Python code, which uses the Gensim library underneath. The subsampling actually comes from the word2vec implementation in Gensim (it is standard word2vec stuff, for that matter). I provided some analysis of subsampling in Section 3.6 of the VERSE paper. I believe subsampling is rather key, as it effectively shifts the distribution of positive pairs.
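For context on the subsampling mentioned above: word2vec (and Gensim's port of it) randomly drops occurrences of frequent tokens, keeping each occurrence with a probability that shrinks as the token's frequency grows. The sketch below applies the keep-probability rule used by Gensim, controlled by a `sample` threshold, to nodes appearing in a list of walks. It is an illustration under stated assumptions, not code from node2vec-c: the function names `keep_probabilities` and `subsample_walks` are hypothetical, and `sample=1e-3` is simply Gensim's default.

```python
import math
import random
from collections import Counter


def keep_probabilities(walks, sample=1e-3):
    """Per-node keep probability using the word2vec/Gensim subsampling rule.

    A node seen c times out of `total` node occurrences is kept with probability
    min(1, (sqrt(c / threshold) + 1) * threshold / c), where threshold = sample * total.
    """
    counts = Counter(node for walk in walks for node in walk)
    total = sum(counts.values())
    threshold = sample * total
    probs = {}
    for node, c in counts.items():
        p = (math.sqrt(c / threshold) + 1.0) * threshold / c
        probs[node] = min(1.0, p)  # rare nodes are always kept
    return probs


def subsample_walks(walks, sample=1e-3, rng=None):
    """Drop each node occurrence independently according to its keep probability."""
    if rng is None:
        rng = random.Random(0)
    probs = keep_probabilities(walks, sample)
    return [[n for n in walk if rng.random() < probs[n]] for walk in walks]
```

High-degree nodes appear in many walks, so they are thinned out the most. This changes which (center, context) pairs the skip-gram model sees as positives, which is consistent with the point above that subsampling shifts the positive pair distribution.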

@dkorkinof
Author

OK, that helps a lot! I hadn't looked into the Gensim implementation at all.
The two implementations I tried were this one, which is based on Gensim and so should be fine, and the pytorch_geometric one here, which I'm pretty sure doesn't do any subsampling (unless that happens while sampling the walks).

@xgfs
Owner

xgfs commented Jan 22, 2022

Yeah, I tried to keep it close to the Python implementation (to be honest, there is not much in the Python code per se; I was mostly reimplementing Gensim). I believe this is still one of the fastest implementations available, since I generate the walks on the fly and store the precomputed walk probabilities efficiently.
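To illustrate what "precomputed walk probabilities" can mean in node2vec, here is a sketch in the style of the reference Python node2vec, which precomputes one alias table per directed edge (t, v) over v's neighbours, biased by the return parameter p and the in-out parameter q, and then samples each walk step in O(1). It assumes an unweighted, undirected graph given as an adjacency dict; the function names are hypothetical, and node2vec-c may store its precomputed probabilities in a different, more compact layout.

```python
import random


def build_alias(weights):
    """Vose's alias method: O(n) setup, O(1) sampling from a discrete distribution."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]
    accept = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        accept[s] = scaled[s]
        alias[s] = l
        scaled[l] = scaled[l] + scaled[s] - 1.0  # give l's excess mass to bucket s
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:  # leftovers are full buckets (up to rounding error)
        accept[i] = 1.0
    return accept, alias


def alias_draw(accept, alias, rng):
    i = rng.randrange(len(accept))
    return i if rng.random() < accept[i] else alias[i]


def precompute_edge_tables(adj, p, q):
    """Alias table for every directed edge (t, v) over the neighbours x of v.

    Unnormalised node2vec bias: 1/p if x == t, 1 if x is also a neighbour of t, else 1/q.
    """
    neighbour_sets = {v: set(ns) for v, ns in adj.items()}
    tables = {}
    for t, ns in adj.items():
        for v in ns:
            weights = []
            for x in adj[v]:
                if x == t:
                    weights.append(1.0 / p)
                elif x in neighbour_sets[t]:
                    weights.append(1.0)
                else:
                    weights.append(1.0 / q)
            tables[(t, v)] = build_alias(weights)
    return tables


def node2vec_walk(adj, tables, start, length, rng):
    """Generate one biased second-order walk on the fly from the precomputed tables."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = adj[cur]
        if not nbrs:
            break  # dead end
        if len(walk) == 1:
            nxt = nbrs[rng.randrange(len(nbrs))]  # first step: uniform on an unweighted graph
        else:
            accept, alias = tables[(walk[-2], cur)]
            nxt = nbrs[alias_draw(accept, alias, rng)]
        walk.append(nxt)
    return walk
```

One design consideration: the per-edge tables hold deg(v) entries for every directed edge (t, v), so their total size is the sum of squared degrees, which grows quickly on graphs with hub nodes. That is why how these probabilities are stored matters for speed and memory.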
