
Data set is too big (cannot be held in one machine's memory), and I should break it into small daily sets #5

Open
jackyhawk opened this issue May 12, 2022 · 2 comments

Comments

@jackyhawk

Thanks for the excellent code.

I ran into one question: my data set is too big (it cannot be held in one machine's memory), so I should break it into small daily sets.
That means I would first generate each day's random-walk result (sequences) and then train them as word2vec with other code (such as Gensim).

All I want is the random walk result.

As for the walk result, should I just return before the part shown below,
and then save dw_rw to disk for later training?
[screenshot of the relevant section of the training loop]
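For reference, a minimal sketch of the save-to-disk step being asked about, assuming `dw_rw` is a list of walks (each a list of node ids); the helper names and filename here are hypothetical, not part of this repo:

```python
def save_walks(walks, path):
    """Write walks to a text file, one walk per line, node ids
    space-separated -- the plain-text corpus format that word2vec
    tooling such as gensim's LineSentence can consume later."""
    with open(path, "w") as f:
        for walk in walks:
            f.write(" ".join(str(node) for node in walk) + "\n")

def load_walks(path):
    """Read the walks back as lists of node-id strings."""
    with open(path) as f:
        return [line.split() for line in f]

# hypothetical example: dw_rw as produced by the walk-generation step
dw_rw = [[0, 1, 2, 1], [2, 0, 1, 0]]
save_walks(dw_rw, "walks_day1.txt")
# later, the saved file can be fed to gensim, e.g. (assuming gensim >= 4.0):
# from gensim.models import Word2Vec
# model = Word2Vec(corpus_file="walks_day1.txt", vector_size=128, sg=1, min_count=0)
```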

@xgfs
Owner

xgfs commented May 12, 2022

You will need to handle multiprocessing a bit better than I do in the training loop. One option would be to just run the random-walk generation and write to the file in a single thread. As for the place, it is correct.
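The single-thread option suggested here can be sketched as follows; the adjacency-dict representation and function name are illustrative assumptions, not this repo's actual API. Because each walk is written straight to disk, walks never accumulate in memory and no multiprocessing coordination is needed:

```python
import random

def write_walks_single_thread(adj, num_walks, walk_length, out_path, seed=0):
    """Generate uniform random walks over an adjacency dict and stream
    each finished walk directly to a text file (one walk per line,
    node ids space-separated)."""
    rng = random.Random(seed)
    with open(out_path, "w") as f:
        for _ in range(num_walks):
            for start in adj:
                walk = [start]
                while len(walk) < walk_length:
                    neighbors = adj[walk[-1]]
                    if not neighbors:  # dead end: cut this walk short
                        break
                    walk.append(rng.choice(neighbors))
                f.write(" ".join(map(str, walk)) + "\n")

# toy triangle graph: every node is adjacent to the other two
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
write_walks_single_thread(adj, num_walks=2, walk_length=5, out_path="walks.txt")
```

The resulting file is already in the format word2vec tooling expects, so training can happen in a separate process later.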

@jackyhawk
Author

Thanks very much.
Is there any other repo available that can generate random-walk sequences for big data sets?
I found that when I use a data set with more than 10 million edges, the memory required exceeds my machine's capacity (200 GB).
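One hedged sketch of the "break it into daily sets" idea from the issue title: stream a timestamped edge list from disk and shard it into per-day files, so only one line is ever in memory at a time. The `src dst unix_ts` column layout and file names are assumptions for illustration:

```python
import datetime

def shard_edges_by_day(edge_file, out_prefix):
    """Stream 'src dst unix_ts' lines and route each edge to a per-day
    shard file; scales to edge lists far larger than RAM because the
    full graph is never loaded."""
    handles = {}  # day string -> open file handle
    try:
        with open(edge_file) as f:
            for line in f:
                src, dst, ts = line.split()
                day = datetime.datetime.fromtimestamp(
                    int(ts), datetime.timezone.utc
                ).strftime("%Y-%m-%d")
                if day not in handles:
                    handles[day] = open(f"{out_prefix}_{day}.txt", "w")
                handles[day].write(f"{src} {dst}\n")
    finally:
        for h in handles.values():
            h.close()

# toy edge list: two edges on 1970-01-01, one on 1970-01-02 (UTC)
with open("edges.txt", "w") as f:
    f.write("0 1 1000\n1 2 2000\n2 3 90000\n")
shard_edges_by_day("edges.txt", "edges")
```

If the data spans many days, an LRU cap on the open handles (or pre-sorting by timestamp) avoids hitting file-descriptor limits; each daily shard can then go through the walk-generation step independently.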
