deduplicate function gets stuck for a long time when applied to a large dataset #839
kailashsp
started this conversation in Show and tell
Replies: 1 comment
-
Hey @kailashsp, thank you for opening this discussion! In the long term, we want to address this using blocking methods or probabilistic data structures. Before then, could you send a snippet of the code you used and a sample of the data? That would help us debug this.
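As a rough illustration of what "blocking" means (not the library's actual implementation, which this thread does not show), the idea is to group items by a cheap key and only compare items that share that key, instead of comparing every item with every other item. The sketch below assumes a hypothetical key (the first three characters after lowercasing and stripping whitespace) and uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure. The names `blocking_key` and `candidate_duplicates` and the 0.8 threshold are made up for this example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalise(s: str) -> str:
    """Cheap normalisation applied before keying and comparing."""
    return s.strip().lower()


def blocking_key(s: str, prefix_len: int = 3) -> str:
    # Hypothetical blocking key: the first few normalised characters.
    # Items with different keys are never compared, which saves time,
    # but a typo in the first characters means a true duplicate is missed.
    return normalise(s)[:prefix_len]


def candidate_duplicates(items, threshold=0.8, prefix_len=3):
    """Return (a, b, score) for same-block pairs with similarity >= threshold."""
    blocks = defaultdict(list)
    for item in items:
        blocks[blocking_key(item, prefix_len)].append(item)

    matches = []
    for block in blocks.values():
        # Only pairs inside one block are scored, rather than all n*(n-1)/2 pairs.
        for a, b in combinations(block, 2):
            score = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
            if score >= threshold:
                matches.append((a, b, score))
    return matches


if __name__ == "__main__":
    names = ["Paris", "paris ", "Pariss", "London", "Londn", "Berlin"]
    for a, b, score in candidate_duplicates(names):
        print(f"{a!r} ~ {b!r}: {score:.2f}")
```

The trade-off is recall: a prefix key never compares "Pariss" with "Bariss". Probabilistic approaches such as MinHash with locality-sensitive hashing pick the blocks in a way that tolerates such differences, at the cost of more machinery.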
-
I have been trying out the deduplicate feature. It works fine when I apply it to a subset of the dataset with 100 items, but when I apply it to the whole dataset of 32,000 items, it gets stuck. I have tried changing n_jobs, still with no success.
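One plausible explanation, assuming the function compares every pair of items (as deduplication based on pairwise distances and clustering usually does; the thread does not confirm how it is implemented), is that the work grows quadratically with the number of items. The sketch below simply counts the pairs and estimates the memory a condensed float64 distance matrix would need. The helper names are hypothetical.

```python
def n_pairs(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def condensed_matrix_gib(n: int, bytes_per_value: int = 8) -> float:
    """GiB needed to store one float64 distance per pair (upper triangle only)."""
    return n_pairs(n) * bytes_per_value / 2**30


if __name__ == "__main__":
    # Assumption: every pair is compared once; real implementations may add
    # further per-pair or per-cluster work on top of this.
    for size in (100, 32_000):
        print(f"{size:>6} items: {n_pairs(size):>11,} pairs, "
              f"{condensed_matrix_gib(size):.2f} GiB of float64 distances")
```

Going from 100 to 32,000 items raises the pair count from 4,950 to roughly 512 million, and the distances alone would take about 3.8 GiB. If this assumption holds, it would also explain why changing n_jobs does not help much: spreading the same comparisons over more workers reduces wall-clock time by at most the number of workers and does nothing for memory.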