feat(mongo): when creating a df from a cursor, allow to do it by chunks (saves memory) [TCTC-9496] #1813
            if not chunk:
                break
            chunks.append(pd.DataFrame.from_records(chunk))
        return pd.concat(chunks) if chunks else pd.DataFrame()
Does it make any difference to create n small DataFrames and concatenate them afterwards, rather than building a single DataFrame that we grow by adding each chunk as we go?
(e.g. df = pd.DataFrame() at the start, then df = pd.concat([df, pd.DataFrame.from_records(chunk)]) on each iteration)
I think it amounts to the same thing, yes. Why?
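For reference, the two strategies discussed in this thread can be compared with a small sketch. The function names, the `ignore_index=True` argument, and the sample records are hypothetical and not taken from the PR. Both versions return the same rows. One known difference: `pd.concat` returns a new object on every call, so the incremental version re-copies the rows accumulated so far on each iteration, while the list-based version copies each chunk once at the end.

```python
import itertools

import pandas as pd


def concat_once(records, chunk_size):
    """Collect one small DataFrame per chunk, then concatenate a single time."""
    it = iter(records)
    chunks = []
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            break
        chunks.append(pd.DataFrame.from_records(chunk))
    return pd.concat(chunks, ignore_index=True) if chunks else pd.DataFrame()


def concat_incrementally(records, chunk_size):
    """Grow one DataFrame by concatenating each new chunk onto it."""
    it = iter(records)
    df = None  # start from None rather than an empty frame (illustrative choice)
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            break
        new = pd.DataFrame.from_records(chunk)
        # Each call builds a brand-new DataFrame from the old one plus the chunk.
        df = new if df is None else pd.concat([df, new], ignore_index=True)
    return df if df is not None else pd.DataFrame()


# Hypothetical sample data standing in for documents read from a cursor.
records = [{"a": i, "b": str(i)} for i in range(10)]
same = concat_once(records, 3).equals(concat_incrementally(records, 3))
```

Here `same` is expected to be true, which matches the reply above as far as the result is concerned; the difference is only in how much intermediate copying happens.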
        if chunk_size:
            chunks = []
            while True:
                chunk = list(itertools.islice(data, chunk_size))
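Putting the two diff excerpts from this review together, the chunked path could look roughly like the sketch below. The function name `df_from_cursor`, its signature, the `iter(data)` guard, and the non-chunked fallback are assumptions made for illustration; only the loop body and the final `pd.concat` line come from the diff. A plain generator stands in for the Mongo cursor.

```python
import itertools

import pandas as pd


def df_from_cursor(data, chunk_size=None):
    """Build a DataFrame from an iterable of records, optionally by chunks.

    Hypothetical wrapper: name, signature and fallback are not from the PR.
    """
    # Assumption: make sure we hold an iterator, so that each islice call
    # consumes the next chunk_size records instead of restarting from the top.
    data = iter(data)
    if chunk_size:
        chunks = []
        while True:
            # Pull at most chunk_size records from the iterator.
            chunk = list(itertools.islice(data, chunk_size))
            if not chunk:
                break
            chunks.append(pd.DataFrame.from_records(chunk))
        return pd.concat(chunks) if chunks else pd.DataFrame()
    # Assumed fallback: materialize everything at once.
    return pd.DataFrame.from_records(list(data))


# A generator used as a stand-in for a cursor.
df = df_from_cursor(({"x": i} for i in range(5)), chunk_size=2)
```

Note that `pd.concat(chunks)` without `ignore_index=True` keeps each chunk's own index, so in this sketch the resulting index repeats per chunk; whether that matters depends on how the caller uses the DataFrame.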
👍
Two nits, but LGTM otherwise 👍
Co-authored-by: Luka Peschke <[email protected]>
No description provided.