feat(mongo): when creating a df from a cursor, allow to do it by chunks (saves memory) [TCTC-9496] #1813

Merged
merged 3 commits into from
Nov 13, 2024

Conversation

Member

@fspot fspot commented Nov 8, 2024

No description provided.

@fspot fspot added the enhancement New feature or request label Nov 8, 2024
@fspot fspot self-assigned this Nov 8, 2024
@fspot fspot added the wip label Nov 8, 2024
        if not chunk:
            break
        chunks.append(pd.DataFrame.from_records(chunk))
    return pd.concat(chunks) if chunks else pd.DataFrame()
Member

Is there no difference between creating n small DataFrames and concatenating them afterwards, versus building a single DataFrame that we mutate by adding the chunks each time?
(e.g. df = pd.DataFrame() at the start, then df = df.concat(pd.DataFrame.from_records(chunk)) on each iteration)

Member Author

I think it amounts to the same thing, yes. Why?
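
For illustration, here is a minimal sketch comparing the two approaches discussed in this thread. The helper names (iter_chunks, concat_once, concat_each_turn) and the sample records are hypothetical and not part of the PR. Since DataFrame has no concat method, the per-iteration variant uses pd.concat, and it starts from None rather than from an empty DataFrame to keep the comparison simple.

```python
import itertools

import pandas as pd


def iter_chunks(records, chunk_size):
    """Yield lists of at most chunk_size records from any iterable (hypothetical helper)."""
    it = iter(records)
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def concat_once(records, chunk_size):
    # Approach used in the PR: collect small frames, concatenate once at the end.
    chunks = [pd.DataFrame.from_records(c) for c in iter_chunks(records, chunk_size)]
    return pd.concat(chunks) if chunks else pd.DataFrame()


def concat_each_turn(records, chunk_size):
    # Approach raised in review: grow a single frame on every iteration.
    df = None
    for c in iter_chunks(records, chunk_size):
        chunk_df = pd.DataFrame.from_records(c)
        df = chunk_df if df is None else pd.concat([df, chunk_df])
    return df if df is not None else pd.DataFrame()


# Made-up sample data, only to show that both approaches hold the same rows.
records = [{"a": i, "b": str(i)} for i in range(10)]
same_rows = (
    concat_once(records, 3).to_dict("records")
    == concat_each_turn(records, 3).to_dict("records")
)
```

Both variants yield the same rows. The per-iteration variant generally copies the already accumulated rows again on every pd.concat call, so a single final concatenation is usually the cheaper choice when there are many chunks.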

if chunk_size:
    chunks = []
    while True:
        chunk = list(itertools.islice(data, chunk_size))
Member

👍
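
Putting the two diff fragments shown in this conversation together, the chunked construction could look roughly like the sketch below. The function name df_from_cursor, its signature, the iter() call, and the non-chunked fallback are assumptions made for illustration; the actual code in toucan_connectors/mongo/mongo_connector.py may differ.

```python
import itertools

import pandas as pd


def df_from_cursor(data, chunk_size=None):
    """Build a DataFrame from an iterable of dict records (hypothetical helper).

    With chunk_size set, only chunk_size raw records are held as Python
    dicts at any one time before being converted to a DataFrame.
    """
    # Assumption: make sure we consume a true iterator, so that each islice
    # call continues where the previous one stopped instead of restarting.
    data = iter(data)
    if chunk_size:
        chunks = []
        while True:
            chunk = list(itertools.islice(data, chunk_size))
            if not chunk:
                break
            chunks.append(pd.DataFrame.from_records(chunk))
        return pd.concat(chunks) if chunks else pd.DataFrame()
    # Assumed fallback: materialize every record at once.
    return pd.DataFrame.from_records(list(data))


# A generator standing in for a MongoDB cursor (made-up documents).
fake_cursor = ({"_id": i, "value": i * 2} for i in range(7))
df = df_from_cursor(fake_cursor, chunk_size=3)
```

Note that pd.concat keeps each chunk's own index by default, so the index values repeat per chunk in this sketch; passing ignore_index=True to pd.concat would produce a fresh 0..n-1 index instead.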

Contributor

@lukapeschke lukapeschke left a comment

2 nits, but LGTM otherwise 👍

toucan_connectors/mongo/mongo_connector.py (outdated, resolved)
toucan_connectors/mongo/mongo_connector.py (resolved)
@fspot fspot merged commit 07125af into master Nov 13, 2024
4 checks passed
@fspot fspot deleted the mongo-get-df-by-chunks branch November 13, 2024 10:07