feat(mongo): when creating a df from a cursor, allow to do it by chunks (saves memory) [TCTC-9496] #1813

fspot · 2024-11-08T15:29:41Z

No description provided.

davinov · 2024-11-12T15:35:47Z

toucan_connectors/mongo/mongo_connector.py

+                if not chunk:
+                    break
+                chunks.append(pd.DataFrame.from_records(chunk))
+            return pd.concat(chunks) if chunks else pd.DataFrame()


Doesn't it make a difference to create n small DataFrames and concatenate them afterwards, rather than building a single DataFrame that we mutate by adding each chunk as we go?
(e.g. df = pd.DataFrame() at the start, then df = df.concat(pd.DataFrame.from_records(chunk)) on each iteration)

I think it amounts to the same thing, yes. Why?
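
For reference, the two approaches discussed in this thread can be compared with a minimal sketch. The function names and sample records below are hypothetical and do not come from the PR. Note that pandas has no `DataFrame.concat` method, so the incremental variant is written with `pd.concat([df, chunk_df])`. Both variants should produce the same frame; the incremental one copies the accumulated rows again on every iteration, so collecting the chunks and concatenating once is generally the cheaper option.

```python
import itertools

import pandas as pd


def concat_once(records, chunk_size):
    """Build one small DataFrame per chunk, then concatenate a single time."""
    it = iter(records)
    chunks = []
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            break
        chunks.append(pd.DataFrame.from_records(chunk))
    return pd.concat(chunks) if chunks else pd.DataFrame()


def concat_incrementally(records, chunk_size):
    """Grow a single DataFrame by concatenating each chunk as it arrives."""
    it = iter(records)
    df = None
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            break
        chunk_df = pd.DataFrame.from_records(chunk)
        # Each pd.concat call allocates a new frame and copies all rows so far.
        df = chunk_df if df is None else pd.concat([df, chunk_df])
    return df if df is not None else pd.DataFrame()


# Hypothetical sample data standing in for documents read from a cursor.
records = [{"a": i, "b": str(i)} for i in range(5)]
same = concat_once(records, 2).equals(concat_incrementally(records, 2))
```

With these sample records, `same` is `True`: the results are identical, and only the amount of intermediate copying differs.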

davinov · 2024-11-12T15:36:42Z

toucan_connectors/mongo/mongo_connector.py

+        if chunk_size:
+            chunks = []
+            while True:
+                chunk = list(itertools.islice(data, chunk_size))
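
As a sketch of how this batching idiom behaves, the snippet below reads a stand-in iterator in chunks. `iter_chunks` and `fake_cursor` are hypothetical names, not part of the PR. It assumes `data` is a one-shot iterator (as a database cursor typically is): repeated `itertools.islice` calls only advance when they share the same iterator, and on a plain list they would return the first `chunk_size` items forever.

```python
import itertools


def iter_chunks(data, chunk_size):
    """Yield lists of at most chunk_size items from an iterable."""
    it = iter(data)  # all islice calls must share this one iterator
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            break
        yield chunk


# A generator standing in for a cursor that yields documents lazily.
fake_cursor = ({"_id": i} for i in range(5))
sizes = [len(chunk) for chunk in iter_chunks(fake_cursor, 2)]
```

Here `sizes` is `[2, 2, 1]`: the last chunk holds whatever is left, and the loop stops on the first empty chunk.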


lukapeschke

2 nits, but LGTM otherwise 👍

toucan_connectors/mongo/mongo_connector.py

Co-authored-by: Luka Peschke <[email protected]>

feat(mongo): when creating a df from a cursor, allow to do it by chunk (saves memory)

ced3e03

fspot added the enhancement New feature or request label Nov 8, 2024

fspot self-assigned this Nov 8, 2024

fspot added the wip label Nov 8, 2024

chore: fix test

da450a8

davinov reviewed Nov 12, 2024

lukapeschke reviewed Nov 12, 2024

Update toucan_connectors/mongo/mongo_connector.py

d48d7ac

Co-authored-by: Luka Peschke <[email protected]>

lukapeschke approved these changes Nov 12, 2024

fspot merged commit 07125af into master Nov 13, 2024
4 checks passed

fspot deleted the mongo-get-df-by-chunks branch November 13, 2024 10:07

fspot mentioned this pull request Nov 19, 2024

fix(mongo): igore index when concatenating chunks #1825

Merged
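
The diff of that follow-up is not shown here, so the following is only a sketch of the issue its title points to, using hypothetical data. Each chunk built with `pd.DataFrame.from_records` gets its own index starting at 0, so a plain `pd.concat` repeats index labels, while `ignore_index=True` renumbers the rows.

```python
import pandas as pd

# Two hypothetical chunks, each indexed from 0 by default.
chunks = [
    pd.DataFrame.from_records([{"a": 0}, {"a": 1}]),
    pd.DataFrame.from_records([{"a": 2}]),
]

with_repeats = pd.concat(chunks)                # index labels: 0, 1, 0
renumbered = pd.concat(chunks, ignore_index=True)  # index labels: 0, 1, 2
```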
