executemany has an option for 'bulk'-insert on Impala #96
I'm not sure I understand what this does. Could you explain it in a bit more detail?
The idea here is to INSERT data in big chunks instead of doing it row-by-row.
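To make the chunking idea concrete, here is a minimal sketch. It is not the code from this PR. The helper names `chunked` and `build_multirow_insert`, the table name `t`, and the `%s` placeholder style are all assumptions made for illustration. It groups rows into batches and renders one multi-row `INSERT ... VALUES (...), (...)` statement per batch, with the parameters flattened to match.

```python
def chunked(rows, size):
    """Yield consecutive slices of `rows`, each with at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_multirow_insert(table, rows, placeholder="%s"):
    """Return (sql, params) for a single INSERT covering all of `rows`.

    Hypothetical helper: `table` is interpolated as-is, so it must be trusted.
    """
    rows = [tuple(row) for row in rows]
    if not rows:
        raise ValueError("need at least one row")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of columns")

    # One "(%s, %s, ...)" group per row, joined into a single VALUES clause.
    group = "(" + ", ".join([placeholder] * width) + ")"
    sql = "INSERT INTO {} VALUES {}".format(table, ", ".join([group] * len(rows)))

    # Flatten the parameters in the same order as the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


if __name__ == "__main__":
    data = [(1, "a"), (2, "b"), (3, "c")]
    for batch in chunked(data, 2):
        print(build_multirow_insert("t", batch))
```

With a batch size of 2, three rows turn into two statements instead of three, which is the saving the comment above describes.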
Ah, I was unfamiliar with That said, I'm generally uneasy about having impyla rewrite people's queries, and anyway, using an
In my project, I'm using impyla as one of several database drivers that can be accessed via DB API 2.0. Users should be able to upload/insert small datasets/tables (probably 100 rows max) into the database. Right now this produces hundreds of files in HDFS, which this PR makes it possible to avoid. According to PEP 249:
The Impala docs also give the following recommendations on
I didn't have a chance to look at ibis until now. Interestingly, it uses impyla and hdfs (which my project is based on) and a bunch of other packages underneath. One of the goals of my project is to keep it lean and preferably pure (and that's what #91 is for).
Ok, as long as the default is the same, shouldn't be a problem. I'll make some additional comments for changes as well.
@@ -25,6 +25,8 @@
IntegrityError, DataError, NotSupportedError)


RE_INSERT_VALUES = re.compile(r"(.*\binsert\b.*\bvalues\b\s*)(\(.*\))\s*;?\s*", flags=re.IGNORECASE)
Nit: insert an additional blank line per PEP 8.
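For readers following the diff, here is a small sketch of how a pattern like `RE_INSERT_VALUES` can be used to turn a single-row statement into a multi-row one. The regex is copied from the hunk above. The function `expand_insert` is a hypothetical name and is not necessarily how the PR applies the pattern.

```python
import re

# Pattern copied from the diff: group 1 is everything up to and including
# VALUES, group 2 is the parenthesised value list.
RE_INSERT_VALUES = re.compile(
    r"(.*\binsert\b.*\bvalues\b\s*)(\(.*\))\s*;?\s*",
    flags=re.IGNORECASE,
)


def expand_insert(operation, n_rows):
    """Repeat the VALUES group `n_rows` times; return None if not an INSERT.

    Hypothetical helper for illustration only.
    """
    match = RE_INSERT_VALUES.match(operation)
    if match is None:
        return None
    prefix, values_group = match.groups()
    return prefix + ", ".join([values_group] * n_rows)


if __name__ == "__main__":
    print(expand_insert("INSERT INTO t VALUES (%s, %s);", 3))
    print(expand_insert("SELECT 1", 2))
```

Note that the pattern is compiled without `re.DOTALL`, so `.` does not match newlines. In this sketch, a statement with a line break between `INSERT` and `VALUES` would not match, and `expand_insert` would return `None`.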
@laserson Can you merge this change into master?
I'm no longer involved with this project. Try a more recent committer.
Sorry this got neglected. This is interesting but I'm unsure if the interface is quite right. Is there a reason not to do the rewrite transparently?
Hi, I achieved the same thing via #460 and it is working well, using allToBeInserted.to_sql('xx', engine, if_exists='append', index=False, chunksize=2000, method='multi')
Impala supports multiple-row inserts. This pull request adds an option to use this feature.