Add Dremio as a destination #1026
Conversation
# Conflicts:
#   poetry.lock
#   pyproject.toml
@maxfirman this is amazing! we'll assign a reviewer tomorrow and add tests to our CI jobs after merging
tests/utils.py (outdated)

```diff
@@ -60,7 +61,7 @@
 # filter out active destinations for current tests
-ACTIVE_DESTINATIONS = set(dlt.config.get("ACTIVE_DESTINATIONS", list) or IMPLEMENTED_DESTINATIONS)
+ACTIVE_DESTINATIONS = {"dremio"}
```
note to self: remove!
small cleanups
@maxfirman @sh-rp this is so good! this destination was quite different from the others but you were able to fit it perfectly into the existing model and test structure (thanks for getting the tests working)
I opened a PR that will be used to verify that the newest code works: #1194
We upgraded how we deal with configurations and parquet files and redid our CI structure, so once all tests in the branch above are passing I'll push the newest version here and then we can merge it.
IMO this destination is really good even now (maybe I'm missing something that dremio does and we do not support :))
- we already have a ticket to start using MERGE SQL statements
- adding gcp and azure blob storages is mostly a matter of config and we can do that later
- using a sentinel table for destinations that have no schemas/datasets actually makes sense. we do that for vector databases. we also support empty dataset names so tables are not prefixed (not sure it applies here after all)
- we can implement a dremio adapter to expose all the partition/clustering/retention features (some are implemented, but an adapter lets users add the hints to their resources conveniently) - see the sketch after this list
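As a rough illustration of the hint idea, here is a minimal sketch using dlt's generic column-level hints. The resource, pipeline, dataset, and column names are made up, and whether the dremio destination already maps the partition/sort hints to its DDL (and what a future dremio adapter would expose) is an assumption, not confirmed behavior.

```python
import dlt

# hypothetical resource: column names and hints are illustrative only
@dlt.resource(
    columns={
        "event_date": {"partition": True},  # assumed to map to Dremio partitioning
        "user_id": {"sort": True},          # assumed to map to a sort clause
    }
)
def events():
    yield {"event_date": "2024-03-01", "user_id": 42, "value": 1.0}

# "dremio" as a destination name follows this PR; pipeline/dataset names are made up
pipeline = dlt.pipeline(
    pipeline_name="dremio_hints_demo",
    destination="dremio",
    dataset_name="events_dataset",
)
pipeline.run(events())
```

A dedicated dremio adapter could wrap exactly these hints behind a friendlier helper, similar to the existing qdrant/weaviate adapters in dlt.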
I also found the following comment in the tests:
```python
# schemaless destinations allow adding of root key without the pipeline failing
# for now this is only the case for dremio
# doing this will result in somewhat useless behavior
destination_allows_adding_root_key = destination_config.destination == "dremio"
```
tbh to me it looks like dremio does not enforce nulls and allows adding a root_id column, even if it is NOT NULL, to an existing table that already has data. @maxfirman could that be the case?
anyway thanks again for your amazing work!
@rudolfix thanks for the time and effort reviewing this PR! Indeed, Dremio does not enforce NOT NULL constraints. I agree with all of your points. I have a few other thoughts as well for follow-up PRs.
@maxfirman one question too:
we will be able to test against Dremio Cloud, right? this "NAS" is also available there - the "local" filesystem is in the cloud then, right? also: #1199
@rudolfix I'm not 100% sure as I'm only familiar with their on-prem offering. If we are running tests against Dremio Cloud it may make sense to use the built-in "Arctic" catalog (which is basically managed Nessie) instead. Nessie can be run in a docker container, so we could also extend the docker-compose.yaml to stand up a Nessie server if we wanted to test this out against a Dockerised deployment.
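For reference, a minimal sketch of what such a service entry might look like in the docker-compose.yaml; the service name is arbitrary and the image/port are the upstream Nessie defaults, so treat the details as assumptions rather than the tested setup.

```yaml
# hypothetical addition to the existing docker-compose.yaml
services:
  nessie:
    image: ghcr.io/projectnessie/nessie:latest  # upstream Nessie server image
    ports:
      - "19120:19120"  # Nessie REST API (default port)
```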
Description
This PR adds Dremio as a destination to DLT
Additional Context
Some details for future PRs:
- merge write disposition using MERGE INTO (see the sketch below)
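To make that follow-up concrete, here is a minimal sketch of how a user would request the merge write disposition; the resource, pipeline, dataset, and column names are made up, and whether the dremio destination ultimately executes this via MERGE INTO rather than the staged delete-and-insert logic is exactly the future work mentioned above.

```python
import dlt

# hypothetical resource; "id" as the primary key is illustrative only
@dlt.resource(primary_key="id", write_disposition="merge")
def orders():
    yield [
        {"id": 1, "status": "new"},
        {"id": 2, "status": "shipped"},
    ]

# destination "dremio" follows this PR; pipeline/dataset names are made up
pipeline = dlt.pipeline(
    pipeline_name="dremio_merge_demo",
    destination="dremio",
    dataset_name="orders_dataset",
)
load_info = pipeline.run(orders())
print(load_info)
```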