Add Dremio as a destination #1026

Merged on Apr 8, 2024 (74 commits)
Commits
f0bd71c
Add docker-compose.yml for Dremio
maxfirman Feb 19, 2024
b96cafb
bootstrap dremio in docker-compose.yml
Feb 20, 2024
4115ca9
refactor dremio bootstrap
Feb 20, 2024
34cd79f
Add dremio client dependency
maxfirman Feb 21, 2024
31551d4
test adbc from separate container
Feb 21, 2024
55287b6
Add pydremio db api implementation
Feb 22, 2024
e06a5da
Further development
Feb 22, 2024
0404cba
Initial dremio test
Feb 22, 2024
020fdad
Add description and rowcount to pydremio
Feb 22, 2024
a8ed429
Initial INSERT working
Feb 22, 2024
49c0bd9
Passing test
Feb 23, 2024
6df11ea
Clean up test
Feb 23, 2024
4b59e8e
Fixup some more issues
Feb 26, 2024
174bd0d
Inject data source configuration
Feb 26, 2024
723577d
Inject data source configuration
Feb 26, 2024
653959f
Add flatten logic
Feb 26, 2024
d6e013e
pyproject.toml
Feb 26, 2024
a98d9ee
Fix pyproject.toml
maxfirman Feb 26, 2024
e53d250
Fix Dockerfile
maxfirman Feb 26, 2024
885ff2a
Fix a couple of problems
Feb 27, 2024
2d7acfc
Tidy up
Feb 27, 2024
0214ebb
Merge branch 'devel' into add-dremio
Feb 27, 2024
2974945
Add dremio.md
Feb 28, 2024
740975f
Fix supported file formats in capabilities
Feb 28, 2024
77596f2
Add code to handle partition and localsort
Feb 28, 2024
7e16129
Add some tests around PARTITION and LOCALSORT
Feb 28, 2024
85fa640
Add some docs for partitions
Feb 28, 2024
413ce24
Merge branch 'devel' into add-dremio
Feb 28, 2024
6aecb62
Update poetry.lock and fix lint errors
maxfirman Feb 28, 2024
add4431
Use DOUBLE instead of FLOAT
maxfirman Mar 8, 2024
dd6046d
Fix a few more tests
maxfirman Mar 8, 2024
3260f12
Override CREATE TEMP TABLE queries as Dremio does not support TEMP ta…
maxfirman Mar 8, 2024
38cc312
Credit the original code in pydremio and reproduce Apache2 license.
maxfirman Mar 8, 2024
fee8709
Merge branch 'devel' into add-dremio
maxfirman Mar 8, 2024
36460f4
poetry.lock
maxfirman Mar 8, 2024
bbeac9b
Refactor sqlalchemy URL import
maxfirman Mar 8, 2024
3497138
Fix stage loading test
maxfirman Mar 8, 2024
b3a01a8
Fix stage loading test
maxfirman Mar 8, 2024
6a9063f
Fix lint issues
maxfirman Mar 8, 2024
e6340e7
Ensure all standard tests are run and start fixing failures
maxfirman Mar 15, 2024
47c6dde
Fix COPY INTO command
maxfirman Mar 15, 2024
ae5b4df
Escape "value"
maxfirman Mar 15, 2024
f28d50f
More fixes
maxfirman Mar 18, 2024
a911f63
More fixes
maxfirman Mar 18, 2024
4ebc528
Only two failing tests left
maxfirman Mar 18, 2024
029251c
1 Test failing
maxfirman Mar 18, 2024
8975cd4
Remove the flatten functionality
maxfirman Mar 18, 2024
3d07a67
Fix lint
maxfirman Mar 18, 2024
80478a0
remove data_source config option
maxfirman Mar 18, 2024
3213a70
Add some verbiage around the lack of CREATE SCHEMA
maxfirman Mar 18, 2024
5ff9793
Some fixes and add Dremio to staging destination configs
maxfirman Mar 22, 2024
c27c5cb
Remove staging_credentials from DremioLoadJob
maxfirman Mar 22, 2024
e61281a
Remove staging_credentials from DremioLoadJob
maxfirman Mar 22, 2024
4e6cda1
Merge branch 'devel' into add-dremio
sh-rp Mar 27, 2024
f7c26e9
update lockfile post merge
sh-rp Mar 27, 2024
8b50677
add dremio test workflow
sh-rp Mar 27, 2024
617b009
fixing dremio tests
sh-rp Mar 27, 2024
6483e0c
fix docs code section types
sh-rp Mar 27, 2024
e1868ae
fix post devel merge linting errors
sh-rp Mar 27, 2024
3d83299
ignore callarg for dremio config test
sh-rp Mar 27, 2024
e6ae88e
Fix test_dremio_client.py
maxfirman Mar 27, 2024
a313639
make minio setup sleep a bit
sh-rp Apr 2, 2024
af21efd
fix remaining test
sh-rp Apr 2, 2024
2f4e926
small refactor of sql job
sh-rp Apr 2, 2024
3a7029c
remove unneeded statement
sh-rp Apr 2, 2024
4b07630
mark dremio as experimental
sh-rp Apr 2, 2024
b596ed8
reset active destinations
sh-rp Apr 2, 2024
b0434a6
revert client change and update test
sh-rp Apr 2, 2024
ba01251
fix default order by
sh-rp Apr 2, 2024
810979d
Merge branch 'devel' into max-add-dremio
rudolfix Apr 7, 2024
c83a202
merge fixes, dremio factory test
rudolfix Apr 7, 2024
73e75d1
configures dremio pipeline tests properly
rudolfix Apr 8, 2024
189c96c
upgrades dremio ci workflow
rudolfix Apr 8, 2024
db91346
fixes local destinations ci workflow
rudolfix Apr 8, 2024
89 changes: 89 additions & 0 deletions .github/workflows/test_destination_dremio.yml
@@ -0,0 +1,89 @@

name: test | dremio

on:
  pull_request:
    branches:
      - master
      - devel
  workflow_dispatch:
  schedule:
    - cron: '0 2 * * *'

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

env:
  RUNTIME__SENTRY_DSN: https://[email protected]/4504819859914752
  RUNTIME__LOG_LEVEL: ERROR

  ACTIVE_DESTINATIONS: "[\"dremio\"]"
  ALL_FILESYSTEM_DRIVERS: "[\"memory\"]"

jobs:
  get_docs_changes:
    name: docs changes
    uses: ./.github/workflows/get_docs_changes.yml
    if: ${{ !github.event.pull_request.head.repo.fork || contains(github.event.pull_request.labels.*.name, 'ci from fork')}}

  run_loader:
    name: test | dremio tests
    needs: get_docs_changes
    if: needs.get_docs_changes.outputs.changes_outside_docs == 'true'
    defaults:
      run:
        shell: bash
    runs-on: "ubuntu-latest"

    steps:

      - name: Check out
        uses: actions/checkout@master

      - name: Start dremio
        run: docker-compose -f "tests/load/dremio/docker-compose.yml" up -d

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10.x"

      - name: Install Poetry
        uses: snok/[email protected]
        with:
          virtualenvs-create: true
          virtualenvs-in-project: true
          installer-parallel: true

      - name: Load cached venv
        id: cached-poetry-dependencies
        uses: actions/cache@v3
        with:
          path: .venv
          key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/poetry.lock') }}-gcp

      - name: Install dependencies
        run: poetry install --no-interaction -E s3 -E gs -E az -E parquet --with sentry-sdk --with pipeline

      - run: |
          poetry run pytest tests/load
        if: runner.os != 'Windows'
        name: Run tests Linux/MAC
        env:
          DESTINATION__DREMIO__CREDENTIALS: grpc://dremio:dremio123@localhost:32010/nas
          DESTINATION__DREMIO__STAGING_DATA_SOURCE: minio
          DESTINATION__FILESYSTEM__BUCKET_URL: s3://dlt-ci-test-bucket
          DESTINATION__FILESYSTEM__CREDENTIALS__AWS_ACCESS_KEY_ID: minioadmin
          DESTINATION__FILESYSTEM__CREDENTIALS__AWS_SECRET_ACCESS_KEY: minioadmin
          DESTINATION__FILESYSTEM__CREDENTIALS__ENDPOINT_URL: http://127.0.0.1:9010

      - run: |
          poetry run pytest tests/load
        if: runner.os == 'Windows'
        name: Run tests Windows
        shell: cmd

      - name: Stop dremio
        if: always()
        run: docker-compose -f "tests/load/dremio/docker-compose.yml" down -v
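
The environment variables in this workflow use upper-case names whose sections are separated by double underscores, and list-valued settings such as ACTIVE_DESTINATIONS are passed as JSON-encoded strings. The following is a minimal sketch of how such names could be read into a nested structure. The helpers `split_config_key` and `nest_env` are hypothetical names made up for this illustration and are not part of dlt's API.

```python
import json
from typing import Any, Dict, Mapping, Tuple


def split_config_key(env_name: str) -> Tuple[str, ...]:
    """Split an upper-case env name on double underscores into lower-case parts.

    Hypothetical helper for illustration only.
    """
    return tuple(part.lower() for part in env_name.split("__"))


def nest_env(env: Mapping[str, str]) -> Dict[str, Any]:
    """Build a nested dict from double-underscore env names (illustration only)."""
    nested: Dict[str, Any] = {}
    for name, value in env.items():
        *sections, key = split_config_key(name)
        node = nested
        for section in sections:
            # Create intermediate sections on demand.
            node = node.setdefault(section, {})
        node[key] = value
    return nested


# Values copied from the workflow above.
env = {
    "DESTINATION__DREMIO__STAGING_DATA_SOURCE": "minio",
    "DESTINATION__FILESYSTEM__BUCKET_URL": "s3://dlt-ci-test-bucket",
    "ACTIVE_DESTINATIONS": "[\"dremio\"]",
}
config = nest_env(env)

# List-valued settings arrive as JSON text and need decoding.
active_destinations = json.loads(env["ACTIVE_DESTINATIONS"])
```

Single underscores inside a part, as in AWS_ACCESS_KEY_ID, are left alone because only the double underscore is treated as a separator.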
6 changes: 5 additions & 1 deletion .github/workflows/test_local_destinations.yml
@@ -15,14 +15,18 @@ concurrency:
   cancel-in-progress: true

 env:
-  DLT_SECRETS_TOML: ${{ secrets.DLT_SECRETS_TOML }}
+  # NOTE: this workflow can't use github secrets!
+  # DLT_SECRETS_TOML: ${{ secrets.DLT_SECRETS_TOML }}

   RUNTIME__SENTRY_DSN: https://[email protected]/4504819859914752
   RUNTIME__LOG_LEVEL: ERROR
   RUNTIME__DLTHUB_TELEMETRY_SEGMENT_WRITE_KEY: TLJiyRkGVZGCi2TtjClamXpFcxAA1rSB
   ACTIVE_DESTINATIONS: "[\"duckdb\", \"postgres\", \"filesystem\", \"weaviate\"]"
   ALL_FILESYSTEM_DRIVERS: "[\"memory\", \"file\"]"

+  DESTINATION__WEAVIATE__VECTORIZER: text2vec-contextionary
+  DESTINATION__WEAVIATE__MODULE_CONFIG: "{\"text2vec-contextionary\": {\"vectorizeClassName\": false, \"vectorizePropertyName\": true}}"
+
 jobs:
   get_docs_changes:
     name: docs changes
1 change: 1 addition & 0 deletions dlt/common/data_writers/escape.py
@@ -119,6 +119,7 @@ def escape_redshift_identifier(v: str) -> str:

 escape_postgres_identifier = escape_redshift_identifier
 escape_athena_identifier = escape_postgres_identifier
+escape_dremio_identifier = escape_postgres_identifier


 def escape_bigquery_identifier(v: str) -> str:
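
The change to escape.py adds no new escaping logic: it binds a new name to the existing Postgres escape function. The sketch below illustrates that aliasing pattern. The body of `quote_identifier` is an assumption (standard SQL double-quote quoting with embedded quotes doubled) and may differ from what dlt's `escape_redshift_identifier` actually does; both function names here are hypothetical.

```python
def quote_identifier(v: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded double quote.

    Assumed behavior for illustration; not copied from dlt.
    """
    return '"' + v.replace('"', '""') + '"'


# Aliasing binds a second name to the same function object, so any later
# fix to quote_identifier applies to the alias as well.
quote_dremio_identifier = quote_identifier

escaped = quote_dremio_identifier('my"col')
```

Because the alias shares the function object, a destination that needs different quoting later would have to replace the alias with its own function rather than patch the shared one.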
2 changes: 2 additions & 0 deletions dlt/destinations/__init__.py
@@ -13,6 +13,7 @@
 from dlt.destinations.impl.destination.factory import destination
 from dlt.destinations.impl.synapse.factory import synapse
 from dlt.destinations.impl.databricks.factory import databricks
+from dlt.destinations.impl.dremio.factory import dremio


 __all__ = [
@@ -30,5 +31,6 @@
     "weaviate",
     "synapse",
     "databricks",
+    "dremio",
     "destination",
 ]
27 changes: 27 additions & 0 deletions dlt/destinations/impl/dremio/__init__.py
@@ -0,0 +1,27 @@
from dlt.common.arithmetics import DEFAULT_NUMERIC_PRECISION, DEFAULT_NUMERIC_SCALE
from dlt.common.data_writers.escape import escape_dremio_identifier
from dlt.common.destination import DestinationCapabilitiesContext


def capabilities() -> DestinationCapabilitiesContext:
    caps = DestinationCapabilitiesContext()
    caps.preferred_loader_file_format = None
    caps.supported_loader_file_formats = []
    caps.preferred_staging_file_format = "parquet"
    caps.supported_staging_file_formats = ["jsonl", "parquet"]
    caps.escape_identifier = escape_dremio_identifier
    caps.decimal_precision = (DEFAULT_NUMERIC_PRECISION, DEFAULT_NUMERIC_SCALE)
    caps.wei_precision = (DEFAULT_NUMERIC_PRECISION, 0)
    caps.max_identifier_length = 255
    caps.max_column_identifier_length = 255
    caps.max_query_length = 2 * 1024 * 1024
    caps.is_max_query_length_in_bytes = True
    caps.max_text_data_type_length = 16 * 1024 * 1024
    caps.is_max_text_data_type_length_in_bytes = True
    caps.supports_transactions = False
    caps.supports_ddl_transactions = False
    caps.alter_add_multi_column = True
    caps.supports_clone_table = False
    caps.supports_multiple_statements = False
    caps.timestamp_precision = 3
    return caps
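
The capabilities above declare no direct loader file formats and two staging formats, which suggests that data is expected to reach Dremio through a staging location. They also state the query length limit in bytes rather than characters. The sketch below shows how a caller might interpret those two settings. `SketchCaps`, `requires_staging`, and `fits_query_limit` are hypothetical stand-ins written for this illustration; they are not dlt classes or functions, and only the field names and default values are taken from the diff.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchCaps:
    """Stand-in holding a subset of the capability fields from the diff."""

    preferred_loader_file_format: Optional[str] = None
    supported_loader_file_formats: List[str] = field(default_factory=list)
    preferred_staging_file_format: Optional[str] = "parquet"
    supported_staging_file_formats: List[str] = field(
        default_factory=lambda: ["jsonl", "parquet"]
    )
    max_query_length: int = 2 * 1024 * 1024
    is_max_query_length_in_bytes: bool = True


def requires_staging(caps: SketchCaps) -> bool:
    """True when no direct loader file format is declared."""
    return not caps.supported_loader_file_formats


def fits_query_limit(caps: SketchCaps, sql: str) -> bool:
    """Compare the statement size with the limit, in bytes or characters."""
    if caps.is_max_query_length_in_bytes:
        size = len(sql.encode("utf-8"))
    else:
        size = len(sql)
    return size <= caps.max_query_length


caps = SketchCaps()
needs_staging = requires_staging(caps)
small_query_ok = fits_query_limit(caps, "SELECT 1")
```

The byte-based check matters for non-ASCII text: a statement can be under the limit in characters and still exceed it once encoded as UTF-8.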
43 changes: 43 additions & 0 deletions dlt/destinations/impl/dremio/configuration.py
@@ -0,0 +1,43 @@
import dataclasses
from typing import Final, Optional, Any, Dict, ClassVar, List

from dlt.common.configuration import configspec
from dlt.common.configuration.specs import ConnectionStringCredentials
from dlt.common.destination.reference import DestinationClientDwhWithStagingConfiguration
from dlt.common.libs.sql_alchemy import URL
from dlt.common.typing import TSecretStrValue
from dlt.common.utils import digest128


@configspec(init=False)
class DremioCredentials(ConnectionStringCredentials):
    drivername: str = "grpc"
    username: str = None
    password: TSecretStrValue = None
    host: str = None
    port: Optional[int] = 32010
    database: str = None

    __config_gen_annotations__: ClassVar[List[str]] = ["port"]

    def to_native_credentials(self) -> str:
        return URL.create(
            drivername=self.drivername, host=self.host, port=self.port
        ).render_as_string(hide_password=False)

    def db_kwargs(self) -> Dict[str, Any]:
        return dict(username=self.username, password=self.password)


@configspec
class DremioClientConfiguration(DestinationClientDwhWithStagingConfiguration):
    destination_type: Final[str] = dataclasses.field(default="dremio", init=False, repr=False, compare=False)  # type: ignore[misc]
    credentials: DremioCredentials = None
    staging_data_source: str = None
    """The name of the staging data source"""

    def fingerprint(self) -> str:
        """Returns a fingerprint of host part of a connection string"""
        if self.credentials and self.credentials.host:
            return digest128(self.credentials.host)
        return ""
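
In the configuration above, `to_native_credentials` builds the connection URL from the driver name, host, and port only, while the username and password are returned separately by `db_kwargs`. The sketch below mirrors that split using only the standard library, with the connection string from the CI workflow as input. `SketchCredentials` and `host_fingerprint` are hypothetical stand-ins written for this illustration, and the SHA-256 based fingerprint is an assumption that does not reproduce dlt's `digest128`.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Dict, Optional
from urllib.parse import urlsplit


@dataclass
class SketchCredentials:
    """Stand-in for the credentials class; not the dlt implementation."""

    drivername: str
    username: Optional[str]
    password: Optional[str]
    host: Optional[str]
    port: Optional[int]
    database: Optional[str]

    @classmethod
    def from_url(cls, url: str) -> "SketchCredentials":
        parts = urlsplit(url)
        return cls(
            drivername=parts.scheme,
            username=parts.username,
            password=parts.password,
            host=parts.hostname,
            port=parts.port or 32010,  # default port taken from the diff
            database=parts.path.lstrip("/") or None,
        )

    def to_native_credentials(self) -> str:
        # Only scheme, host, and port go into the URL.
        return f"{self.drivername}://{self.host}:{self.port}"

    def db_kwargs(self) -> Dict[str, Any]:
        # Secrets travel separately from the URL.
        return {"username": self.username, "password": self.password}


def host_fingerprint(creds: Optional[SketchCredentials]) -> str:
    """Hash the host only; an assumed stand-in for digest128."""
    if creds and creds.host:
        return hashlib.sha256(creds.host.encode("utf-8")).hexdigest()[:32]
    return ""


creds = SketchCredentials.from_url("grpc://dremio:dremio123@localhost:32010/nas")
native_url = creds.to_native_credentials()
```

Keeping the password out of the rendered URL means the URL can be logged or compared without exposing secrets, and hashing only the host gives a stable identifier that does not depend on who is connecting.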