Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sqlalchemy destination #1734

Merged
merged 37 commits into from
Sep 14, 2024
Merged

Sqlalchemy destination #1734

merged 37 commits into from
Sep 14, 2024

Conversation

steinitzu
Copy link
Collaborator

Draft Sqlalchemy loader. Basic things work so far.

Description

  • Mysql generally works.
  • Sqlite currently only works with dataset name main.
  • Support parquet and typed-jsonl file formats
  • Implements SqlJobClient interface so most/all sql loader tests should run (many already run against mysql)

What's left:

  • Override sql_client methods when sqlite is used. create/drop dataset works differently
  • Limit query length and number of params sent per statement in load jobs
  • Test bulk insert
  • Test with sqlite and mysql (maybe other built in dialects that can run locally postgres/mssql)
  • Test table reflection and table building logic
  • Test sqlalchemy 1.4 and 2 in CI
  • Lots of cleanup

Related Issues

Additional Context

Copy link

netlify bot commented Aug 23, 2024

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit dc4c29c
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/66e48dc764ccee0008ac6e82

@steinitzu steinitzu force-pushed the sqlalchemy-loader branch 2 times, most recently from aee8ada to 1e7fa6a Compare August 31, 2024 01:45
Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I was expecting that we need some kind of settings per dialect (ie DialectCapabilities) so we have optimized inserts. but it seems it is not necessary.

There are a lot of standard tests where this thing should work:

  • restore state tests
  • drop command / refresh mode tests
  • all pipeline tests that do not requires merge write disposition

if we enable them we are good.

MERGE write disposition looks complicated, right? We surely can generate all required statements (we have only SELECT / INSERT / DELETE) but creating temporary tables looks like a stretch. Still we can do the Athena trick and create real tables for temporary data

A REPLACE mode that uses staging dataset would be also cool.

maybe those two could go to a separate ticket

def _create_date_time_type(self, sc_t: str, precision: Optional[int]) -> sa.types.TypeEngine:
"""Use the dialect specific datetime/time type if possible since the generic type doesn't accept precision argument"""
precision = precision if precision is not None else self.capabilities.timestamp_precision
if sc_t == "timestamp":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we support datetimes without timezones now see #1492

dlt/destinations/impl/sqlalchemy/db_api_client.py Outdated Show resolved Hide resolved
@steinitzu steinitzu force-pushed the sqlalchemy-loader branch 3 times, most recently from 792f7a1 to b9b317a Compare September 7, 2024 00:43
Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please merge current devel. all big changes for 1.0 are in. this looks really good!

what about merges? IMO our standard set of sql jobs will work just fine with mysql and sqllite. maybe you could try that out?

dlt/common/destination/reference.py Outdated Show resolved Hide resolved
from typing import List

import sqlalchemy as sa
from alembic.runtime.migration import MigrationContext
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope this is not an overkill... alembic deps look quite minimal so probably OK

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah would have liked to avoid this, but at least it was less complicated than I expected since alembic is not really used like this normally.

dlt/destinations/impl/sqlalchemy/sqlalchemy_job_client.py Outdated Show resolved Hide resolved
dlt/destinations/impl/sqlalchemy/sqlalchemy_job_client.py Outdated Show resolved Hide resolved
remove secrets toml

remove secrets toml

Revert "remove secrets toml"

This reverts commit 7dd189c.

Fix default pipeline name test
@steinitzu steinitzu marked this pull request as ready for review September 12, 2024 01:30
Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls see comments on engine.dispose

self.engine = credentials.engine
self.external_engine = True
else:
self.engine = sa.create_engine(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we create engine when first connection is created? sometimes sql client is created to just use a few internal functions and never opens the connection

Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steinitzu all good
but we do not need

self.engine.dispose()

we have null pool. so it will dispose of external engines which should not happen. WDYT?

@steinitzu
Copy link
Collaborator Author

@steinitzu all good but we do not need

self.engine.dispose()

we have null pool. so it will dispose of external engines which should not happen. WDYT?

Hmm I have the the flag for external engine so it's not closed:
https://github.com/dlt-hub/dlt/blob/sqlalchemy-loader/dlt/destinations/impl/sqlalchemy/db_api_client.py#L115-L117

But yeah not sure if there's any need for it since the connection is closed anyway. Maybe in the special case where you override poolclass?

@rudolfix
Copy link
Collaborator

@steinitzu all good but we do not need

self.engine.dispose()

we have null pool. so it will dispose of external engines which should not happen. WDYT?

Hmm I have the the flag for external engine so it's not closed: https://github.com/dlt-hub/dlt/blob/sqlalchemy-loader/dlt/destinations/impl/sqlalchemy/db_api_client.py#L115-L117

But yeah not sure if there's any need for it since the connection is closed anyway. Maybe in the special case where you override poolclass?

heh you are right. I overlooked the external flag. then all good

Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rudolfix rudolfix merged commit 9580baf into devel Sep 14, 2024
61 checks passed
@rudolfix rudolfix deleted the sqlalchemy-loader branch September 14, 2024 08:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

implement sql alchemy destination
2 participants