Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support per-database configuration in sql_database and sql_table sources #2114

Open
trymzet opened this issue Dec 2, 2024 · 3 comments
Open
Assignees
Labels
question Further information is requested

Comments

@trymzet
Copy link
Contributor

trymzet commented Dec 2, 2024

Feature description

Currently, you must either have one sql_database config, or per-pipeline configs. This is impractical if ingesting from multiple databases and having possibly many (eg. if creating a pipeline per table or per schema) pipelines, which AFAIK is a common use case.

(BTW I think there's the same issue with eg. the filesystem destination, which also only works if you only load data to one filesystem.)

Are you a dlt user?

None

Use case

No response

Proposed solution

Would love to be able to do something like

# secrets.toml
[sources.sql_database.my_db.credentials]
...

[sources.sql_database.my_other_db.credentials]
...

And then in the code you have sth like

source = sql_table(database="my_db", schema="my_schema", table="my_table")
pipeline = dlt.pipeline()
pipeline.run(source)

Related issues

No response

@rudolfix rudolfix moved this from Todo to In Progress in dlt core library Dec 9, 2024
@rudolfix
Copy link
Collaborator

rudolfix commented Dec 9, 2024

@trymzet you should be able to get what you need by giving each sql_database instance a name that is ie. your database name. for example

source = sql_database(...).with_args(name="my_database", section="my_database")

and then

[sources.my_database.credentials]

you could also tryi

source = sql_database(...).with_args(name="my_database")

and

[sources.sql_database.my_database.credentials]

https://dlthub.com/docs/api_reference/extract/source#with_args

@rudolfix rudolfix self-assigned this Dec 9, 2024
@rudolfix rudolfix added the question Further information is requested label Dec 9, 2024
@trymzet
Copy link
Contributor Author

trymzet commented Dec 9, 2024

@rudolfix it's still looking under sql_database in the config, it's probably because the config is being resolved when sql_database() is called (so before with_args())

db = sql_database(schema="my_schema", backend="pyarrow").with_args(
    name="mssql", section="mssql"
)
dlt.common.configuration.exceptions.ConfigFieldMissingException: Following fields are missing: ['drivername'] in configuration with spec ConnectionStringCredentials
        for field "drivername" config providers and keys were tried in following order:
                In Environment Variables key SOURCES__SQL_DATABASE__SQL_DATABASE__CREDENTIALS__DRIVERNAME was not found.
                In Environment Variables key SOURCES__SQL_DATABASE__CREDENTIALS__DRIVERNAME was not found.
                In Environment Variables key SOURCES__CREDENTIALS__DRIVERNAME was not found.
                In Environment Variables key CREDENTIALS__DRIVERNAME was not found.
                In secrets.toml key sources.sql_database.sql_database.credentials.drivername was not found.
                In secrets.toml key sources.sql_database.credentials.drivername was not found.
                In secrets.toml key sources.credentials.drivername was not found.
                In secrets.toml key credentials.drivername was not found.
                In config.toml key sources.sql_database.sql_database.credentials.drivername was not found.
                In config.toml key sources.sql_database.credentials.drivername was not found.
                In config.toml key sources.credentials.drivername was not found.
                In config.toml key credentials.drivername was not found.

Also, is there a way to pass these when using sql_table()?

@rudolfix
Copy link
Collaborator

@trymzet OK I gave you a wrong example :) as you say - you need to rename the source before it is called. this one works:

from dlt.sources import sql_database

renamed = sql_database.sql_database.with_args(name="my_db", section="my_db")(table_names=["table_1"])

in case of standalone sql_table: the resource name is set to table name and configured as such. this could be indeed improved by implementing similar mechanism we have for sources... currently the config sections for sql_table is
[sources.sql_database.<table name>]. so for your case it is better to go with sql_database

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
Status: In Progress
Development

No branches or pull requests

2 participants