Multi-threading issues on information_schema queries #43
Comments
I could reproduce this starting from this adapter's first release (1.0.4): https://github.com/dbt-athena/dbt-athena/pull/7/files
I made a local change and reverted one line in
handle = AthenaConnection(
    s3_staging_dir=creds.s3_staging_dir,
    endpoint_url=creds.endpoint_url,
    schema_name=creds.schema,
    work_group=creds.work_group,
    cursor_class=AthenaCursor,
    formatter=AthenaParameterFormatter(),
    poll_interval=creds.poll_interval,
-   session=get_boto3_session(connection),
+   profile_name=creds.aws_profile_name,
    retry_config=RetryConfig(
        attempt=creds.num_retries,
        exceptions=(
            "ThrottlingException",
            "TooManyRequestsException",
            "InternalServerException",
        ),
    ),
)
💡 If this only happens with the |
Could be that the regression was introduced by |
I have a branch with the Glue API implemented instead of
Found this to be very interesting:
The tl;dr is
So you need one session per thread. We currently use one global session, which is causing the multi-threading errors: https://github.com/dbt-athena/dbt-athena/blob/main/dbt/adapters/athena/session.py#L9-L15C10
This was introduced in Tomme/dbt-athena#125
import boto3.session
from dbt.contracts.connection import Connection

def get_boto3_session(connection: Connection = None) -> boto3.session.Session:
    if connection is None:
        raise RuntimeError(
            "A Connection object needs to be passed to initialize the boto3 session"
        )
    return boto3.session.Session(
        region_name=connection.credentials.region_name,
        profile_name=connection.credentials.aws_profile_name,
    )
It still respects the |
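To make the "one session per thread" idea concrete, here is a minimal sketch of a thread-local holder. It is not the adapter's actual code: `PerThread` and `sessions` are hypothetical names, and the factory is injected so the pattern can be tried without boto3 installed. With boto3, the factory would presumably build a `boto3.session.Session` from the connection credentials, as in the snippet above.

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class PerThread(Generic[T]):
    """Lazily builds one object per thread using the given factory."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._local = threading.local()

    def get(self) -> T:
        # threading.local() gives each thread its own attribute namespace,
        # so "value" set in one thread is invisible to the others.
        try:
            return self._local.value
        except AttributeError:
            self._local.value = self._factory()
            return self._local.value


# Hypothetical wiring for the adapter (assumption, not run here; needs boto3):
#   sessions = PerThread(lambda: boto3.session.Session(
#       region_name=creds.region_name,
#       profile_name=creds.aws_profile_name,
#   ))
#   session = sessions.get()  # called from inside each dbt worker thread
```

Repeated calls from the same thread reuse that thread's object, while each worker thread gets a fresh one on first use, so no session is shared across threads.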
I'm switching a few projects to this dbt-athena community adapter and I notice a regression bug 🐛
Locally, I use `threads: 4`. I have a few databases defined in my `dbt_project.yml`:
When I run `dbt --debug run` I see it starts by making 3 parallel queries to `INFORMATION_SCHEMA` (corresponding to the 3 custom schemas).
Output of the failing `dbt --debug run`:
When I switch back to Tomme's adapter (`dbt-athena-adapter==1.0.1`), it makes three queries in parallel successfully.
Output of the failing `dbt --debug run`:
When I use `threads: 1` or `threads: 2` it works and successfully deploys the dbt project, but it starts failing with 3 or more threads.
This might be a regression? I found a similar issue: Tomme/dbt-athena#41