Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incremental strategy not inserting #177

Open
pmheld opened this issue Jun 30, 2023 · 8 comments
Open

incremental strategy not inserting #177

pmheld opened this issue Jun 30, 2023 · 8 comments

Comments

@pmheld
Copy link

pmheld commented Jun 30, 2023

Hi,
I have dbt-impala==1.4.0 and dbt-core==1.4.1 installed and experience a problem with a model that should use materialized='incremental'.
The issue is that this single model in the project is initially running fine since no table existed in the Cloudera CDP (Impala) but the second time that I'm running dbt it tries to create the table that already exists rather than doing an insert respectively an append to the table.

This is what dbt show while running:

10:27:23  Running with dbt=1.4.1
10:27:23  Found 1 model, 0 tests, 0 snapshots, 0 analyses, 317 macros, 0 operations, 0 seed files, 4 sources, 0 exposures, 0 metrics
10:27:23  
10:27:32  Concurrency: 3 threads (target='impala')
10:27:32  
10:27:32  1 of 1 START sql incremental model econda.a_econda_basis_v2 .................... [RUN]
10:27:33  1 of 1 ERROR creating sql incremental model econda.a_econda_basis_v2 ........... [ERROR in 0.36s]
10:27:33  
10:27:33  Finished running 1 incremental model in 0 hours 0 minutes and 9.36 seconds (9.36s).
10:27:33  
10:27:33  Completed with 1 error and 0 warnings:
10:27:33  
10:27:33  Runtime Error in model a_econda_basis_v2 (models/a_econda_basis_v2.sql)
10:27:33    Unable to establish connection to Impala server: AnalysisException: Table already exists: econda.a_econda_basis_v2
10:27:33    
10:27:33  
10:27:33  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

The project.yml looks like this:

name: 'econda'
version: '1.0.0'
config-version: 2
profile: 'monitoring'

model-paths: ["models"]
test-paths: ["tests"]
macro-paths: ["macros"]

target-path: "target"  
clean-targets:         
  - "target"
  - "dbt_packages"

models:
  econda:
    +materialized: table
    +schema: econda

The model looks like this:

{{
    config(
        materialized='incremental',
        incremental_strategy='append',
        partition_by=['sourcemonth']
    )
}}
    SELECT 
        cast(request_id AS VARCHAR(12)) AS request_id,
        `timestamp` AS time_stamp,
        to_date(`timestamp`) AS date_request,
        viewtime_s,
        url,
        oc.icamp,
        CASE
                WHEN oc.icamptype = '2' THEN 'clicked'
                WHEN oc.icamptype = '1' THEN 'seen'
                WHEN oc.icamptype = '0' THEN 'last_seen'
        ELSE NULL
        END AS icamptype,
        RANK () OVER (PARTITION BY v3.visit_id ORDER BY `timestamp` ASC) Request_Rank,
        v3.sourcedate,
        v3.sourcemonth
    FROM {{source('econda', 'co_econda_rohdaten_v3_ohne_events')}} v3
    LEFT JOIN {{source('econda', 'co_econda_rohdaten_lexwared_3')}} e3 ON v3.request_id = e3.requestid
    LEFT JOIN {{source('econda', 'co_econda_rohdaten_onsite_campaign')}} oc ON v3.request_id = oc.requestid

{% if is_incremental() %}

    where v3.`timestamp` >= date_add(current_date(),-1)
    and e3.sourcedate >= 20230629 
    and oc.sourcedate >= 20230629 

{% endif %}

This is converted into:

  create  table
    econda.a_econda_basis_v2
      partitioned by (sourcemonth)
  as
    SELECT 
        cast(request_id AS VARCHAR(12)) AS request_id,
        `timestamp` AS time_stamp,
        to_date(`timestamp`) AS date_request,
        viewtime_s,
        oc.icamp,
        CASE
                WHEN oc.icamptype = '2' THEN 'clicked'
                WHEN oc.icamptype = '1' THEN 'seen'
                WHEN oc.icamptype = '0' THEN 'last_seen'
        ELSE NULL
        END AS icamptype,
        RANK () OVER (PARTITION BY v3.visit_id ORDER BY `timestamp` ASC) Request_Rank,
        v3.sourcedate,
        v3.sourcemonth
    FROM econda.co_econda_rohdaten_v3_ohne_events v3
    LEFT JOIN econda.co_econda_rohdaten_lexwared_3 e3 ON v3.request_id = e3.requestid
    LEFT JOIN econda.co_econda_rohdaten_onsite_campaign oc ON v3.request_id = oc.requestid

From what I've seen in the 860 lines of the log file this part migth be interesting:

[0m10:27:32.699522 [debug] [MainThread]: Impala adapter: NotImplemented: add_begin_query
�[0m10:27:32.700014 [debug] [MainThread]: Impala adapter: NotImplemented: commit
�[0m10:27:32.700679 [info ] [MainThread]: Concurrency: 3 threads (target='impala')
�[0m10:27:32.700964 [info ] [MainThread]: 
�[0m10:27:32.722855 [debug] [Thread-145]: Began running node model.econda.a_econda_basis_v2
�[0m10:27:32.723378 [info ] [Thread-145]: 1 of 1 START sql incremental model econda.a_econda_basis_v2 .................... [RUN]
�[0m10:27:32.724054 [debug] [Thread-145]: Acquiring new impala connection 'model.econda.a_econda_basis_v2'
�[0m10:27:32.724702 [debug] [Thread-145]: Began compiling node model.econda.a_econda_basis_v2
�[0m10:27:32.735790 [debug] [Thread-145]: Writing injected SQL for node "model.econda.a_econda_basis_v2"
�[0m10:27:32.736417 [debug] [Thread-145]: Timing info for model.econda.a_econda_basis_v2 (compile): 2023-06-30 10:27:32.724942 => 2023-06-30 10:27:32.736311
�[0m10:27:32.736985 [debug] [Thread-145]: Began executing node model.econda.a_econda_basis_v2
�[0m10:27:32.767536 [debug] [Thread-145]: Tracker adapter: Usage tracking flag True. To turn on/off use usage_tracking flag in profiles.yml
�[0m10:27:32.768222 [debug] [Thread-151]: Tracker adapter: Sending Event {'data': '{"event_type": "model_access", "model_name": "econda.a_econda_basis_v2", "model_type": "table", "incremental_strategy": "", "auth": "N/A", "connection_state": "N/A", "elapsed_time": "N/A", "permissions": "N/A", "profile_name": "N/A", "sql_type": "N/A", "id": "6f580870-1459-4081-a126-fd1ed5cbeb18", "unique_host_hash": "6684ca3f30d055ca36911ca432eb2347", "unique_user_hash": "68c4283db8074b12df1660b31c0220a9", "unique_session_hash": "60313cc0204ea2f349b15a2fa6f4abd0", "python_version": "3.8.16", "system": "Linux", "machine": "x86_64", "platform": "Linux-5.4.0-107-generic-x86_64-with-glibc2.2.5", "dbt_version": "1.4.1", "dbt_adapter_type": "impala", "dbt_adapter_version": "1.4.0", "dbt_deployment_env": {}, "project_name": "monitoring", "target_name": "impala", "no_of_threads": 3, "warehouse_version": {"version": "impalad version 3.4.0-SNAPSHOT", "build": "impalad version 3.4.0-SNAPSHOT RELEASE (build 75494b221421452d0b928d844e04afa70654d528)\\nBuilt on Wed Feb  8 15:18:37 UTC 2023"}}'}
�[0m10:27:32.772748 [debug] [Thread-145]: Tracker adapter: Usage tracking flag True. To turn on/off use usage_tracking flag in profiles.yml
�[0m10:27:32.773557 [debug] [Thread-152]: Tracker adapter: Sending Event {'data': '{"event_type": "model_access", "model_name": "econda.a_econda_basis_v2__dbt_tmp", "model_type": "table", "incremental_strategy": "", "auth": "N/A", "connection_state": "N/A", "elapsed_time": "N/A", "permissions": "N/A", "profile_name": "N/A", "sql_type": "N/A", "id": "6f580870-1459-4081-a126-fd1ed5cbeb18", "unique_host_hash": "6684ca3f30d055ca36911ca432eb2347", "unique_user_hash": "68c4283db8074b12df1660b31c0220a9", "unique_session_hash": "60313cc0204ea2f349b15a2fa6f4abd0", "python_version": "3.8.16", "system": "Linux", "machine": "x86_64", "platform": "Linux-5.4.0-107-generic-x86_64-with-glibc2.2.5", "dbt_version": "1.4.1", "dbt_adapter_type": "impala", "dbt_adapter_version": "1.4.0", "dbt_deployment_env": {}, "project_name": "monitoring", "target_name": "impala", "no_of_threads": 3, "warehouse_version": {"version": "impalad version 3.4.0-SNAPSHOT", "build": "impalad version 3.4.0-SNAPSHOT RELEASE (build 75494b221421452d0b928d844e04afa70654d528)\\nBuilt on Wed Feb  8 15:18:37 UTC 2023"}}'}
�[0m10:27:32.777193 [debug] [Thread-145]: Tracker adapter: Usage tracking flag True. To turn on/off use usage_tracking flag in profiles.yml
�[0m10:27:32.780691 [debug] [Thread-153]: Tracker adapter: Sending Event {'data': '{"event_type": "model_access", "model_name": "econda.a_econda_basis_v2__dbt_tmp", "model_type": "table", "incremental_strategy": "", "auth": "N/A", "connection_state": "N/A", "elapsed_time": "N/A", "permissions": "N/A", "profile_name": "N/A", "sql_type": "N/A", "id": "6f580870-1459-4081-a126-fd1ed5cbeb18", "unique_host_hash": "6684ca3f30d055ca36911ca432eb2347", "unique_user_hash": "68c4283db8074b12df1660b31c0220a9", "unique_session_hash": "60313cc0204ea2f349b15a2fa6f4abd0", "python_version": "3.8.16", "system": "Linux", "machine": "x86_64", "platform": "Linux-5.4.0-107-generic-x86_64-with-glibc2.2.5", "dbt_version": "1.4.1", "dbt_adapter_type": "impala", "dbt_adapter_version": "1.4.0", "dbt_deployment_env": {}, "project_name": "monitoring", "target_name": "impala", "no_of_threads": 3, "warehouse_version": {"version": "impalad version 3.4.0-SNAPSHOT", "build": "impalad version 3.4.0-SNAPSHOT RELEASE (build 75494b221421452d0b928d844e04afa70654d528)\\nBuilt on Wed Feb  8 15:18:37 UTC 2023"}}'}
�[0m10:27:32.785485 [debug] [Thread-145]: Tracker adapter: Usage tracking flag True. To turn on/off use usage_tracking flag in profiles.yml
�[0m10:27:32.787221 [debug] [Thread-154]: Tracker adapter: Sending Event {'data': '{"event_type": "model_access", "model_name": "econda.a_econda_basis_v2__dbt_backup", "model_type": "table", "incremental_strategy": "", "auth": "N/A", "connection_state": "N/A", "elapsed_time": "N/A", "permissions": "N/A", "profile_name": "N/A", "sql_type": "N/A", "id": "6f580870-1459-4081-a126-fd1ed5cbeb18", "unique_host_hash": "6684ca3f30d055ca36911ca432eb2347", "unique_user_hash": "68c4283db8074b12df1660b31c0220a9", "unique_session_hash": "60313cc0204ea2f349b15a2fa6f4abd0", "python_version": "3.8.16", "system": "Linux", "machine": "x86_64", "platform": "Linux-5.4.0-107-generic-x86_64-with-glibc2.2.5", "dbt_version": "1.4.1", "dbt_adapter_type": "impala", "dbt_adapter_version": "1.4.0", "dbt_deployment_env": {}, "project_name": "monitoring", "target_name": "impala", "no_of_threads": 3, "warehouse_version": {"version": "impalad version 3.4.0-SNAPSHOT", "build": "impalad version 3.4.0-SNAPSHOT RELEASE (build 75494b221421452d0b928d844e04afa70654d528)\\nBuilt on Wed Feb  8 15:18:37 UTC 2023"}}'}
�[0m10:27:32.796155 [debug] [Thread-145]: Tracker adapter: Usage tracking flag True. To turn on/off use usage_tracking flag in profiles.yml
�[0m10:27:32.806767 [debug] [Thread-155]: Tracker adapter: Sending Event {'data': '{"event_type": "incremental", "model_name": "econda.a_econda_basis_v2", "model_type": "table", "incremental_strategy": "append", "auth": "N/A", "connection_state": "N/A", "elapsed_time": "N/A", "permissions": "N/A", "profile_name": "N/A", "sql_type": "N/A", "id": "6f580870-1459-4081-a126-fd1ed5cbeb18", "unique_host_hash": "6684ca3f30d055ca36911ca432eb2347", "unique_user_hash": "68c4283db8074b12df1660b31c0220a9", "unique_session_hash": "60313cc0204ea2f349b15a2fa6f4abd0", "python_version": "3.8.16", "system": "Linux", "machine": "x86_64", "platform": "Linux-5.4.0-107-generic-x86_64-with-glibc2.2.5", "dbt_version": "1.4.1", "dbt_adapter_type": "impala", "dbt_adapter_version": "1.4.0", "dbt_deployment_env": {}, "project_name": "monitoring", "target_name": "impala", "no_of_threads": 3, "warehouse_version": {"version": "impalad version 3.4.0-SNAPSHOT", "build": "impalad version 3.4.0-SNAPSHOT RELEASE (build 75494b221421452d0b928d844e04afa70654d528)\\nBuilt on Wed Feb  8 15:18:37 UTC 2023"}}'}
�[0m10:27:32.836716 [debug] [Thread-145]: Tracker adapter: Usage tracking flag True. To turn on/off use usage_tracking flag in profiles.yml
�[0m10:27:32.837476 [debug] [Thread-156]: Tracker adapter: Sending Event {'data': '{"event_type": "model_access", "model_name": "econda.a_econda_basis_v2", "model_type": "table", "incremental_strategy": "", "auth": "N/A", "connection_state": "N/A", "elapsed_time": "N/A", "permissions": "N/A", "profile_name": "N/A", "sql_type": "N/A", "id": "6f580870-1459-4081-a126-fd1ed5cbeb18", "unique_host_hash": "6684ca3f30d055ca36911ca432eb2347", "unique_user_hash": "68c4283db8074b12df1660b31c0220a9", "unique_session_hash": "60313cc0204ea2f349b15a2fa6f4abd0", "python_version": "3.8.16", "system": "Linux", "machine": "x86_64", "platform": "Linux-5.4.0-107-generic-x86_64-with-glibc2.2.5", "dbt_version": "1.4.1", "dbt_adapter_type": "impala", "dbt_adapter_version": "1.4.0", "dbt_deployment_env": {}, "project_name": "monitoring", "target_name": "impala", "no_of_threads": 3, "warehouse_version": {"version": "impalad version 3.4.0-SNAPSHOT", "build": "impalad version 3.4.0-SNAPSHOT RELEASE (build 75494b221421452d0b928d844e04afa70654d528)\\nBuilt on Wed Feb  8 15:18:37 UTC 2023"}}'}
�[0m10:27:32.870371 [debug] [Thread-145]: Writing runtime sql for node "model.econda.a_econda_basis_v2"
�[0m10:27:32.871312 [debug] [Thread-145]: Impala adapter: NotImplemented: add_begin_query
�[0m10:27:32.871649 [debug] [Thread-145]: Using impala connection "model.econda.a_econda_basis_v2"
�[0m10:27:32.871863 [debug] [Thread-145]: Tracker adapter: Usage tracking flag True. To turn on/off use usage_tracking flag in profiles.yml
�[0m10:27:32.872804 [debug] [Thread-157]: Tracker adapter: Sending Event {'data': '{"event_type": "start_query", "profile_name": "monitoring", "app": "dbt", "dbt_version": "1.4.1", "target_name": "impala", "model_name": "model.econda.a_econda_basis_v2", "sql_type": "create table", "auth": "N/A", "connection_state": "N/A", "elapsed_time": "N/A", "incremental_strategy": "N/A", "model_type": "N/A", "permissions": "N/A", "id": "6f580870-1459-4081-a126-fd1ed5cbeb18", "unique_host_hash": "6684ca3f30d055ca36911ca432eb2347", "unique_user_hash": "68c4283db8074b12df1660b31c0220a9", "unique_session_hash": "60313cc0204ea2f349b15a2fa6f4abd0", "python_version": "3.8.16", "system": "Linux", "machine": "x86_64", "platform": "Linux-5.4.0-107-generic-x86_64-with-glibc2.2.5", "dbt_adapter_type": "impala", "dbt_adapter_version": "1.4.0", "dbt_deployment_env": {}, "project_name": "monitoring", "no_of_threads": 3, "warehouse_version": {"version": "impalad version 3.4.0-SNAPSHOT", "build": "impalad version 3.4.0-SNAPSHOT RELEASE (build 75494b221421452d0b928d844e04afa70654d528)\\nBuilt on Wed Feb  8 15:18:37 UTC 2023"}}'}
�[0m10:27:32.873058 [debug] [Thread-145]: On model.econda.a_econda_basis_v2: /* {"app": "dbt", "dbt_version": "1.4.1", "profile_name": "monitoring", "target_name": "impala", "node_id": "model.econda.a_econda_basis_v2"} */

  create  table
    econda.a_econda_basis_v2
      partitioned by (sourcemonth)
....(sql script)

Unfortunately I'm not able to append the data to an existing table, even when I play around with the config options e.g. changing the materialized to incremental in the dbt_project.yml or inside the model config.
After posting in the slack channel I was routed to this place and hope to find help here.
Thanks.

@pmheld
Copy link
Author

pmheld commented Sep 7, 2023

to whom it may concern: @cloudera-hudson @myloginid @shreelola @vamshikolanu

I found out what the problem is and I would consider it a bug:
in the project.yml the following is specified:
models:
econda:
+materialized: table
+schema: econda

In the profiles.yml another user is connecting to impala having all rights on schema econda.

When running the incremental model the first time, the table is created fine. When running subsequently it is failing because it wants to create the table again rather than doing an insert as configured in the models config.

In incremental mode the program is somehow trying to identify the table in the schema of the user that is logged in and not in the schema specified in the project.yml

I tested the behavior by eliminating the "+schema: econda" part from the project.yml and it was working as expected in the schema of the user that is logged in. I want to be able though to use the schema that is not equal to the users schema that I'm logged in with.

BTW I'm using custom schema macro as described at https://docs.getdbt.com/docs/build/custom-schemas

Your feedback is very much appreciated.

@niteshy
Copy link
Collaborator

niteshy commented Sep 12, 2023

Thanks @pmheld for further information about the issue, We will update the issue soon and provide the fix.

@vamshikolanu
Copy link
Collaborator

Hello @pmheld. Can you please share the complete debug logs with us? We are unable to reproduce the issue on our end. We have tested our code with this example to reproduce the scenario you mentioned, and it works fine.

Please notice: https://github.com/cloudera/dbt-impala-example/blob/main/dbt_impala_demo/dbt_project.yml#L50 has a custom schema defined too.

I wonder if show tables isn't working in your case.

@pmheld
Copy link
Author

pmheld commented Sep 19, 2023 via email

@vamshikolanu
Copy link
Collaborator

Hello @pmheld, I can't find logs here. Did you forget to attach logs?

@pmheld
Copy link
Author

pmheld commented Sep 20, 2023

Hi @vamshikolanu, sorry, I attached the file in the reply but obviously it doesn't work.
You can find it here

@pmheld
Copy link
Author

pmheld commented Oct 30, 2023 via email

@pmheld
Copy link
Author

pmheld commented Nov 6, 2023

@cloudera-hudson @myloginid @shreelola @vamshikolanu
I've provided the log in a link to my github in the post of sep 20. Were you able to check it out yet?
Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants