Existing non-nullable column, not present in newly streamed schema #160

Open

AlexanderMann opened this issue Oct 14, 2019 · 2 comments

@AlexanderMann (Collaborator)

Motivation

When we remove a column from a schema definition, the implication is that the column should now be nullable. At present, target-postgres does not treat the absence of a property as meaning that the corresponding column should be made nullable.
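
To make the scenario concrete, here is a hypothetical before/after of a stream's properties (the property names mirror the log below, but the schemas themselves are invented for illustration):

# Hypothetical illustration only -- not taken from an actual tap.
schema_before = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "source": {"type": "string"},
        "location": {"type": "string"},  # no "null" in the type, so the column is created NOT NULL
    },
}

schema_after = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "source": {"type": "string"},
        "location__raw_id": {"type": ["null", "string"]},  # new property
        # "location" is no longer present -- the existing column should now be
        # made nullable, but target-postgres leaves its NOT NULL constraint in place.
    },
}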

Example:

2019-10-14 01:45:16,259 src.load.bulk.target_postgres {} INFO: Root table name jobs
2019-10-14 01:45:16,275 src.load.bulk.target_postgres {} INFO: Writing batch with 107 records for `jobs` with `key_properties`: `['id', 'source']`
2019-10-14 01:45:16,291 src.load.bulk.target_postgres {} INFO: Writing table batch schema for `('jobs',)`...
2019-10-14 01:45:18,341 src.load.bulk.target_postgres {} WARNING: NOT EMPTY: Forcing new column `('location__raw_id',)` in table `jobs` to be nullable due to table not empty.
2019-10-14 01:45:18,395 src.load.bulk.target_postgres {} INFO: Table Schema Change [`jobs`.`('location__raw_id',)`:`location__raw_id`] New column, non empty table (took 53 millis)
2019-10-14 01:45:18,418 src.load.bulk.target_postgres {} INFO: METRIC: {"type": "timer", "metric": "job_duration", "value": 2.114116907119751, "tags": {"job_type": "upsert_table_schema", "path": ["jobs"], "database": "bog", "schema": "raw__singer", "table": "jobs", "status": "succeeded"}}
2019-10-14 01:45:18,420 src.load.bulk.target_postgres {} INFO: Writing table batch with 107 rows for `('jobs',)`...
2019-10-14 01:45:18,622 src.load.bulk.target_postgres {} INFO: METRIC: {"type": "counter", "metric": "record_count", "value": 0, "tags": {"count_type": "table_rows_persisted", "path": ["jobs"], "database": "bog", "schema": "raw__singer", "table": "jobs"}}
2019-10-14 01:45:18,623 src.load.bulk.target_postgres {} INFO: METRIC: {"type": "timer", "metric": "job_duration", "value": 2.334609031677246, "tags": {"job_type": "table", "path": ["jobs"], "database": "bog", "schema": "raw__singer", "table": "jobs", "status": "failed"}}
2019-10-14 01:45:18,625 src.load.bulk.target_postgres {} INFO: METRIC: {"type": "counter", "metric": "record_count", "value": 0, "tags": {"count_type": "batch_rows_persisted", "path": ["jobs"], "database": "bog", "schema": "raw__singer"}}
2019-10-14 01:45:18,626 src.load.bulk.target_postgres {} INFO: METRIC: {"type": "timer", "metric": "job_duration", "value": 2.3506481647491455, "tags": {"job_type": "batch", "path": ["jobs"], "database": "bog", "schema": "raw__singer", "status": "failed"}}
2019-10-14 01:45:18,641 src.load.bulk.target_postgres {} ERROR: Exception writing records
2019-10-14 01:45:18,657 src.load.bulk.target_postgres {} ERROR: Traceback (most recent call last):
2019-10-14 01:45:18,660 src.load.bulk.target_postgres {} ERROR:   File "/code/venv/target-postgres/lib/python3.7/site-packages/target_postgres/postgres.py", line 303, in write_batch
2019-10-14 01:45:18,662 src.load.bulk.target_postgres {} ERROR:     {'version': target_table_version})
2019-10-14 01:45:18,665 src.load.bulk.target_postgres {} ERROR:   File "/code/venv/target-postgres/lib/python3.7/site-packages/target_postgres/sql_base.py", line 847, in write_batch_helper
2019-10-14 01:45:18,667 src.load.bulk.target_postgres {} ERROR:     metadata)
2019-10-14 01:45:18,669 src.load.bulk.target_postgres {} ERROR:   File "/code/venv/target-postgres/lib/python3.7/site-packages/target_postgres/postgres.py", line 616, in write_table_batch
2019-10-14 01:45:18,672 src.load.bulk.target_postgres {} ERROR:     csv_rows)
2019-10-14 01:45:18,674 src.load.bulk.target_postgres {} ERROR:   File "/code/venv/target-postgres/lib/python3.7/site-packages/target_postgres/postgres.py", line 566, in persist_csv_rows
2019-10-14 01:45:18,676 src.load.bulk.target_postgres {} ERROR:     cur.copy_expert(copy, csv_rows)
2019-10-14 01:45:18,679 src.load.bulk.target_postgres {} ERROR: psycopg2.errors.NotNullViolation: null value in column "location" violates not-null constraint
2019-10-14 01:45:18,682 src.load.bulk.target_postgres {} ERROR: DETAIL:  Failing row contains (3f61fc1debcf5fddbf4966730369104cda9ded40,  ...

From the above, we can see that our new column location__raw_id is being created and made nullable, but our existing column, location, is retaining its NOT NULL constraint, ultimately causing the load to fail.

Proposal

Make the upsert logic treat the absence of a key in the Singer schema as meaning that the corresponding column should be made nullable.
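
A minimal sketch of what that could look like, assuming a hypothetical post-upsert hook (the helper name and the information_schema query are illustrative, not the actual target-postgres internals):

import psycopg2

def relax_columns_missing_from_schema(conn, table_schema, table_name, streamed_properties):
    """Hypothetical helper: drop NOT NULL on any existing column that is no
    longer present in the streamed schema's properties."""
    with conn.cursor() as cur:
        # Find columns that are currently NOT NULL on the remote table
        cur.execute(
            """
            SELECT column_name
            FROM information_schema.columns
            WHERE table_schema = %s AND table_name = %s AND is_nullable = 'NO'
            """,
            (table_schema, table_name))
        for (column_name,) in cur.fetchall():
            if column_name not in streamed_properties:
                # Column exists remotely but was removed from the schema:
                # relax the constraint so future nulls can be persisted.
                cur.execute('ALTER TABLE "{}"."{}" ALTER COLUMN "{}" DROP NOT NULL'.format(
                    table_schema, table_name, column_name))
    conn.commit()

In the failing example above, that would amount to issuing ALTER TABLE "raw__singer"."jobs" ALTER COLUMN "location" DROP NOT NULL before the batch is written, and the copy would no longer violate the constraint.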

@AlexanderMann (Collaborator, Author)

@awm33 can you remember if there was any reason we allowed/wanted this? I'm just now running into this as a nuisance...

@cdilga commented Nov 14, 2019

Even running this for the first time causes this issue with a config of:

{
  "postgres_host": "host.docker.internal",
  "postgres_port": 54320,
  "postgres_database": "postgres",
  "postgres_username": "user",
  "postgres_password": "pass",
  "postgres_schema": "public"
}
