Existing non-nullable column, not present in newly streamed schema #160

Open

AlexanderMann opened this issue Oct 14, 2019 · 2 comments

@AlexanderMann (Collaborator)

Motivation

When we remove a column from a schema definition, the implication is that the column should now be nullable. At present, target-postgres does not treat the absence of a property as meaning that the corresponding column should be made nullable.
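
To make the scenario concrete, here is a hypothetical before/after of a stream's properties (the property names mirror the log below, but the schemas themselves are invented for illustration):

# Hypothetical illustration only -- not taken from an actual tap.
schema_before = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "source": {"type": "string"},
        "location": {"type": "string"},  # no "null" in the type, so the column is created NOT NULL
    },
}

schema_after = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "source": {"type": "string"},
        "location__raw_id": {"type": ["null", "string"]},  # new property
        # "location" is no longer present -- the existing column should now be
        # made nullable, but target-postgres leaves its NOT NULL constraint in place.
    },
}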

Example:

2019-10-14 01:45:16,259 src.load.bulk.target_postgres {} INFO: Root table name jobs
2019-10-14 01:45:16,275 src.load.bulk.target_postgres {} INFO: Writing batch with 107 records for `jobs` with `key_properties`: `['id', 'source']`
2019-10-14 01:45:16,291 src.load.bulk.target_postgres {} INFO: Writing table batch schema for `('jobs',)`...
2019-10-14 01:45:18,341 src.load.bulk.target_postgres {} WARNING: NOT EMPTY: Forcing new column `('location__raw_id',)` in table `jobs` to be nullable due to table not empty.
2019-10-14 01:45:18,395 src.load.bulk.target_postgres {} INFO: Table Schema Change [`jobs`.`('location__raw_id',)`:`location__raw_id`] New column, non empty table (took 53 millis)
2019-10-14 01:45:18,418 src.load.bulk.target_postgres {} INFO: METRIC: {"type": "timer", "metric": "job_duration", "value": 2.114116907119751, "tags": {"job_type": "upsert_table_schema", "path": ["jobs"], "database": "bog", "schema": "raw__singer", "table": "jobs", "status": "succeeded"}}
2019-10-14 01:45:18,420 src.load.bulk.target_postgres {} INFO: Writing table batch with 107 rows for `('jobs',)`...
2019-10-14 01:45:18,622 src.load.bulk.target_postgres {} INFO: METRIC: {"type": "counter", "metric": "record_count", "value": 0, "tags": {"count_type": "table_rows_persisted", "path": ["jobs"], "database": "bog", "schema": "raw__singer", "table": "jobs"}}
2019-10-14 01:45:18,623 src.load.bulk.target_postgres {} INFO: METRIC: {"type": "timer", "metric": "job_duration", "value": 2.334609031677246, "tags": {"job_type": "table", "path": ["jobs"], "database": "bog", "schema": "raw__singer", "table": "jobs", "status": "failed"}}
2019-10-14 01:45:18,625 src.load.bulk.target_postgres {} INFO: METRIC: {"type": "counter", "metric": "record_count", "value": 0, "tags": {"count_type": "batch_rows_persisted", "path": ["jobs"], "database": "bog", "schema": "raw__singer"}}
2019-10-14 01:45:18,626 src.load.bulk.target_postgres {} INFO: METRIC: {"type": "timer", "metric": "job_duration", "value": 2.3506481647491455, "tags": {"job_type": "batch", "path": ["jobs"], "database": "bog", "schema": "raw__singer", "status": "failed"}}
2019-10-14 01:45:18,641 src.load.bulk.target_postgres {} ERROR: Exception writing records
2019-10-14 01:45:18,657 src.load.bulk.target_postgres {} ERROR: Traceback (most recent call last):
2019-10-14 01:45:18,660 src.load.bulk.target_postgres {} ERROR:   File "/code/venv/target-postgres/lib/python3.7/site-packages/target_postgres/postgres.py", line 303, in write_batch
2019-10-14 01:45:18,662 src.load.bulk.target_postgres {} ERROR:     {'version': target_table_version})
2019-10-14 01:45:18,665 src.load.bulk.target_postgres {} ERROR:   File "/code/venv/target-postgres/lib/python3.7/site-packages/target_postgres/sql_base.py", line 847, in write_batch_helper
2019-10-14 01:45:18,667 src.load.bulk.target_postgres {} ERROR:     metadata)
2019-10-14 01:45:18,669 src.load.bulk.target_postgres {} ERROR:   File "/code/venv/target-postgres/lib/python3.7/site-packages/target_postgres/postgres.py", line 616, in write_table_batch
2019-10-14 01:45:18,672 src.load.bulk.target_postgres {} ERROR:     csv_rows)
2019-10-14 01:45:18,674 src.load.bulk.target_postgres {} ERROR:   File "/code/venv/target-postgres/lib/python3.7/site-packages/target_postgres/postgres.py", line 566, in persist_csv_rows
2019-10-14 01:45:18,676 src.load.bulk.target_postgres {} ERROR:     cur.copy_expert(copy, csv_rows)
2019-10-14 01:45:18,679 src.load.bulk.target_postgres {} ERROR: psycopg2.errors.NotNullViolation: null value in column "location" violates not-null constraint
2019-10-14 01:45:18,682 src.load.bulk.target_postgres {} ERROR: DETAIL:  Failing row contains (3f61fc1debcf5fddbf4966730369104cda9ded40,  ...

From the above, we can see that our new column location__raw_id is being created and made nullable, but our existing column, location, is retaining its NOT NULL constraint, ultimately causing the load to fail.

Proposal

Make the upsert logic treat the absence of a key in the Singer schema as meaning that the corresponding column should be made nullable.
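
A minimal sketch of what that could look like, assuming a hypothetical post-upsert hook (the helper name and the information_schema query are illustrative, not the actual target-postgres internals):

import psycopg2

def relax_columns_missing_from_schema(conn, table_schema, table_name, streamed_properties):
    """Hypothetical helper: drop NOT NULL on any existing column that is no
    longer present in the streamed schema's properties."""
    with conn.cursor() as cur:
        # Find columns that are currently NOT NULL on the remote table
        cur.execute(
            """
            SELECT column_name
            FROM information_schema.columns
            WHERE table_schema = %s AND table_name = %s AND is_nullable = 'NO'
            """,
            (table_schema, table_name))
        for (column_name,) in cur.fetchall():
            if column_name not in streamed_properties:
                # Column exists remotely but was removed from the schema:
                # relax the constraint so future nulls can be persisted.
                cur.execute('ALTER TABLE "{}"."{}" ALTER COLUMN "{}" DROP NOT NULL'.format(
                    table_schema, table_name, column_name))
    conn.commit()

In the failing example above, that would amount to issuing ALTER TABLE "raw__singer"."jobs" ALTER COLUMN "location" DROP NOT NULL before the batch is written, and the copy would no longer violate the constraint.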

@AlexanderMann (Collaborator, Author)

@awm33 can you remember if there was any reason we allowed/wanted this? I'm just now running into this as a nuisance...

@cdilga commented Nov 14, 2019

Even running this for the first time causes this issue with a config of:

{
  "postgres_host": "host.docker.internal",
  "postgres_port": 54320,
  "postgres_database": "postgres",
  "postgres_username": "user",
  "postgres_password": "pass",
  "postgres_schema": "public"
}
