Multiple Issues with datacontract-cli v0.10.14 and BigQuery #498

Open
ckiefer opened this issue Nov 10, 2024 · 3 comments

@ckiefer

ckiefer commented Nov 10, 2024

Thanks a lot for your work and the datacontract-cli tool. We are evaluating it for use in our data platform developed in GCP. I am reporting multiple findings / issues when working with BigQuery tables and data contracts using version 0.10.14 of the datacontract-cli tool.

BigQuery INSERT INTO statement

The INSERT INTO SQL statement provided in https://github.com/datacontract/datacontract-cli/blob/main/tests/fixtures/bigquery/export/bq.txt has some issues.

The correct statement is:

    INSERT INTO "mgb-s-tst-jet-dev.datacontract_cli_test_dataset.BQ_Example" (
      string_field, required_string_field, maxlength_string_field, maxlength_required_string_field,
      varchar_field, text_field, bytes_field, int_field, integer_field, long_field, bigint_field,
      float_field, boolean_field, timestamp_field, timestamp_tz_field, timestamp_ntz_field, date_field,
      number_field, decimal_field, numeric_field, double_field, null_field, object_field, record_field,
      struct_field, string_array_field, int_array_field, complex_array_field
    )
    VALUES
      ("sample string", "required string", "sample maxlength string", "required maxlength string",
       "sample varchar", "sample text", FROM_BASE64("Ynl0ZXMgZGF0YQ=="), 123, 456, 789012345678, 987654321,
       123.45, true, "2023-05-26T12:00:00Z", "2023-05-26T12:00:00Z", "12:00:00", "2023-05-26",
       12.345, 12.345, 12.345, 12.345, "sample null value",
       STRUCT("required subfield", "optional subfield"), STRUCT(true, DATE "2023-05-26"),
       STRUCT(FROM_BASE64("Ynl0ZXMgZGF0YQ=="), 123),
       ["sample string 1"], [123], [STRUCT(true, BIGNUMERIC "12.345", ["123"])]),
      ("another sample", "another required string", "another sample maxlength string", "another required maxlength string",
       "another sample varchar", "another sample text", FROM_BASE64("YW5vdGhlciBieXRlcyBkYXRh"), 789, 1011, 121314151617, 1617181920,
       678.90, false, "2024-05-26T12:00:00Z", "2024-05-26T12:00:00Z", "13:00:00", "2024-05-26",
       67.890, 67.890, 67.890, 67.890, "another null value",
       STRUCT("another required subfield", "another optional subfield"), STRUCT(false, DATE "2024-05-26"),
       STRUCT(FROM_BASE64("YW5vdGhlciBieXRlcyBkYXRh"), 456),
       ["sample string 2"], [456], [STRUCT(false, BIGNUMERIC "67.890", ["456"])]);

Unexpected Test Outputs

Given the BigQuery table bq_table_schema_2.json with schema schema_fields_2.json and this contract datacontract_2.yaml.txt, the following test outputs are shown:
Screenshot 2024-11-10 090947

It's unclear why, for instance, the expected type of field maxlength_string_field is STRING and not STRING(42), as defined in both the BQ table and the contract.
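To make the mismatch concrete, here is a minimal Python sketch of how a parameterized BigQuery type such as STRING(42) differs from its base type STRING when compared. The helpers `parse_bq_type` and `types_match` are hypothetical names made up for this illustration; this is not how datacontract-cli performs its check, only a way to show why the two type strings are not equal unless the length parameter is deliberately ignored.

```python
import re
from typing import NamedTuple, Optional


class BqType(NamedTuple):
    base: str               # e.g. "STRING"
    params: Optional[str]   # e.g. "42", or None when the type is unparameterized


# Matches "STRING", "STRING(42)", "NUMERIC(10, 2)", ...
_TYPE_RE = re.compile(r"^\s*([A-Za-z0-9_]+)\s*(?:\(\s*(.*?)\s*\))?\s*$")


def parse_bq_type(type_str: str) -> BqType:
    """Split a type string into an upper-cased base type and its parameters."""
    match = _TYPE_RE.match(type_str)
    if match is None:
        raise ValueError(f"cannot parse type: {type_str!r}")
    params = match.group(2)
    if params is not None:
        # Drop whitespace so "10, 2" and "10,2" compare equal.
        params = re.sub(r"\s+", "", params)
    return BqType(match.group(1).upper(), params)


def types_match(expected: str, actual: str, strict: bool = True) -> bool:
    """Compare two type strings; strict=False ignores the parameters."""
    e, a = parse_bq_type(expected), parse_bq_type(actual)
    if e.base != a.base:
        return False
    return e.params == a.params if strict else True


print(types_match("STRING", "STRING(42)"))                # strict comparison
print(types_match("STRING", "STRING(42)", strict=False))  # base type only
```

A strict comparison treats STRING and STRING(42) as different types, which is the behaviour one would expect if the contract and the table both declare STRING(42).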

Field-level Quality Tests

The contract defines two field-level quality tests:
Screenshot 2024-11-10 091837

Only the outputs from the first test are printed in the terminal:
image

Let's suppose that it fails:
image

Then the final output has no description, only 1):
image

Model-level Quality Tests

Also, in a slightly modified example, we have added two model-level quality tests to the data contract:
image

Only the first quality test is executed, and the final output has no description (same as in field-level quality testing):
image

Export to ODCS_v3

The model-level quality checks are not exported to odcs_v3 format; see datacontract_2_odcs_v3.yaml.txt

@jochenchrist
Contributor

Hi @ckiefer,

Thanks for reporting these issues.
I think these are all valid issues that we could improve or fix.

Re: Unexpected Test Outputs

Workaround:

You can use a config object to specify the physical type, e.g. bigqueryType:

    fields:
      my_field_1:
        type: string
        config:
          bigqueryType: STRING(42)

Details: https://datacontract.com/#config-object
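As a rough illustration of the idea behind this workaround, the following Python sketch resolves a field's physical BigQuery type by preferring `config.bigqueryType` and falling back to a mapping from the logical type. The function `resolve_bigquery_type` and the `DEFAULT_BQ_TYPES` table are assumptions made up for this example; they are not taken from the datacontract-cli source, and the real tool may resolve types differently.

```python
from typing import Any, Dict

# Hypothetical fallback mapping from logical contract types to BigQuery types.
# Only a few entries are listed; this is not the tool's actual mapping.
DEFAULT_BQ_TYPES: Dict[str, str] = {
    "string": "STRING",
    "integer": "INT64",
    "boolean": "BOOL",
}


def resolve_bigquery_type(field: Dict[str, Any]) -> str:
    """Return the physical type for a field, preferring config.bigqueryType."""
    config = field.get("config") or {}
    override = config.get("bigqueryType")
    if override:
        return override
    logical = field.get("type")
    if logical not in DEFAULT_BQ_TYPES:
        raise ValueError(f"no BigQuery mapping for logical type {logical!r}")
    return DEFAULT_BQ_TYPES[logical]


# The field from the YAML snippet above, expressed as a Python dict.
my_field_1 = {"type": "string", "config": {"bigqueryType": "STRING(42)"}}

print(resolve_bigquery_type(my_field_1))          # uses the config override
print(resolve_bigquery_type({"type": "string"}))  # falls back to the mapping
```

The point of the sketch is only that an explicit physical type in the config object takes precedence over whatever the logical type would otherwise map to.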

The other issues

We would appreciate PRs that address these issues, if you are able to contribute them. That would certainly speed up getting the issues resolved.

@stefannegele
Contributor

@ckiefer I can support you on that!

@ckiefer
Author

ckiefer commented Nov 19, 2024

Thank you for the responses. I'll discuss with the team to see if we have the capacity to actively work on a PR for datacontract-cli in our next PI.
