Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add generated_data to repo to facilitate CI #59

Merged
merged 3 commits into from
Jul 1, 2024

Conversation

peterboncz
Copy link

@peterboncz peterboncz commented Jun 28, 2024

  • Commit the generated data into the repo, because this facilitates running the tests in CI
    (since generating the data requires to equip the test runners with pyspark, which at least motherduck prefers not to).

  • The data is minimized to 5MB, by setting the scale factor 10x smaller, and from expected_results/ only keeping last/.
    This motivates rewriting some testcases that used expected_results/q08 into expected_results/last (which is the same)

  • Removed one testcase, which tests if we get an error when parquet is not loaded (in MotherDuck, when we run iceberg in so-called "remote_only" mode, this will actually not raise an error).
    It was kind of a trivial test, so I hope this is OK..

…unning the tests in CI

  (since generating the data requires to equip the test runners with pyspark, which at least mother prefers not to)

- the data is minimized to 5MB, by setting the scale factor 10x smaller, and from expected_results/ only keeping last/
  this motivates rewriting some testcases that used expected_results/q08 into expected_results/last (which is the same)

- removed one testcase, which tests if we get an error when parquet is not loaded
  (in MotherDuck, when we run iceberg in so-called "remote_only" mode, this will actually not raise an error)
  it was kind of a trivial test, so I hope this is OK..
Copy link
Collaborator

@samansmink samansmink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! added 1 small comment

Makefile Outdated
@@ -9,11 +9,11 @@ include extension-ci-tools/makefiles/duckdb_extension.Makefile

# Custom makefile targets
data: data_clean
python3 scripts/test_data_generator/generate_iceberg.py 0.01 data/iceberg/generated_spec1_0_01 1
python3 scripts/test_data_generator/generate_iceberg.py 0.01 data/iceberg/generated_spec2_0_01 2
python3 scripts/test_data_generator/generate_iceberg.py 0.001 data/iceberg/generated_spec1_0_01 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the name of the directory is now no longer correct which could be a little confusing

@samansmink
Copy link
Collaborator

windows CI is failing due to some upstream issue, however the tests also seem to fail due to only the json metadata files being added and not the avro and parquet files

- also in Makefile and in the tests (name, and paths used)
@peterboncz
Copy link
Author

windows CI is failing due to some upstream issue, however the tests also seem to fail due to only the json metadata files being added and not the avro and parquet files

apologies for that messup. I have made the name change and made sure all files are pushed now.

@samansmink
Copy link
Collaborator

Thanks for the changes! looks good!

@samansmink samansmink merged commit 5b72b75 into duckdb:main Jul 1, 2024
16 checks passed
mike-luabase pushed a commit to definite-app/duckdb_iceberg that referenced this pull request Oct 27, 2024
add generated_data to repo to facilitate CI
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants