Addresses #29: Support missing version-hint.txt and provide additional options #63

Merged
merged 8 commits into from
Aug 13, 2024


@teaguesterling (Contributor) commented Aug 4, 2024

This PR addresses #29 by providing additional options for locating the metadata file to use.

This PR essentially adds two parameters to the iceberg functions:

  • version - Either a version number of the table to use, or a "hint" filename that should be read to get the latest version.
    Default value: 'version-hint.text' (preserves the previous behavior of looking for a version-hint.text file).
    Examples: version='12': explicitly load table version 12. version='my-custom-version-file.txt': use the version from my-custom-version-file.txt instead of version-hint.text.
  • version_name_format - Comma-delimited list of format strings to try, in order, when building the metadata file name from the supplied or hinted version.
    Default value: 'v%s%s.metadata.json,%s%s.metadata.json' (extends the previous behavior by also checking for a metadata file without the "v" prefix if the default name is not found).
    Example: 'version-%s.metadata.json%s' would look for version-42.metadata.json.gz when specified with (..., version='42', metadata_compression_codec='gzip').
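To make the version_name_format behavior concrete, here is a minimal Python sketch of how a comma-delimited list of printf-style templates could be expanded into candidate file names. It is not the extension's C++ code. The helper names (`compression_suffix`, `candidate_metadata_names`) are hypothetical, and it assumes each template receives the version first and a compression suffix second, with gzip mapped to ".gz", which is what the examples above imply.

```python
# Hypothetical helpers, not part of duckdb_iceberg: they only illustrate how a
# comma-delimited, printf-style version_name_format could expand into file names.
# Assumption: each template receives (version, compression_suffix) in that order.

DEFAULT_VERSION_NAME_FORMAT = "v%s%s.metadata.json,%s%s.metadata.json"


def compression_suffix(codec: str) -> str:
    """Map a metadata compression codec to a file-name suffix (assumed mapping)."""
    if codec == "gzip":
        return ".gz"
    if codec in ("", "none"):
        return ""
    raise ValueError(f"unsupported metadata_compression_codec: {codec!r}")


def candidate_metadata_names(version: str,
                             version_name_format: str = DEFAULT_VERSION_NAME_FORMAT,
                             codec: str = "none") -> list[str]:
    """Return the file names to try, in the order the templates are listed."""
    suffix = compression_suffix(codec)
    # Python's % operator substitutes %s the same way printf does for strings.
    return [fmt.strip() % (version, suffix) for fmt in version_name_format.split(",")]


print(candidate_metadata_names("12"))
# ['v12.metadata.json', '12.metadata.json']
print(candidate_metadata_names("42", "version-%s.metadata.json%s", codec="gzip"))
# ['version-42.metadata.json.gz']
```

Keeping the templates ordered means the first entry preserves the old "v"-prefixed lookup, and later entries only act as fallbacks.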

As was suggested by @lamb-russell, this PR will (eventually) do the following:

  • Replace the hard-coded "v" prefix added to the version read from "version-hint.text" with a more flexible printf-style template string that can be specified as a parameter.
  • Rework the metadata functions to explicitly pass the right metadata file version around instead of loading it deep in the stack.
  • Capture the appropriate version ID earlier from "version-hint.text"
  • Allow a specific version ID to be passed in to the iceberg functions (disabling the use of "version-hint.text")
  • Allow a specific version formatter to be passed into the iceberg functions
  • Update test cases
  • Update documentation
  • Possibly provide some additional helper functions to list available versions, making the version and format parameters easier to use.
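The lookup order described in the plan above (explicit version or hint file first, then each name template in turn) can be sketched as follows. This is a hypothetical Python illustration, not the extension's implementation. It assumes that an all-digit `version` is an explicit table version and anything else names a hint file in the table's `metadata/` directory, that the hint file holds the version as plain text, and that `suffix` is the compression suffix (for example ".gz"). The function names are made up for the example.

```python
import os

# Hypothetical sketch of the metadata lookup order; not duckdb_iceberg code.
# Assumptions: an all-digit `version` is an explicit table version, any other
# value names a hint file inside <table>/metadata/, and that hint file holds
# the version as plain text.

DEFAULT_VERSION_NAME_FORMAT = "v%s%s.metadata.json,%s%s.metadata.json"


def resolve_version(metadata_dir: str, version: str = "version-hint.text") -> str:
    """Return the explicit version, or the version read from the hint file."""
    if version.isdigit():
        return version  # explicit version: the hint file is never read
    with open(os.path.join(metadata_dir, version), encoding="utf-8") as f:
        return f.read().strip()


def find_metadata_file(table_path: str,
                       version: str = "version-hint.text",
                       version_name_format: str = DEFAULT_VERSION_NAME_FORMAT,
                       suffix: str = "") -> str:
    """Try each name template in order and return the first file that exists."""
    metadata_dir = os.path.join(table_path, "metadata")
    resolved = resolve_version(metadata_dir, version)
    tried = []
    for fmt in version_name_format.split(","):
        candidate = os.path.join(metadata_dir, fmt.strip() % (resolved, suffix))
        if os.path.exists(candidate):
            return candidate
        tried.append(candidate)
    raise FileNotFoundError(f"no metadata file for version {resolved!r}; tried {tried}")
```

Resolving the version once, up front, and passing it down mirrors the plan's goal of not loading it deep in the stack; a missing file surfaces with the full list of names that were tried.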

Signed-off-by: Teague Sterling <[email protected]>
@teaguesterling marked this pull request as ready for review on August 10, 2024 23:46
@samansmink (Collaborator) left a comment

Looks great!

@samansmink merged commit 3f6d753 into duckdb:main on Aug 13, 2024
16 checks passed
mike-luabase pushed a commit to definite-app/duckdb_iceberg that referenced this pull request Oct 27, 2024
Addresses duckdb#29: Support missing version-hint.txt and provide additional options