Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add out_of_range flag to load_cdf #3040

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Conversation

pblocz
Copy link

@pblocz pblocz commented Dec 1, 2024

Description

This PR changes load_cdf so:

By default

  • load_cdf with starting_version out of range returns an InvalidVersion
  • load_cdf with a starting_timestamp out of range returns a ChangeDataTimestampGreaterThanCommit

And you can use .with_allow_out_of_range() in rust or allow_out_of_range=True in python to make it so that both cases return an empty dataframe.

This aligns it closer with how reading the change feed works in spark and makes the behaviour consistent.

This in contrast to the current behavior:

  • load_cdf with starting_version out of range returns an InvalidVersion
  • load_cdf with a starting_timestamp out of range returns an empty dataframe

Related Issue(s)

@github-actions github-actions bot added binding/python Issues for the Python package binding/rust Issues for the Rust crate labels Dec 1, 2024
Copy link

github-actions bot commented Dec 1, 2024

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

Copy link

codecov bot commented Dec 1, 2024

Codecov Report

Attention: Patch coverage is 81.60000% with 23 lines in your changes missing coverage. Please review.

Project coverage is 72.72%. Comparing base (ba671c9) to head (b96b93b).

Files with missing lines Patch % Lines
crates/core/src/operations/load_cdf.rs 84.29% 3 Missing and 16 partials ⚠️
python/src/lib.rs 0.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3040      +/-   ##
==========================================
+ Coverage   72.68%   72.72%   +0.03%     
==========================================
  Files         129      129              
  Lines       41462    41578     +116     
  Branches    41462    41578     +116     
==========================================
+ Hits        30137    30237     +100     
  Misses       9376     9376              
- Partials     1949     1965      +16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@ion-elgreco ion-elgreco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just couple comments

python/tests/test_cdf.py Outdated Show resolved Hide resolved
python/tests/test_cdf.py Outdated Show resolved Hide resolved
@ion-elgreco ion-elgreco changed the title add out_of_range flag to load_cdf feat: add out_of_range flag to load_cdf Dec 1, 2024
@rtyler rtyler added this to the v0.23 milestone Dec 1, 2024
Copy link
Collaborator

@hntd187 hntd187 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me now, just rerunning CI right now local concurrency test is notoriously flakey.

@pblocz
Copy link
Author

pblocz commented Dec 1, 2024

Awesome, seems that all the tests passed now

@hntd187
Copy link
Collaborator

hntd187 commented Dec 2, 2024

@pblocz if you could sign off on your commits we can get this merged soon

@pblocz
Copy link
Author

pblocz commented Dec 2, 2024

@hntd187 Done. I don't know why, but thought the sign off was something else 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

load_cdf return an empty dataframe when a version is out of range
4 participants