[Extension Types] Add support for cross-lang extension types. #899
@@ -5,8 +5,11 @@ prettytable-rs = "^0.10"
rand = "^0.8"

[dependencies.arrow2]
branch = "clark/expand-casting-support"
Should we merge it into main in our fork before merging this PR?
The main con with working off of main is that we'd be making the divergence from upstream Arrow2 implicit, and if we keep merging upstream Arrow2 main into our fork's main, our differing commits might be lost in the git history. Instead, I think it would be better to anchor to a branch where all of the diverging commits are at the head and try to keep the branching small/short-lived.
The flow could be:
- Create a branch on Arrow2 fork containing requisite changes to Arrow2.
- Update Daft to point to that branch.
- Submit a PR from Arrow2 fork branch to upstream Arrow2.
- If the PR is updated during the review process, we can update the locked Daft dependency with a cargo update PR.
- When the PR is merged and is included in an Arrow2 release, switch Daft's Arrow2 dependency to point to that crates.io release.
- If we need another Arrow2 change stacked on the existing Arrow2 branch/PR, we could create a nightly snapshot branch containing both changes, similar to what Polars does: https://github.com/pola-rs/polars/blob/528590cfa57e48f5bd902ad027f5e35318644110/Cargo.toml#L47
I think this should help keep the difference with upstream Arrow2 explicit, which should be nicer? Let me know what you think about that flow.
Seems fine to me, but might require changes after the logical types refactor?
// TODO(Clark): Refactor to GILOnceCell in order to avoid deadlock between the below mutex and the Python GIL.
lazy_static! {
    static ref REGISTRY: Mutex<HashMap<std::string::String, arrow2::datatypes::DataType>> =
        Mutex::new(HashMap::new());
Hmm, I can see how this could lead to a deadlock with the GIL. It could be worthwhile to fix this with GILOnceCell.
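For context, the hazard is a lock-ordering one: a thread that holds the registry mutex and then waits for the GIL can block forever if another thread holds the GIL and is waiting for the mutex. The sketch below is not the GILOnceCell refactor mentioned in the TODO. It is a dependency-free illustration of one general mitigation, which is to keep the critical section short and copy data out before doing anything else. `DataTypeStandIn`, `register`, and `lookup` are hypothetical names, and `std::sync::OnceLock` stands in for `lazy_static!`.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical stand-in for arrow2::datatypes::DataType so the sketch
// compiles without any external crates.
#[derive(Clone, Debug, PartialEq)]
pub struct DataTypeStandIn(pub String);

// Lazily initialised global registry (assumption: std's OnceLock replaces
// lazy_static! here; the PR itself uses lazy_static!).
static REGISTRY: OnceLock<Mutex<HashMap<String, DataTypeStandIn>>> = OnceLock::new();

fn registry() -> &'static Mutex<HashMap<String, DataTypeStandIn>> {
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

// Insert or overwrite an entry. The lock is held only for the insert.
pub fn register(name: &str, dtype: DataTypeStandIn) {
    let mut guard = registry().lock().unwrap();
    guard.insert(name.to_string(), dtype);
    // `guard` is dropped here, before the caller regains control.
}

// Look up an entry, cloning it out so that the mutex is released before
// the caller does anything that might need another lock (such as the GIL).
pub fn lookup(name: &str) -> Option<DataTypeStandIn> {
    let guard = registry().lock().unwrap();
    let found = guard.get(name).cloned();
    drop(guard); // release the mutex explicitly before returning
    found
}
```

With this shape, no code path acquires a second lock while the registry mutex is held, which removes one half of the cycle needed for a deadlock. Whether that is sufficient for the PR's actual call sites is not something this sketch can establish.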
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #899      +/-   ##
==========================================
- Coverage   86.37%   86.18%   -0.20%
==========================================
  Files         178      178
  Lines       14415    14538     +123
==========================================
+ Hits        12451    12529      +78
- Misses       1964     2009      +45
This PR adds support for cross-language extension types, i.e. extension types that use a language-agnostic serialization method.
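As a rough illustration of what "language-agnostic" can mean here, assuming the PR builds on the Arrow extension type convention: the Arrow columnar format identifies an extension type through two field metadata keys, `ARROW:extension:name` and `ARROW:extension:metadata`, so any language that can read field metadata can recognise the type. The sketch below only models that key/value layout with standard-library types. `ExtensionSpec`, `to_field_metadata`, and `extension_name` are hypothetical names and are not taken from Daft or Arrow2.

```rust
use std::collections::BTreeMap;

// Metadata keys defined by the Arrow columnar format for extension types.
pub const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";
pub const EXTENSION_METADATA_KEY: &str = "ARROW:extension:metadata";

// Hypothetical description of an extension type: a name plus an optional
// serialized payload whose encoding is up to the type's author.
pub struct ExtensionSpec {
    pub name: String,
    pub metadata: Option<String>,
}

// Encode the spec as field-level key/value metadata.
pub fn to_field_metadata(spec: &ExtensionSpec) -> BTreeMap<String, String> {
    let mut meta = BTreeMap::new();
    meta.insert(EXTENSION_NAME_KEY.to_string(), spec.name.clone());
    if let Some(payload) = &spec.metadata {
        meta.insert(EXTENSION_METADATA_KEY.to_string(), payload.clone());
    }
    meta
}

// Recover the extension name from field metadata, if one is present.
pub fn extension_name(meta: &BTreeMap<String, String>) -> Option<&str> {
    meta.get(EXTENSION_NAME_KEY).map(String::as_str)
}
```

Because the payload is an opaque string, a type registered from one language round-trips through another as long as both agree on how that string is encoded.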
TODOs