Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-36099: [C++] Add Utf8View and BinaryView to the c ABI #38443

Merged
merged 4 commits into from
Nov 29, 2023

Conversation

bkietz
Copy link
Member

@bkietz bkietz commented Oct 24, 2023

Rationale for this change

Utf8View and BinaryView should be added to the c ABI spec and to the c++ library's importer/exporter.

Are these changes tested?

Yes, minimally

Are there any user-facing changes?

View arrays will be importable/exportable through the c ABI in c++

Copy link
Member Author

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: based on #37792

cpp/src/arrow/c/bridge_test.cc Show resolved Hide resolved
@bkietz bkietz requested review from felipecrv and pitrou October 24, 2023 20:34
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Oct 24, 2023
@felipecrv
Copy link
Contributor

LGTM. But of course we need to wait for the separate review and merge of the string view PR before merging this.

@pitrou
Copy link
Member

pitrou commented Oct 25, 2023

You also need to remove the corresponding skips in datagen.py.

@bkietz bkietz marked this pull request as draft October 25, 2023 14:56
@bkietz
Copy link
Member Author

bkietz commented Oct 25, 2023

The largest issue I had forgotten: C-ABI doesn't store buffer sizes explicitly. (For string data we derive the data buffer's length by reading the offsets buffer.) I guess we'll need an extra buffer to store those, since reconstructing the sizes by examining every view is untenable

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Oct 25, 2023
@bkietz bkietz force-pushed the 36099-string-view-c-abi branch from 0afb739 to 2d13898 Compare November 1, 2023 18:30
@bkietz bkietz marked this pull request as ready for review November 2, 2023 16:49
@@ -3986,8 +4072,6 @@ TEST_F(TestDeviceArrayRoundtrip, Primitive) {
TestWithJSON(mm, int32(), "[4, 5, null]");
}

// TODO C -> C++ -> C roundtripping tests?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be removed now since the C abi integration tests should cover this

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Nov 9, 2023
@bkietz
Copy link
Member Author

bkietz commented Nov 28, 2023

ping @pitrou

@felipecrv felipecrv removed their request for review November 28, 2023 18:28
Copy link
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a few nitpicks / comments

Comment on lines -263 to +266
dict_exporter_.reset(new SchemaExporter());
dict_exporter_ = std::make_unique<SchemaExporter>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've actually wondered.... is there any particular difference (performance or otherwise) doing .reset(new T) vs = std::make_unique<T>()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In scenarios where you have an std::unique_ptr<ParentClass> being populated by many sub-classes of ParentClass:

std::make_unique<T, Deleter>(...) can lead to more binary bloat because class Deleter = std::default_delete<T> is type-specialized whereas reset() takes a super-class pointer and only needs the deleter that invokes the destructor via dynamic dispatching through the parent class vtable.

Being type-specialized often leads to the derived-classes destructors' being inlined into the std::default_deleter<T>. All this extra inlining and removal of one indirection could lead to less overhead of destructor dispatching.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, wouldn't it be better to keep this as calling .reset rather than switch to make_unique?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case there is no difference since the assigned pointer is a unique_ptr<SchemaExporter> and not a pointer to a base/parent class.

cpp/src/arrow/c/bridge.cc Outdated Show resolved Hide resolved
cpp/src/arrow/c/bridge.cc Show resolved Hide resolved
cpp/src/arrow/c/bridge_test.cc Outdated Show resolved Hide resolved
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I see no array roundtrip test, am I missing something?
  2. Can you rebase to get the latest changes (including list view integration)?

cpp/src/arrow/c/bridge_test.cc Outdated Show resolved Hide resolved
@bkietz bkietz force-pushed the 36099-string-view-c-abi branch from 0b73842 to df068c2 Compare November 29, 2023 16:01
@bkietz
Copy link
Member Author

bkietz commented Nov 29, 2023

@pitrou rebased, added round trip tests

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Nov 29, 2023
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I think there are lint errors to fix

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Nov 29, 2023
@bkietz bkietz merged commit 94fc124 into apache:main Nov 29, 2023
35 of 37 checks passed
@bkietz bkietz removed the awaiting changes Awaiting changes label Nov 29, 2023
@bkietz bkietz deleted the 36099-string-view-c-abi branch November 29, 2023 17:22
@felipecrv
Copy link
Contributor

Yay! 🚀

Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 94fc124.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.

@urvishdesai
Copy link

urvishdesai commented Dec 1, 2023

Hi @bkietz, is there a plan to add Utf8View support to the rust arrow2 library soon?

If someone is not already working on it, I would like to take up the PR to add support for Utf8View to the arrow2 library (jorgecarleitao/arrow2#1596)

@bkietz
Copy link
Member Author

bkietz commented Dec 1, 2023

@urvishdesai thanks for offering! I don't think anyone else has that queued, please do (and I'd be happy to review)

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…e#38443)

### Rationale for this change

Utf8View and BinaryView should be added to the c ABI spec and to the c++ library's importer/exporter.

### Are these changes tested?

Yes, minimally

### Are there any user-facing changes?

View arrays will be importable/exportable through the c ABI in c++

* Closes: apache#36099

Authored-by: Benjamin Kietzman <[email protected]>
Signed-off-by: Benjamin Kietzman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add string view to the C ABI
5 participants