Questions regarding zero-copy #187
Comments
Thanks for your great questions!
Regarding the last question, I would like to share some thoughts as well. The main idea is to use Arrow as the in-memory data type. Currently, I am researching two ways to achieve this:
I haven't decided which way is better. You are welcome to discuss it with us.
Thanks, that is very helpful. Great to hear that these points are known and being improved. I don't have much insight into the internals of chDB. About importing data efficiently: ClickHouse supports lots of input/output formats, e.g.
Just throwing in my $0.02 in case it's of any use. Regarding this part of the discussion:
Perhaps one option is to use Arrow's C data interface or C stream interface. These allow Arrow buffers to be shared across language boundaries in a zero-copy manner within a single process. If I understand correctly, this is how some engines like Polars and DuckDB already handle querying in-memory Arrow tables today. I don't know anything about the internals of ClickHouse, but maybe this approach could make it easier/cleaner to implement a custom "storage type" as you mention. And vice versa, the C stream interface might help with exposing results as a stream of Arrow record batches.
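To make the C data interface idea a bit more concrete, here is a minimal sketch in plain C. The two struct layouts follow the Arrow C data interface specification; everything else (the helper names `export_int32` and `sum_int32`, the column name `x`, and the choice to leave ownership of the values with the host) is a hypothetical illustration and says nothing about how chDB or ClickHouse would actually wire this up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Struct layouts as defined by the Arrow C data interface. */
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Release callbacks: a released struct is marked by release == NULL. */
void release_schema(struct ArrowSchema* schema) {
  schema->release = NULL;
}

void release_array(struct ArrowArray* array) {
  /* Only the small pointer table is freed here. In this sketch the
     values themselves stay owned by the host application (assumption). */
  free((void*)array->buffers);
  array->release = NULL;
}

/* Hypothetical producer: describe host-owned int32 values as an Arrow
   array without copying them. Returns 0 on success, -1 on failure. */
int export_int32(const int32_t* values, int64_t n,
                 struct ArrowSchema* schema, struct ArrowArray* array) {
  const void** buffers = malloc(2 * sizeof(*buffers));
  if (buffers == NULL) return -1;
  buffers[0] = NULL;    /* validity bitmap may be absent when null_count == 0 */
  buffers[1] = values;  /* data buffer: the host pointer itself, no copy */

  *schema = (struct ArrowSchema){
      .format = "i",    /* Arrow format string for int32 */
      .name = "x",      /* hypothetical column name */
      .release = release_schema,
  };
  *array = (struct ArrowArray){
      .length = n,
      .null_count = 0,
      .offset = 0,
      .n_buffers = 2,
      .buffers = buffers,
      .release = release_array,
  };
  return 0;
}

/* Hypothetical consumer: reads the values through the shared pointer. */
int64_t sum_int32(const struct ArrowArray* array) {
  const int32_t* data = (const int32_t*)array->buffers[1];
  int64_t total = 0;
  for (int64_t i = 0; i < array->length; ++i) {
    total += data[array->offset + i];
  }
  return total;
}
```

The point of the sketch is that `buffers[1]` is the host's own pointer, so the consumer reads the same memory the producer wrote. Once the consumer is done, it calls `array.release(&array)` and `schema.release(&schema)`, which is how the interface hands lifetime control back to the producer.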
Thanks for your great advice. I'm looking into it.
Hello,
I browsed the source code of chDB today (having read DuckDB's papers first) and had two questions:
As of today, chDB takes over the final query result (which is presumably refcounted) for further processing (see LocalServer.cpp). Is it planned (or even possible) to have a more fine-grained mechanism for handing over results, based on the data chunks used internally for query processing? In DuckDB, the application can fetch individual result chunks by triggering pull() on the execution plan.
Likewise, I did not find a zero-copy mechanism for source data, meaning that right now the host process must first write the data to be processed to a local file and then let the embedded ClickHouse load it via the file engine. Did I miss something?
Thanks!