Support duckdb as alternate sink/source...or more #482
Comments
Or perhaps this is just about Parquet support.
I think these are separate asks. DuckDB is very flexible in what it reads: it can run directly off Parquet, Arrow, sqlite3, and plain CSV (locally or remotely). It's kind of awesome. But there may still be a use case for having a native DuckDB file.

It may be worthwhile to take a step back and talk about internal representation for a moment. The biolink standard defines slots like

See https://github.com/orgs/linkml/discussions/1996

This becomes even more relevant when we think about distributing closures with the kgx. I think this should be the default, and we should adopt the de facto monarch kgx standard devised by @kevinschaper, adapted from @kltm's golr. Basically

And modern tabular formats have better support for nested data models too, which could at last provide a solution to #218. DuckDB supports both struct objects and a json type, similar to pg's jsonb. Of course these can all be serialized as CSV by using standard pipe delimiting, but this should be seen as the lowest-common-denominator serialization, for CSV only; we should give modern data scientists something better.
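To make the pipe-delimiting point concrete, here is a minimal sketch of round-tripping multivalued slots through CSV using only the Python standard library. The set of multivalued slots, the helper names, and the sample node are illustrative assumptions, not KGX's actual implementation.

```python
import csv
import io

# Assumption: which slots are multivalued. A real tool would take this
# from the data model rather than hard-coding it.
MULTIVALUED = {"category", "synonym"}


def to_csv_row(node):
    """Flatten list-valued slots into pipe-delimited strings."""
    return {k: "|".join(v) if k in MULTIVALUED else v for k, v in node.items()}


def from_csv_row(row):
    """Split pipe-delimited strings back into lists (empty string -> empty list)."""
    return {
        k: (v.split("|") if v else []) if k in MULTIVALUED else v
        for k, v in row.items()
    }


# Illustrative node record.
node = {
    "id": "HGNC:1100",
    "name": "BRCA1",
    "category": ["biolink:Gene", "biolink:NamedThing"],
    "synonym": [],
}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(node))
writer.writeheader()
writer.writerow(to_csv_row(node))
csv_text = buf.getvalue()

restored = [from_csv_row(r) for r in csv.DictReader(io.StringIO(csv_text))]
```

The round trip only works because the reader already knows which columns are lists, and it breaks for any value that itself contains a pipe. A format with native list or struct columns carries that information in the file.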
@caufieldjh, I noticed you ended up writing that merge code in kg-microbe-merge. Merging the Monarch ingests inevitably hits swap hard. Did you see a lot of memory savings there?
Ah, Harshad wrote that. I seem to recall that it did save on memory but not massively.
In the KG construction call today (Apr 1, 2024), discussion touched on DuckDB and its relevance to KG exchange.
At minimum, supporting this infrastructure could make it easier to query and access graphs.
Beyond that, this DB platform or its alternatives could replace some internal KGX operations, particularly the more memory-hungry ones like merge.
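As a rough illustration of pushing a merge into a database engine, the sketch below deduplicates node records by `id` inside SQL rather than in Python dictionaries. It uses the standard-library `sqlite3` module purely as a stand-in so that it runs without extra dependencies; it is not DuckDB and not KGX's merge logic. The table layout, the `merge_nodes` helper, and the first-record-wins rule are all assumptions made for the example.

```python
import sqlite3


def merge_nodes(conn, sources):
    """Insert node rows from each source; the first record seen for an id wins.

    `sources` is an iterable of iterables of (id, name, category) tuples.
    Each source can be a generator, so rows are streamed into the database
    instead of being collected in a Python list first.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes "
        "(id TEXT PRIMARY KEY, name TEXT, category TEXT)"
    )
    for rows in sources:
        # INSERT OR IGNORE skips rows whose primary key already exists.
        conn.executemany(
            "INSERT OR IGNORE INTO nodes (id, name, category) VALUES (?, ?, ?)",
            rows,
        )
    conn.commit()


# Hypothetical node records from two sources, with one overlapping id.
source_a = [("A:1", "alpha", "biolink:Gene"), ("A:2", "beta", "biolink:Gene")]
source_b = [("A:2", "beta duplicate", "biolink:Gene"), ("A:3", "gamma", "biolink:Disease")]

# ":memory:" keeps the demo self-contained; a file path would store the
# table on disk instead.
conn = sqlite3.connect(":memory:")
merge_nodes(conn, [source_a, source_b])
merged = conn.execute("SELECT id, name FROM nodes ORDER BY id").fetchall()
```

A real merge would need to reconcile conflicting properties rather than drop the later record, but the shape is the same: the engine owns the working set, and Python only streams rows in.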