Move MemoryOverflowModel and SailSourceModel to rdf4j-sail-base #4555

kenwenzel · 2023-05-16T13:42:57Z

Problem description

Both NativeStore and LmdbStore use the classes MemoryOverflowModel and SailSourceModel, each with its own implementation.

Preferred solution

Move MemoryOverflowModel and SailSourceModel to rdf4j-sail-base.

Are you interested in contributing a solution yourself?

None

Alternatives you've considered

No response

Anything else?

No response


hmottestad · 2023-05-16T15:51:07Z

We should make sure that we don't break backwards compatibility.

abrokenjester · 2023-05-17T04:00:18Z

I personally think we should get rid of SailSourceModel entirely, as it has quite poor performance, and replace it with a MapDB-backed model implementation.

kenwenzel · 2023-05-17T07:11:50Z

I am not sure if MapDB will help here as we need multiple triple indexes and also some kind of value store. In the end we will be forced to implement something like NativeStore or LmdbStore.

hmottestad · 2023-05-17T07:55:13Z

I'm not sure we need all that much safety-wise with the MemoryOverflowModel. We could overflow to an N-Quads file. The ShaclSail wouldn't enjoy the performance hit though, since it would have to query the data in the current transaction, which would mean reading from the N-Quads file.

kenwenzel · 2023-05-17T08:13:51Z

But this would require a linear scan to match different triple patterns like (S, _, _), (_, P, _) or (_, _, O).
As overflow is only triggered for large transactions, it should be expected that the N-Quads file is also rather large.
We could also experiment with Parquet files that support faster scanning by skipping row groups.
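
To illustrate the linear-scan concern, here is a minimal sketch in plain Java. It is not RDF4J code: `Stmt` and `LinearScanSketch` are hypothetical names, and an in-memory list stands in for the statements that would be parsed from the overflow file. Every pattern, whether (S, _, _), (_, P, _) or (_, _, O), has to visit every statement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one statement parsed from a line of the overflow file.
final class Stmt {
    final String s, p, o;

    Stmt(String s, String p, String o) {
        this.s = s;
        this.p = p;
        this.o = o;
    }
}

class LinearScanSketch {
    // A null argument acts as a wildcard, so match(data, "s1", null, null)
    // corresponds to the (S, _, _) pattern.
    static List<Stmt> match(List<Stmt> data, String s, String p, String o) {
        List<Stmt> result = new ArrayList<>();
        for (Stmt st : data) { // no index: each lookup touches all n statements
            if ((s == null || s.equals(st.s))
                    && (p == null || p.equals(st.p))
                    && (o == null || o.equals(st.o))) {
                result.add(st);
            }
        }
        return result;
    }
}
```

With an unindexed file the cost of each lookup grows with the size of the transaction, which is exactly the case in which overflow is triggered.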

hmottestad · 2023-05-17T14:14:25Z

True, which is why it would be bad for the ShaclSail. In general I would assume that users who load large files don't typically run queries before committing. We could try something like the DynamicModel, where we use a very simple and fast overflow until the user runs queries, at which point we would migrate the data to a more advanced data structure.
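
A rough sketch of that "upgrade on first query" idea, in plain Java. This is an assumption-laden illustration, not the actual DynamicModel or MemoryOverflowModel code: `LazyUpgradeSketch` and its methods are hypothetical, an in-memory list stands in for the cheap append-only overflow, and a subject-keyed map stands in for the "more advanced data structure".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LazyUpgradeSketch {
    // Cheap write path: statements are only appended
    // (a real overflow would append to a file instead).
    private final List<String[]> log = new ArrayList<>();

    // Built on the first query; stays null while nobody has queried.
    private Map<String, List<String[]>> bySubject;

    void add(String s, String p, String o) {
        String[] st = { s, p, o };
        log.add(st);
        if (bySubject != null) {
            // Already upgraded: keep the index in sync with new writes.
            bySubject.computeIfAbsent(s, k -> new ArrayList<>()).add(st);
        }
    }

    boolean isUpgraded() {
        return bySubject != null;
    }

    List<String[]> findBySubject(String s) {
        if (bySubject == null) {
            // First query: migrate the append-only data into the index once.
            bySubject = new HashMap<>();
            for (String[] st : log) {
                bySubject.computeIfAbsent(st[0], k -> new ArrayList<>()).add(st);
            }
        }
        return bySubject.getOrDefault(s, List.of());
    }
}
```

Bulk loads that never query before commit only pay for appends; the one-time migration cost is paid only by transactions that actually read their own uncommitted data, such as those validated by the ShaclSail.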

kenwenzel · 2023-05-17T14:45:20Z

@hmottestad I think we could improve the current situation:
#4557

Maybe you can help with the logic for switching to disk-based storage.

kenwenzel added the 📶 enhancement issue is a new feature or improvement label May 16, 2023

eclipsewebmaster added this to RDF4J Planning May 16, 2023

github-project-automation bot moved this to 🆕 Triage in RDF4J Planning May 16, 2023
