Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why not supported range scans? #4

Open
ShuaiJunlan opened this issue Sep 25, 2018 · 5 comments
Open

Why not supported range scans? #4

ShuaiJunlan opened this issue Sep 25, 2018 · 5 comments
Labels
question Need more discussion

Comments

@ShuaiJunlan
Copy link

ShuaiJunlan commented Sep 25, 2018

Range scans is not supported for HaloDB. If I want to support range scans function, what should I do?

@ahasani
Copy link

ahasani commented Sep 25, 2018

You need to implement some sort of key ordering

@ShuaiJunlan
Copy link
Author

ShuaiJunlan commented Sep 25, 2018

@ahasani Is there a more detailed document about HaloDB designing?

@ahasani
Copy link

ahasani commented Sep 25, 2018

@ahasani Is there a more detailed document about HaloDB designing?

its on the readme doc mate, direct link is here.

IMHO in your particular interest is this paragraph excerpt :
"LSM tree and B-Tree also maintain an ordering of keys to support efficient range scans, but the cost they pay is a read amplification greater than 1, and for LSM tree, very high write amplification. Since our workload only does point lookups, we don’t want to pay the cost associated with storing data in a format suitable for range scans."

Cheers

@amannaly
Copy link
Collaborator

@ShuaiJunlan range scans are currently not supported. The workload for which HaloDB was designed for only does point lookups.

I haven't given this problem much thought yet, but a possible approach could be to use an ordered index in memory. We should ideally not order the data on disk as this would increase read and write amplification for HaloDB, or might force us to do random writes.

HaloDB currently has an in-memory index, which is an off-heap concurrent hash table.
To support range scans we could use a ConcurrentSkipList as the index, but since HaloDB need to handle large data sets we need an implementation of the skip list which does memory allocation outside the heap.
More research is needed to figure out how this will perform and the resources it will consume.

This is probably a big effort, and I don't plan do this immediately.

@amannaly amannaly added the question Need more discussion label Sep 27, 2018
@ShuaiJunlan
Copy link
Author

@amannaly Thank you for your guidance. Is there a more detailed document about HaloDB designing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Need more discussion
Projects
None yet
Development

No branches or pull requests

3 participants