Can we support versioning? #15

archenroot · 2018-12-19T07:43:11Z

I would like HaloDB to support key-value versioning.

One option is a revision id (an incremental number per key, from 0 upward). I am also interested in tracking validity per key (validFrom/createdOn and validTo/InvalidatedOn).

ahasani · 2018-12-19T08:32:23Z

As with a compound key, you can append the version number to the key. Here is the catch: to get the latest version you need either a prefix scan, which HaloDB does not support, or a full iteration. HaloDB is not very fast at iterating over values; it excels at random lookups.
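One way to avoid the prefix scan, sketched under assumptions: next to each `key#revision` entry, keep a small pointer entry that records the latest revision number, so reading the latest version costs two random lookups. `VersionedStore`, the `#` separator and the `#latest` suffix are hypothetical names, and a `HashMap` stands in for the store so the example runs on its own; none of this is HaloDB API.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper, not part of HaloDB. The HashMap is a stand-in for a
// store that only offers random put/get by key.
// Assumptions: user keys never contain '#', and there is a single writer.
class VersionedStore {
    private final Map<String, byte[]> store = new HashMap<>();

    // Pointer entry that holds the latest revision number of a user key.
    private static String latestKey(String key) {
        return key + "#latest";
    }

    // Compound key: user key plus revision number.
    private static String revisionKey(String key, long revision) {
        return key + "#" + revision;
    }

    /** Writes a new revision and returns its number (0 for the first write). */
    long put(String key, byte[] value) {
        long next = latestRevision(key) + 1;
        // Write the revision first, then move the pointer, so the pointer
        // never refers to a revision that has not been written yet.
        store.put(revisionKey(key, next), value);
        store.put(latestKey(key), Long.toString(next).getBytes(StandardCharsets.UTF_8));
        return next;
    }

    /** Returns the latest revision number, or -1 if the key was never written. */
    long latestRevision(String key) {
        byte[] raw = store.get(latestKey(key));
        return raw == null ? -1 : Long.parseLong(new String(raw, StandardCharsets.UTF_8));
    }

    /** Returns one specific revision, or null if it does not exist. */
    byte[] get(String key, long revision) {
        return store.get(revisionKey(key, revision));
    }

    /** Returns the newest value using two random lookups and no scan. */
    byte[] getLatest(String key) {
        long latest = latestRevision(key);
        return latest < 0 ? null : get(key, latest);
    }
}
```

The trade-off is one extra small write per update in exchange for never having to iterate.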

archenroot · 2018-12-19T09:40:41Z

@ahasani I think I will look into how to implement this capability within the existing API. On top of that there are these additional attributes:
validFrom - you can insert a key that is valid from a different date than the one on which it was inserted into the store
createdOn - timestamp of creation
validTo - the value is not valid after this point in time
InvalidatedOn - equal to createdOn of the next revision, or today (if only an invalidate command is issued)

Ladislav
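A minimal sketch of how the four attributes above could fit together, assuming one in-memory history per key. `Revision` and `RevisionHistory` are hypothetical types for illustration and are not HaloDB API; the only rule taken from the list is that InvalidatedOn becomes the createdOn of the next revision, or the time of an explicit invalidate command.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; nothing here is HaloDB API.
final class Revision {
    final long id;            // incremental revision number, starting at 0
    final byte[] value;
    final Instant createdOn;  // when this revision was written
    final Instant validFrom;  // start of validity, may differ from createdOn
    final Instant validTo;    // end of validity, null means open-ended
    Instant invalidatedOn;    // null while this is the current revision

    Revision(long id, byte[] value, Instant createdOn, Instant validFrom, Instant validTo) {
        this.id = id;
        this.value = value;
        this.createdOn = createdOn;
        this.validFrom = validFrom;
        this.validTo = validTo;
    }
}

final class RevisionHistory {
    private final List<Revision> revisions = new ArrayList<>();

    /** Adds a revision; the previous one is invalidated at the new createdOn. */
    Revision append(byte[] value, Instant now, Instant validFrom, Instant validTo) {
        invalidate(now);
        Revision r = new Revision(revisions.size(), value, now, validFrom, validTo);
        revisions.add(r);
        return r;
    }

    /** Explicit invalidate command: closes the current revision without adding one. */
    void invalidate(Instant now) {
        if (!revisions.isEmpty()) {
            Revision last = revisions.get(revisions.size() - 1);
            if (last.invalidatedOn == null) {
                last.invalidatedOn = now;
            }
        }
    }

    /** Returns the current revision, or null if there is none or it was invalidated. */
    Revision current() {
        if (revisions.isEmpty()) {
            return null;
        }
        Revision last = revisions.get(revisions.size() - 1);
        return last.invalidatedOn == null ? last : null;
    }
}
```

Passing `now` in explicitly rather than calling `Instant.now()` inside keeps the sketch deterministic and easy to test.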

archenroot · 2018-12-19T10:02:27Z

@ahasani - what about full-text search on the VALUE part? I know it's a bit against standard usage, but in general: show me all keys whose value contains a given term. I think in that case I must implement this myself, correct? It's fine if it is slow. Maybe we can use the Solr or Lucene engines for this.

ahasani · 2018-12-19T10:51:29Z

Hi @archenroot, on validFrom/createdOn - validTo/InvalidatedOn: I have asked @amannaly to add a timestamp to the record header so that we can iterate over it; that is issue #9. On full-text search you are spot on: using Lucene is not too hard to implement.
IMHO, with HaloDB you have to understand its very SPECIFIC use case:

HaloDB was written for a high-throughput, low latency distributed key-value database that powers multiple ad platforms at Yahoo, therefore all its design choices and optimizations were primarily for this use case.

First, think of Oath (formerly Yahoo): they have big boxes with lots of memory, so they keep all metadata in memory, which already limits the use case on "commodity" hardware or small cloud instances.
Second, the design choice of strictly page-buffered writes for throughput means some data loss is tolerable, since their implementation sits behind Kafka as the first persistence and messaging layer.
Third, they chose a single-threaded writer.
And fourth, they chose not to support sorting and range scans.

#2 is addressed by the durability option, #3 is easy to overcome with multiple instances or a queue/disruptor (yes, HaloDB is that fast), and for #4 we can add another layer with a sorted index. But #1 is hard, as it is the "main" feature of HaloDB.
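For #4, a rough sketch of what such a sorted-index layer might look like: an in-memory `TreeSet` of keys kept next to the store and used only for prefix scans. `SortedKeyIndex` is a hypothetical name and a `HashMap` stands in for the store; this is not HaloDB API, and it assumes the index is rebuilt (for example by iterating once) after a restart.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical layer, not part of HaloDB. The HashMap stands in for the
// unsorted store; the TreeSet is the extra sorted index over its keys.
class SortedKeyIndex {
    private final Map<String, byte[]> store = new HashMap<>();
    private final TreeSet<String> keys = new TreeSet<>();

    void put(String key, byte[] value) {
        store.put(key, value);
        keys.add(key);
    }

    void delete(String key) {
        store.remove(key);
        keys.remove(key);
    }

    byte[] get(String key) {
        return store.get(key);
    }

    /** Prefix scan: all keys starting with the prefix, in sorted order. */
    List<String> keysWithPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        // Keys sharing a prefix are contiguous in sorted order, so start at
        // the first key >= prefix and stop at the first key without it.
        for (String k : keys.tailSet(prefix, true)) {
            if (!k.startsWith(prefix)) {
                break;
            }
            out.add(k);
        }
        return out;
    }
}
```

If version numbers are part of the key, zero-padding them (`k#0002`, `k#0010`) keeps string order equal to numeric order, so the last key returned by the scan is the latest version.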

I have high respect for HaloDB; it is a very positive contribution from Oath and @amannaly to us and the community. Much appreciated.

Also please have a look at RocksDB Java or H2 MVStore. But I am against using these kinds of stores for any big value (anything bigger than 1 MB) because of write/space amplification. Even putting one or both of them and/or Lucene alongside HaloDB as an index could be a better choice, similar to WiscKey / Badger.

Sorry for the TL;DR :-) I love HaloDB, such an inspiration.

Cheers

archenroot · 2018-12-19T11:42:53Z

@ahasani - thx for the comprehensive answer! I am looking to store gigabytes in my case, but if I understand correctly HaloDB keeps only the indexes in memory, correct? We don't have commodity hardware here, so having 512 GB or 1 TB of RAM is not an issue.

Regarding the header timestamp, I noticed that issue, thx for the reference.

Regarding #2 and #3 and #4 - good reading, thx.

Regarding #1 I understand - design case.

Kafka usage, I see. So it would require something like this to expose HaloDB to others with secured persistence:

Right? So each READ instance has its own copy.

NOTE: It could also be implemented with the READ and WRITE services combined into one service and an embedded Kafka.

archenroot mentioned this issue Dec 19, 2018

Storage clustering support #18

Closed

wangtao724 added the question Need more discussion label Dec 26, 2019
