Keep local state #33

woile · 2022-08-02T07:40:43Z

Goal

Keep a local state for aggregations

Problem description

Because we want to be able to do some stateful transformations, like joins, counts or aggregations, we have to choose a local, ephemeral database.
Normally a Kafka consumer takes care of one or more partitions, and partitions are not shared among consumers in the same group. We can take advantage of this and place a database in the consumer.

Other solutions like Faust make use of RocksDB, which is arguably the strongest contender in the space. In practice, installing RocksDB creates all kinds of problems for developers, which hurts the DX of the platform. Flink supports both a RocksDB backend and a hashmap (in-memory) backend.

I think this is a great opportunity to test different backends. Points to consider:

Performance
Ease of use

Investigation

Personally, I would like to have some benchmarks and test sled. The challenge we face is that there are no Python bindings yet, but I think it would make a nice experiment to build the bindings and compare the performance with RocksDB. The stated benefit of sled is that it aims for LSM tree-like write performance combined with traditional B+ tree-like read performance, while RocksDB uses an LSM tree.

Other alternatives to consider:

sqlitedict

Questions

How should we design the benchmark?
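One possible starting point, as a rough sketch only: run the same read-modify-write counting workload against each candidate backend and compare operations per second. The `DictBackend` and `SqliteBackend` classes, `run_workload`, and `benchmark` are hypothetical names used for illustration; the two backends are stdlib stand-ins (a plain dict and an in-memory `sqlite3` database), not the RocksDB or sled backends discussed above.

```python
import sqlite3
import time
from typing import Dict, Optional


class DictBackend:
    """In-memory stand-in backend (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value


class SqliteBackend:
    """sqlite3 stand-in backend (hypothetical); uses an in-memory database."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE kv (k BLOB PRIMARY KEY, v BLOB)")

    def get(self, key: bytes) -> Optional[bytes]:
        row = self._conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row is not None else None

    def put(self, key: bytes, value: bytes) -> None:
        self._conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))


def run_workload(backend, n_keys: int, n_ops: int) -> float:
    """Run a per-key counting workload and return elapsed seconds."""
    start = time.perf_counter()
    for i in range(n_ops):
        key = f"key-{i % n_keys}".encode()
        current = backend.get(key)
        count = int(current) + 1 if current is not None else 1
        backend.put(key, str(count).encode())
    return time.perf_counter() - start


def benchmark(n_keys: int = 100, n_ops: int = 10_000) -> Dict[str, float]:
    """Return operations per second for each stand-in backend."""
    factories = {"dict": DictBackend, "sqlite": SqliteBackend}
    return {
        name: n_ops / run_workload(factory(), n_keys, n_ops)
        for name, factory in factories.items()
    }


if __name__ == "__main__":
    for name, ops_per_sec in benchmark().items():
        print(f"{name}: {ops_per_sec:,.0f} ops/s")
```

A real comparison would probably also need on-disk storage, realistic key and value sizes, and separate read-heavy and write-heavy workloads; those choices are left open here.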

Important

It's important to create a good interface (a `Protocol` in Python), so that it becomes easy to support multiple backends.
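To make the idea concrete, here is a minimal sketch of what such an interface could look like. The names `StateStore`, `InMemoryStore`, and `count_event`, as well as the exact method set, are assumptions for illustration and not a settled design; a RocksDB, sled, or sqlitedict backend would only need to provide the same methods.

```python
from typing import Dict, Iterator, Optional, Protocol, Tuple, runtime_checkable


@runtime_checkable
class StateStore(Protocol):
    """Hypothetical interface that every local state backend would satisfy."""

    def get(self, key: bytes) -> Optional[bytes]: ...

    def put(self, key: bytes, value: bytes) -> None: ...

    def delete(self, key: bytes) -> None: ...

    def items(self) -> Iterator[Tuple[bytes, bytes]]: ...

    def close(self) -> None: ...


class InMemoryStore:
    """Dict-backed store, similar in spirit to a hashmap backend."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)

    def items(self) -> Iterator[Tuple[bytes, bytes]]:
        return iter(self._data.items())

    def close(self) -> None:
        self._data.clear()


def count_event(store: StateStore, key: bytes) -> int:
    """Stateful count: read the current value, increment it, write it back."""
    current = store.get(key)
    total = int(current) + 1 if current is not None else 1
    store.put(key, str(total).encode())
    return total
```

Because `Protocol` relies on structural typing, backends do not have to inherit from `StateStore`; code such as `count_event` depends only on the method signatures, so swapping backends should not require changes to the transformations.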

woile added the enhancement New feature or request label Aug 2, 2022
