Goal
Keep a local state for aggregations
Problem description
Because we want to support stateful transformations such as joins, counts, or aggregations, we have to choose a local, ephemeral database.
Normally, a Kafka consumer handles one or more partitions, and partitions are not shared among consumers in the same group. We can take advantage of this and place a database inside each consumer.
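To make the idea concrete, here is a minimal sketch of partition-local state, assuming records arrive as `(partition, key)` pairs. The `PartitionLocalCounts` class and the record shape are hypothetical and only for illustration; plain dicts stand in for whatever database ends up being chosen.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (partition, key). A real consumer would read
# these fields from the Kafka message it receives.
Record = Tuple[int, str]


class PartitionLocalCounts:
    """Keeps one in-memory counter table per assigned partition."""

    def __init__(self) -> None:
        self._stores: Dict[int, Dict[str, int]] = defaultdict(dict)

    def process(self, records: Iterable[Record]) -> None:
        # Each record only touches the store of its own partition.
        for partition, key in records:
            store = self._stores[partition]
            store[key] = store.get(key, 0) + 1

    def revoke(self, partition: int) -> None:
        # State is ephemeral: drop it when the partition moves elsewhere.
        self._stores.pop(partition, None)

    def count(self, partition: int, key: str) -> int:
        return self._stores.get(partition, {}).get(key, 0)
```

Because no other consumer in the group reads the same partition, the store needs no cross-process coordination, which is what makes an embedded database a reasonable fit here.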
Other solutions like Faust use RocksDB, which is arguably the strongest contender in this space. In practice, installing RocksDB creates all kinds of problems for developers, which hurts the developer experience (DX) of the platform. Flink supports a RocksDB backend and a hashmap (in-memory) backend.
I think this is a great opportunity to test different backends. Points to consider:
Performance
Ease of use
Investigation
Personally, I would like to have some benchmarks and test sled. The challenge we face is that there are no Python bindings yet, but I think building the bindings and comparing the performance with RocksDB would make a nice experiment. The benefit of sled is that it aims for LSM tree-like write performance with traditional B+ tree-like read performance, while RocksDB uses an LSM tree.
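A rough sketch of what such a benchmark harness could look like is shown below. It assumes every backend is wrapped in an object exposing `set(key, value)` and `get(key)`; the `DictStore` stand-in and the `benchmark` function are hypothetical names, and no RocksDB or sled API is used here.

```python
import time
from typing import Callable, Dict


class DictStore:
    """Stand-in backend; a RocksDB or sled wrapper would expose the same methods."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def set(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes):
        return self._data.get(key)


def benchmark(make_store: Callable[[], object], n: int = 10_000) -> Dict[str, float]:
    """Time n sequential writes followed by n sequential reads."""
    store = make_store()
    keys = [f"key-{i}".encode() for i in range(n)]

    start = time.perf_counter()
    for key in keys:
        store.set(key, b"value")
    write_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for key in keys:
        store.get(key)
    read_seconds = time.perf_counter() - start

    return {"write_seconds": write_seconds, "read_seconds": read_seconds}
```

Calling `benchmark(DictStore)` gives a baseline; the same function could then be pointed at each candidate backend to compare write and read timings under an identical workload.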
Other alternatives to consider:
Questions
Important
It's important to create a good interface (a Protocol in Python) so that it becomes easy to support multiple backends.
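As a starting point for discussion, such an interface could be sketched with `typing.Protocol`. The names `StateStore`, `InMemoryStore`, and `count_keys` are assumptions made for this example, not an agreed design; the method set is deliberately minimal.

```python
from typing import Dict, Iterable, Optional, Protocol, runtime_checkable


@runtime_checkable
class StateStore(Protocol):
    """Hypothetical minimal contract every backend would satisfy."""

    def get(self, key: bytes) -> Optional[bytes]: ...

    def set(self, key: bytes, value: bytes) -> None: ...

    def delete(self, key: bytes) -> None: ...


class InMemoryStore:
    """Dict-backed store; satisfies StateStore structurally, no inheritance needed."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def set(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)


def count_keys(store: StateStore, keys: Iterable[bytes]) -> None:
    """Example aggregation written only against the protocol."""
    for key in keys:
        current = store.get(key)
        seen = int(current.decode()) if current is not None else 0
        store.set(key, str(seen + 1).encode())
```

Since `count_keys` depends only on the protocol, swapping `InMemoryStore` for a RocksDB-backed or sled-backed class would not require touching the aggregation code.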