- Journal should be operational even when replication is offline, ideally for up to the topic retention period. Let's say 24h
- Journal will operate on huge streams of events, so it is impossible to fit everything in memory
- Stream data from two storages when one is not consistent and the other has lost its tail
You might naively think that kafka-journal
stores events in Kafka. That is not really true. It stores actions.
Actions (see the sketch after this list):
- Append: A Kafka record that contains a list of events saved atomically
- Mark: With the help of a `Mark` action we can avoid consuming Kafka indefinitely and stop once the marker is found in the topic
- Delete: Indicates an attempt to delete all existing events up to the passed `seqNr`; it does not delete future events
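To make the above concrete, here is a minimal sketch of how these three action types could be modelled. The names and shapes (`SeqNr`, `Event`, the payload type) are illustrative assumptions, not the actual kafka-journal definitions.

```scala
// Illustrative sketch only: simplified shapes of the three action types,
// not the real kafka-journal definitions.
final case class SeqNr(value: Long)                        // position of an event within a journal
final case class Event(seqNr: SeqNr, payload: Array[Byte]) // a single journal event

sealed trait Action

object Action {
  // one Kafka record carrying a batch of events, appended atomically
  final case class Append(events: List[Event]) extends Action

  // bounds consumption: once the reader sees this marker in the topic,
  // it knows it has caught up and can stop polling
  final case class Mark(id: String) extends Action

  // asks to delete all existing events up to `to`; later events are untouched
  final case class Delete(to: SeqNr) extends Action
}
```

Modelling deletions as actions rather than removing records directly is what lets the read path below reconcile the replicated Cassandra view with the not-yet-replicated Kafka head.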
Read path:
- `Action.Mark` is pushed to the topic in parallel with querying Cassandra for the processed topic offsets (see the sketch after this list)
- Initiate a streaming query to Cassandra in parallel with initiating a consumer from those offsets, just to make sure there are no deletions
- Start streaming data from Cassandra, filtering out deleted records
- When finished reading from Cassandra, we might need to start consuming data from Kafka, in case replication is slow or is not happening at the moment
- If the `Action.Mark` offset <= the topic offsets from eventual storage, we don't need to consume Kafka at all (impossible best-case scenario, but hope dies last)
- No need to wait for the `Action.Mark` produce to complete before starting to query Cassandra
- While reading Kafka looking for an `Action.Delete`, we can buffer part of the head so we don't have to initiate a second read (the most common scenario: the replicating app is on track)
- We don't need to initiate a second consumer if we managed to replicate everything within the reading duration
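A rough sketch of that read path, using plain Scala `Future`s for brevity. All the names here (`Offset`, `produceMark`, `replicatedOffset`, `streamFromCassandra`, `readKafkaHead`) are hypothetical placeholders rather than the library's API; the point is only the ordering and the "skip Kafka if replication already passed our mark" decision.

```scala
import scala.concurrent.{ExecutionContext, Future}

object ReadPathSketch {
  final case class Offset(value: Long)

  // Hypothetical signatures, not the kafka-journal API.
  def read(
    produceMark: () => Future[Offset],              // append Action.Mark, get the offset it landed at
    replicatedOffset: () => Future[Option[Offset]], // last offset replicated to Cassandra for the partition
    streamFromCassandra: () => Future[Unit],        // stream replicated events, skipping deleted ones
    readKafkaHead: Offset => Future[Unit]           // consume the unreplicated head up to the mark
  )(implicit ec: ExecutionContext): Future[Unit] = {

    // produce the mark and query the replicated offset in parallel:
    // there is no need to wait for the mark before touching Cassandra
    val mark       = produceMark()
    val replicated = replicatedOffset()

    for {
      _ <- streamFromCassandra() // stream what is already replicated first
      m <- mark
      r <- replicated
      // skip Kafka entirely when replication already passed our mark,
      // otherwise read the head up to the mark
      _ <- if (r.exists(_.value >= m.value)) Future.unit else readKafkaHead(m)
    } yield ()
  }
}
```

A real implementation streams results rather than returning `Future[Unit]`, and it must also check the unreplicated head for `Action.Delete` before handing data to the client, as noted below.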
Notes and further optimisation ideas:
- Use a pool of consumers and cache either the head or the deletions
- Subscribe to a particular partition rather than the whole topic; however, this limits caching capabilities
- Don't deserialise unrelated Kafka records
- We cannot stream data to the client before making sure there are no `Action.Delete` actions in the not-yet-replicated Kafka head
- Measure the three important operations of the Kafka consumer: init, seek and poll, so we can optimise the journal even better (see the timing sketch after this list)
- More corner cases to come in order to support re-partitioning [^_^]
- Decide on when to clean/cut head caches
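For those consumer measurements, a simple timing wrapper is a reasonable starting point. This is only a sketch: `record` stands in for whatever metrics backend is actually used, and the usage lines assume a standard `KafkaConsumer` is in scope.

```scala
import java.util.concurrent.TimeUnit
import scala.concurrent.duration.FiniteDuration

// Sketch of timing the three consumer operations; `record` is a placeholder
// for the real metrics backend (e.g. a histogram per operation name).
object ConsumerMetricsSketch {

  def timed[A](name: String, record: (String, FiniteDuration) => Unit)(op: => A): A = {
    val start = System.nanoTime()
    try op
    finally record(name, FiniteDuration(System.nanoTime() - start, TimeUnit.NANOSECONDS))
  }

  // Usage, assuming an org.apache.kafka.clients.consumer.KafkaConsumer is in scope:
  //   timed("init", record)(new KafkaConsumer[String, Array[Byte]](props))
  //   timed("seek", record)(consumer.seek(topicPartition, offset))
  //   timed("poll", record)(consumer.poll(java.time.Duration.ofMillis(100)))
}
```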