Custom starting offset #54

sajal · 2015-02-16T14:18:05Z

Hi all,

Currently I do some event-stream-style processing with homebrew Go code that passes messages over zmq. There is no resiliency/failover, and there is no partitioning of messages like the one Kafka provides.

I am looking into using Kafka + go_kafka_client and have some specific questions.

On startup (and whenever partitions are reassigned during a rebalance) I want to start at an offset from roughly x minutes ago. It does not have to be exact; I can deal with a few extra messages. It appears sarama can do this. How would I go about using the work-distribution goodness of go_kafka_client while plugging in some custom starting-offset logic? The logic should ignore whether I've already consumed a specific message, and whether there are unconsumed messages older than x minutes.

-Sajal

joestein · 2015-03-02T02:01:55Z

We could probably start storing an offset for every minute or so, so that you can rewind to any minute in the stream. We are refactoring that part of the code over the coming weeks, so maybe we can hook up something that works best for this.
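A per-minute checkpoint index along those lines might look like the following sketch. `minuteIndex`, `Record` and `Rewind` are hypothetical names, not go_kafka_client API, and the sketch assumes message time never decreases as offsets grow within a partition. Rewinding picks the latest checkpoint at or before the requested minute, so it may replay a few extra messages but does not skip any.

```go
package main

import (
	"fmt"
	"sort"
)

// minuteIndex is a hypothetical per-partition checkpoint list: the first offset
// observed in each minute, appended in increasing minute order.
type minuteIndex struct {
	minutes []int64 // Unix minute (seconds / 60)
	offsets []int64 // first offset seen in that minute
}

// Record stores the offset only if it is the first message seen in its minute.
func (ix *minuteIndex) Record(unixMinute, offset int64) {
	n := len(ix.minutes)
	if n > 0 && ix.minutes[n-1] >= unixMinute {
		return // already have a checkpoint for this (or a later) minute
	}
	ix.minutes = append(ix.minutes, unixMinute)
	ix.offsets = append(ix.offsets, offset)
}

// Rewind returns the latest checkpoint at or before unixMinute, so every
// message from that minute onward is replayed. ok is false if none exists.
func (ix *minuteIndex) Rewind(unixMinute int64) (offset int64, ok bool) {
	// i is the first checkpoint strictly after the requested minute.
	i := sort.Search(len(ix.minutes), func(k int) bool { return ix.minutes[k] > unixMinute })
	if i == 0 {
		return 0, false // nothing recorded that early
	}
	return ix.offsets[i-1], true
}

func main() {
	var ix minuteIndex
	ix.Record(100, 0)
	ix.Record(100, 5) // ignored: minute 100 already has a checkpoint
	ix.Record(101, 60)
	ix.Record(103, 200)
	fmt.Println(ix.Rewind(102)) // 60 true: minute 102 had no messages, so fall back to 101
}
```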

sajal · 2015-03-02T20:40:26Z

For my use case, I could manage the offset myself if there were a way to specify which starting offset to use when starting up (or rebalancing).

Logic like "x offsets ago" is also fine. Say that when starting up (or rebalancing) the newest offset is x; the consumer could then process from offset x - n, where n is configurable.
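The x - n rule is a simple calculation once the partition's oldest and newest offsets are known (for example from sarama's `GetOffset` with `sarama.OffsetOldest` and `sarama.OffsetNewest`). A minimal sketch; `startFromNewest` and its clamping policy are assumptions, not an existing API:

```go
package main

import "fmt"

// startFromNewest returns newest - n, clamped to [oldest, newest] so the
// consumer never asks for an offset that retention has already deleted.
// newest is the log-end offset; n is the (hypothetical) configured lookback.
func startFromNewest(oldest, newest, n int64) int64 {
	start := newest - n
	if start < oldest {
		return oldest
	}
	if start > newest { // negative n: just start at the end
		return newest
	}
	return start
}

func main() {
	fmt.Println(startFromNewest(0, 5000, 1000))    // 4000
	fmt.Println(startFromNewest(4500, 5000, 1000)) // 4500: clamped to the oldest retained offset
}
```

Because it only depends on one partition's own offsets, this works as long as each partition is owned by a single consumer at a time.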

baconalot · 2015-04-14T12:56:36Z

+1. A common use case for me is priming a job with some historic data, but not the complete available dataset.

baconalot · 2015-04-15T09:23:53Z

Only now do I see the problems here. With a single consumer pid it would work fine if we had something in the config like ForceSetParitionStart = 1234.
The flow would then be:
- pid 333 (consumer.go) starts
- set offsets to 1234
- start consuming

But with multiple pids (Chronos, anyone?):
- pid 333 (consumer.go) starts
- (333) sets offsets to 1234
- (333) starts consuming
- pid 334 (consumer.go) starts
- (334) sets offsets to 1234
- (334) starts consuming
- (333) -> reprocesses messages
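The reprocessing in the multi-pid flow can be reproduced deterministically with an in-memory stand-in for the group's committed offsets. Everything here is hypothetical: `offsetStore` replaces ZooKeeper, `forceNaive` models a plain ForceSetParitionStart option, and `forceOnce` is one possible mitigation (apply the forced start only once per group), not something go_kafka_client offers.

```go
package main

import "fmt"

// offsetStore is an in-memory stand-in for the group's committed offsets in
// ZooKeeper (partition -> committed offset). All names here are hypothetical.
type offsetStore struct {
	committed map[int32]int64
	applied   map[int32]bool // marker: forced start already applied for this partition
}

func newStore() *offsetStore {
	return &offsetStore{committed: map[int32]int64{}, applied: map[int32]bool{}}
}

// forceNaive is what every pid does on startup under a plain
// ForceSetParitionStart option: overwrite the committed offset unconditionally.
func (s *offsetStore) forceNaive(p int32, start int64) { s.committed[p] = start }

// forceOnce applies the forced start only for the first pid. Against real
// ZooKeeper the check and the write would have to be one atomic step.
func (s *offsetStore) forceOnce(p int32, start int64) {
	if !s.applied[p] {
		s.committed[p] = start
		s.applied[p] = true
	}
}

// replay runs the scenario from the thread and returns how many messages
// of partition 0 end up being consumed a second time.
func replay(force func(*offsetStore, int32, int64)) int64 {
	s := newStore()
	force(s, 0, 1234)     // pid 333 starts and sets the offset to 1234
	s.committed[0] = 1300 // pid 333 consumes up to 1300 and commits
	progress := s.committed[0]
	force(s, 0, 1234) // pid 334 starts and applies the setting again
	return progress - s.committed[0]
}

func main() {
	fmt.Println(replay((*offsetStore).forceNaive)) // 66
	fmt.Println(replay((*offsetStore).forceOnce))  // 0
}
```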

Also, how would this be configured? It can't be a single int, since the group's config can cover any number of topics/partitions.

For now I am just going to use a helper that commits an offset manually for me:
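Keying the setting by topic and partition avoids the single-int problem. The sketch below assumes a hypothetical `topic:partition=offset,...` string format and hypothetical `TopicPartition`/`ForcedStarts` types; a partition missing from the map would simply fall back to the group's committed offset.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// TopicPartition keys a hypothetical per-partition start-offset setting.
type TopicPartition struct {
	Topic     string
	Partition int32
}

// ForcedStarts maps each topic/partition to its own forced start offset.
type ForcedStarts map[TopicPartition]int64

// ParseForcedStarts parses a hypothetical "topic:partition=offset,..." setting.
func ParseForcedStarts(s string) (ForcedStarts, error) {
	out := ForcedStarts{}
	if s == "" {
		return out, nil
	}
	for _, entry := range strings.Split(s, ",") {
		kv := strings.SplitN(entry, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("missing '=' in %q", entry)
		}
		i := strings.LastIndex(kv[0], ":")
		if i < 0 {
			return nil, fmt.Errorf("missing ':' in %q", entry)
		}
		part, err := strconv.ParseInt(kv[0][i+1:], 10, 32)
		if err != nil {
			return nil, err
		}
		off, err := strconv.ParseInt(kv[1], 10, 64)
		if err != nil {
			return nil, err
		}
		out[TopicPartition{kv[0][:i], int32(part)}] = off
	}
	return out, nil
}

func main() {
	cfg, err := ParseForcedStarts("some_topic_name:213=123,some_topic_name:214=456")
	if err != nil {
		panic(err)
	}
	off, ok := cfg[TopicPartition{"some_topic_name", 213}]
	fmt.Println(off, ok) // 123 true
	_, ok = cfg[TopicPartition{"other_topic", 0}]
	fmt.Println(ok) // false: fall back to the group's committed offset
}
```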

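    package main

    import (
        "log"

        "github.com/stealthly/go_kafka_client"
    )

    func main() {
        // Connect to ZooKeeper using the library's default coordinator config.
        zkconfig := go_kafka_client.NewZookeeperConfig()
        zkcoord := go_kafka_client.NewZookeeperCoordinator(zkconfig)
        if err := zkcoord.Connect(); err != nil {
            // Report the failure and exit; there is nothing to commit without ZooKeeper.
            log.Fatal(err)
        }

        // Overwrite the group's committed offset for one topic/partition so that
        // whichever consumer claims this partition next resumes from there.
        tp := go_kafka_client.TopicAndPartition{Topic: "some_topic_name", Partition: 213}
        if err := zkcoord.CommitOffset("some_group_id", &tp, 123); err != nil {
            log.Fatal(err)
        }
    }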

sajal · 2015-04-15T09:35:20Z

Isn't the offset tied to a partition rather than to the topic? Or do you mean processing a single partition in multiple pids?

As long as each partition is processed by a single pid (my use case), are there any issues with using x - n as the starting offset?

joestein assigned olebedyn Mar 2, 2015
