Question: Order of outbox entry processing #544

rvervaek · 2021-11-05T09:08:06Z

I can't find any documentation about the 'ordering' of outbox entries that will be processed.

Sometimes it's desirable that the order in which outbox entries are created (for example in the case of 'events') is the same order in which outbox entries are provided for submission (via the submitter). This can be hard to achieve when running multiple instances of the same application and would require locking at the database level.

Is this library able to guarantee that outbox entries are submitted in the same order in which they were created?
If so, does this apply for running multiple instances?

Thanks in advance.

Ruben

badgerwithagun · 2021-11-06T11:06:16Z

Maintainer

At the moment, no. It would be relatively trivial to achieve, but would, by design, force all tasks to be processed in single-threaded fashion. If a task failed, all work would have to queue up behind it.

What you'd end up with is a very slow, poorly designed FIFO queuing system.

That said, neither would be a major issue if all the task did was to push the work into a real queuing system, like Kafka, Kinesis, RabbitMQ in ordered mode or similar, depending on your other requirements. Then the likelihood of failure/blocking would be very low and easily resolved.

So yes, I could see this working as long as performance is not a primary concern and the probability of failure is extremely low and due to some infrastructure issue rather than a code one.

Would happily take suggested PRs and I'll have a ponder about it myself.

badgerwithagun · 2021-11-06T11:09:31Z

Maintainer

Also: how concurrent this could be on write to the TXNO_OUTBOX table would depend on how strictly ordered this needs to be. If it can afford to be roughly sequential on the order of writes, you could use a timestamp to order processing, for relatively little cost.

If processing needs to be strictly ordered based on the order in which the original transactions were written, a database sequence number/autonumber/trigger would be needed on TXNO_OUTBOX and that's going to be a bottleneck on write performance.
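
To make the trade-off between the two options concrete, here is a minimal in-memory sketch in plain Java. The `Entry` class and its `createdAtMillis` and `seq` fields are hypothetical and are not the library's schema; `seq` stands in for a database sequence or autonumber column. The sketch only shows that a timestamp gives a rough order that clock skew can disturb, while a sequence number gives a strict one.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory model of outbox rows; not the library's real schema.
class OrderingSketch {

  static final class Entry {
    final String id;
    final long createdAtMillis; // timestamp set by the writing instance
    final long seq;             // value taken from a database sequence/autonumber

    Entry(String id, long createdAtMillis, long seq) {
      this.id = id;
      this.createdAtMillis = createdAtMillis;
      this.seq = seq;
    }
  }

  // "Roughly sequential": cheap, but ties and clock skew can reorder entries.
  static List<String> byTimestamp(List<Entry> entries) {
    return ids(entries, Comparator.comparingLong(e -> e.createdAtMillis));
  }

  // "Strictly ordered": a total order, paid for with a contended sequence on write.
  static List<String> bySequence(List<Entry> entries) {
    return ids(entries, Comparator.comparingLong(e -> e.seq));
  }

  private static List<String> ids(List<Entry> entries, Comparator<Entry> order) {
    List<Entry> sorted = new ArrayList<>(entries);
    sorted.sort(order); // List.sort is stable, so ties keep their input order
    List<String> result = new ArrayList<>();
    for (Entry entry : sorted) {
      result.add(entry.id);
    }
    return result;
  }

  public static void main(String[] args) {
    // "b" was written after "a" (seq 2 > seq 1), but by an instance whose clock runs behind.
    List<Entry> entries = List.of(
        new Entry("b", 999L, 2L),
        new Entry("a", 1000L, 1L),
        new Entry("c", 1001L, 3L));
    System.out.println(byTimestamp(entries)); // [b, a, c]
    System.out.println(bySequence(entries));  // [a, b, c]
  }
}
```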

rvervaek · 2021-11-08T14:53:18Z

Author

Thanks for the explanation. My use case is in fact a matter of sending the messages to a RabbitMQ exchange. One property of the concept of an 'event' is that it occurred at a specific time, so my idea was to define the ordering of the messages based on that timestamp. Performance in this case is of less importance, and when processing of a specific task fails, the focus is on fixing the problem and retrying the message processing rather than making sure that all other tasks are not blocked (trying to achieve an 'eventually consistent' process).

I will ponder it myself as well, but at first glance I would think that introducing a 'creationTime' property on the outbox message could already be enough to support an ordered submission process based on that creationTime.

If there are use cases where performance is an issue, I would think implementing a solution like RabbitMQ's consistent hash exchanges or Kafka's topic key partition mechanism could provide more throughput while retaining ordering for specific messages.
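
As a rough illustration of that idea, the sketch below routes each message to a "lane" by hashing its ordering key, so messages that share a key stay in order while different keys can be processed in parallel. It uses plain modulo hashing for brevity; it is not RabbitMQ's consistent hash exchange nor Kafka's partitioner, and `KeyPartitioner`, `partitionFor` and `split` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain modulo hashing, not RabbitMQ's or Kafka's actual partitioner.
class KeyPartitioner {

  private final int partitions;

  KeyPartitioner(int partitions) {
    if (partitions <= 0) {
      throw new IllegalArgumentException("partitions must be positive");
    }
    this.partitions = partitions;
  }

  // The same key always maps to the same partition, so per-key order survives.
  int partitionFor(String key) {
    return Math.floorMod(key.hashCode(), partitions);
  }

  // Splits an ordered list of (key, payload) pairs into one ordered lane per partition.
  List<List<String>> split(List<Map.Entry<String, String>> messages) {
    List<List<String>> lanes = new ArrayList<>();
    for (int i = 0; i < partitions; i++) {
      lanes.add(new ArrayList<>());
    }
    for (Map.Entry<String, String> message : messages) {
      lanes.get(partitionFor(message.getKey()))
          .add(message.getKey() + ":" + message.getValue());
    }
    return lanes;
  }

  public static void main(String[] args) {
    KeyPartitioner partitioner = new KeyPartitioner(4);
    List<Map.Entry<String, String>> messages = List.of(
        Map.entry("order-1", "created"),
        Map.entry("order-2", "created"),
        Map.entry("order-1", "paid"),
        Map.entry("order-1", "shipped"));
    // Every "order-1" message lands in the same lane, in its original order.
    System.out.println(partitioner.split(messages));
  }
}
```

Each lane could then be drained by a single worker, which keeps ordering per key without serialising the whole outbox.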

dustinhiatt-wf · 2021-11-08T15:48:42Z

@badgerwithagun I've got an external logical timestamp and it would be nice to be able to use that for logical ordering in the outbox. Your concern regarding scale is valid but I'm thinking about downstream FIFO queues like SQS FIFO queues. They only need ordering within a single group. So if I could add a group id and ordering signifier it would be nice.

badgerwithagun · 2021-11-08T16:16:26Z

Maintainer

I like the idea of user-driven group and ordering signifiers. That plays well with the likes of Kafka as well as supporting multiple queues. Example API:

  outbox
    .groupedOn(kafkaPartitionName)
    .orderedOn(someProperty)
    .schedule(MyClass.class)
    .pushMessageToKafkaTopic(kafkaTopicName, message);
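
One possible reading of what a group plus an ordering signifier would mean on the processing side is sketched below: pending tasks are bucketed by group id, and each bucket is sorted by its ordering value, leaving groups independent of one another. This is only an assumption about the semantics, not how the library implements or will implement it; `GroupedOrdering`, `Task`, `groupId` and `orderKey` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "ordered within a group, independent across groups".
class GroupedOrdering {

  static final class Task {
    final String groupId;  // e.g. a Kafka partition name or SQS message group id
    final long orderKey;   // user-supplied ordering value within the group
    final String payload;

    Task(String groupId, long orderKey, String payload) {
      this.groupId = groupId;
      this.orderKey = orderKey;
      this.payload = payload;
    }
  }

  // Buckets tasks by group, then sorts each bucket by its ordering value.
  static Map<String, List<Task>> plan(List<Task> pending) {
    Map<String, List<Task>> byGroup = new LinkedHashMap<>();
    for (Task task : pending) {
      byGroup.computeIfAbsent(task.groupId, g -> new ArrayList<>()).add(task);
    }
    for (List<Task> group : byGroup.values()) {
      group.sort(Comparator.comparingLong(t -> t.orderKey));
    }
    return byGroup;
  }

  public static void main(String[] args) {
    Map<String, List<Task>> plan = plan(List.of(
        new Task("A", 2, "a2"),
        new Task("B", 1, "b1"),
        new Task("A", 1, "a1")));
    for (Map.Entry<String, List<Task>> group : plan.entrySet()) {
      StringBuilder line = new StringBuilder(group.getKey() + ":");
      for (Task task : group.getValue()) {
        line.append(' ').append(task.payload);
      }
      System.out.println(line); // "A: a1 a2" then "B: b1"
    }
  }
}
```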

badgerwithagun · 2021-11-08T16:20:09Z

Maintainer

Other thought: at the moment, tasks are processed immediately if they can be. This won't work for ordered processing in a distributed application. All work submission will have to be made by flush(), and flush() will need to be prevented from running in parallel on multiple application instances.

It will also probably need to be run continuously in a much shorter poll loop than you might use normally (we only have flush() running once every 5 minutes or so as a "mop-up" task at the moment).

All this makes the behaviour very different to "unordered" mode so it'll have to be a nonstandard builder setting and require changes in quite a few places.
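
A minimal sketch of the "only one flush at a time" constraint described above, under loud assumptions: the `AtomicBoolean` is an in-process stand-in for whatever cross-instance lock would really be needed (for example a database row lock), and `ExclusiveFlush` and `pollOnce` are hypothetical names, not library API. Each instance would call `pollOnce` from its short poll loop and simply skip the tick if someone else holds the lock.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: an in-process flag standing in for a cross-instance lock.
class ExclusiveFlush {

  private final AtomicBoolean locked = new AtomicBoolean(false);
  private final AtomicInteger flushes = new AtomicInteger();

  // Called on every tick of the poll loop. Returns true if this call ran the flush.
  boolean pollOnce(Runnable flush) {
    if (!locked.compareAndSet(false, true)) {
      return false; // another caller is flushing; skip this tick
    }
    try {
      flush.run();
      flushes.incrementAndGet();
      return true;
    } finally {
      locked.set(false); // always release, even if the flush throws
    }
  }

  int flushCount() {
    return flushes.get();
  }

  public static void main(String[] args) {
    ExclusiveFlush guard = new ExclusiveFlush();
    // The nested call simulates a second instance polling while the first is mid-flush.
    boolean ran = guard.pollOnce(() ->
        System.out.println("overlapping flush ran: " + guard.pollOnce(() -> { })));
    System.out.println("outer flush ran: " + ran);
  }
}
```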

dustinhiatt-wf · 2021-11-08T16:50:23Z

@badgerwithagun yeah, I've done this previously with a similar methodology. Basically, separate reads from writes, and all writes go immediately to the database when the host calls commit. The tricky part is triggering a query for new messages immediately when messages are written. Your framework potentially has a nice mechanism for that already, though, in the form of your lifecycle callbacks. Basically, whenever a transaction is committed you want to trigger a query against the db immediately to get the latest messages available to send.

The read side can be quite difficult though. In a distributed application you'll definitely have to make a tradeoff between maximum latency in the outbox and being a noisy neighbor to the host application. In my environment, it wouldn't be uncommon to have 20+ containers polling the outbox, and when you start grouping by group id it can get a bit expensive. I've optimized this in the past by reifying a view on all containers. So basically, instead of a query finding the next item, have it grab a bunch of items and cache those in memory, all in a single query. Then implement something like double-checked locking: go down the list of candidates and attempt to set a run time value if it is not already set or is stale (in a single update statement). This should be logically equivalent to compare-and-set and avoids transactions (and should be generic enough that you could actually do this in Dynamo or other k/v stores too if you want). If the number of rows modified by the update is 0, immediately move on to the next row; otherwise your process essentially holds a lock and should perform its action.
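
The claim step described above can be sketched roughly as follows. A `ConcurrentHashMap` stands in for the outbox table, and `tryClaim` models the single conditional update and its "rows modified" result. The class, method and column names are all hypothetical; a real version would issue the update against the database rather than a map.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: an in-memory map standing in for the outbox table.
class ClaimSketch {

  // Row id -> time at which some instance last claimed it.
  private final ConcurrentMap<String, Long> claimedAt = new ConcurrentHashMap<>();
  private final long staleAfterMillis;

  ClaimSketch(long staleAfterMillis) {
    this.staleAfterMillis = staleAfterMillis;
  }

  // Models one conditional UPDATE, for example (hypothetical columns):
  //   UPDATE outbox SET claimed_at = :now
  //   WHERE id = :id AND (claimed_at IS NULL OR claimed_at < :now - :stale)
  // Returns the number of rows modified: 1 if this caller won the claim, else 0.
  int tryClaim(String id, long nowMillis) {
    boolean[] won = {false};
    claimedAt.compute(id, (key, previous) -> {
      if (previous == null || previous < nowMillis - staleAfterMillis) {
        won[0] = true;
        return nowMillis;
      }
      return previous; // someone else holds a fresh claim; leave it untouched
    });
    return won[0] ? 1 : 0;
  }

  // Walks the cached candidates and returns the first one this caller manages to claim.
  Optional<String> claimNext(List<String> cachedCandidates, long nowMillis) {
    for (String id : cachedCandidates) {
      if (tryClaim(id, nowMillis) == 1) {
        return Optional.of(id);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    ClaimSketch table = new ClaimSketch(1000L);
    List<String> candidates = List.of("m1", "m2");
    System.out.println(table.claimNext(candidates, 5000L)); // Optional[m1]
    System.out.println(table.claimNext(candidates, 5001L)); // Optional[m2]
    System.out.println(table.claimNext(candidates, 5002L)); // Optional.empty
    System.out.println(table.claimNext(candidates, 6001L)); // Optional[m1], its claim went stale
  }
}
```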

You can improve performance here by not selecting the oldest message all the time, so that not all processes are effectively iterating the same list (or just randomly order the items in memory; all that matters is that each group is sequential). If using a cron, be sure to add some skew on a per-instance basis so not all processes are querying the db at the same time.
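
Both suggestions are small enough to sketch. The helper names below (`PollSkew`, `initialDelayMillis`, `visitOrder`) are hypothetical: the first adds a random per-instance offset to the poll schedule, and the second shuffles the order in which groups are visited while leaving the order inside each group alone.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helpers for spreading polling load across instances.
class PollSkew {

  // Base delay plus a random offset in [0, maxSkewMillis), chosen once per instance.
  static long initialDelayMillis(long baseMillis, long maxSkewMillis, Random random) {
    return baseMillis + (long) (random.nextDouble() * maxSkewMillis);
  }

  // Shuffles which group is visited first; ordering inside each group is not affected.
  static List<String> visitOrder(List<String> groupIds, Random random) {
    List<String> shuffled = new ArrayList<>(groupIds);
    Collections.shuffle(shuffled, random);
    return shuffled;
  }

  public static void main(String[] args) {
    Random random = new Random();
    System.out.println("first poll in ms: " + initialDelayMillis(1000L, 500L, random));
    System.out.println("group visit order: " + visitOrder(List.of("A", "B", "C"), random));
  }
}
```

The computed delay could be passed as the initial delay of a `ScheduledExecutorService.scheduleWithFixedDelay` call so that each instance starts its loop at a slightly different moment.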

You might be doing all these things already FWIW, I haven't checked the core logic. I like this library however and if it were extended to support sequential processing within a defined group I'd even go so far as to say I love it :). Thanks for working on it!
