Documents in journal and realtime collection have different _id #219

yahor-filipchyk · 2019-02-13T21:45:14Z

My understanding is that _id gets assigned to a document when the MongoDB client executes the insert. Because the write to the realtime collection happens asynchronously, I think some ids may end up out of order between batches. Besides, _id is used as an offset on the query side, so when eventsByTag is used, the saved offset (retrieved from the realtime collection) won't actually exist in the journal. That is weird at the very least, even if the ordering of the ids may still make sense. I think setting the _id field explicitly when serializing events makes sense: id generation happens on the client side anyway, so we get the same uniqueness and sequencing guarantees, and it ensures that the same document has the same id in both collections.

I also wanted to point out that the write to the realtime collection happens asynchronously with no error handling:

batchFuture.andThen { case _ => doBatchAppend(writes, realtime) }

I think at least some error logging would be useful, but it could also be a good idea to let the user of the library choose whether a failed write to the realtime collection should be propagated. I think failing all realtime subscriptions can be useful: the stream can then restart, pick up missed events from the journal, and continue listening to realtime events.
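A minimal sketch of the logging part, assuming nothing about the library beyond the one-liner quoted above. The names appendWithLogging, appendRealtime and logError are hypothetical stand-ins, and writes is simplified to Seq[String]. The realtime write is still fire-and-forget, but a failure now reaches a log callback instead of being dropped:

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

object RealtimeWriteSketch {
  // Hypothetical stand-in for doBatchAppend(writes, realtime); not the library's real signature.
  type Append = Seq[String] => Future[Unit]

  def appendWithLogging(
      batchFuture: Future[Unit],
      writes: Seq[String],
      appendRealtime: Append,
      logError: Throwable => Unit
  )(implicit ec: ExecutionContext): Future[Unit] =
    batchFuture.andThen { case _ =>
      // The realtime write is started once the journal write has completed,
      // and its outcome is observed rather than silently discarded.
      appendRealtime(writes).onComplete {
        case Failure(e) => logError(e)
        case Success(_) => ()
      }
    }
}
```

The returned future still reflects only the journal write, so the caller's behaviour is unchanged. Propagating the failure to subscribers would need a different design.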

scullxbones · 2019-02-16T02:32:29Z

Hi @yahor-filipchyk -

I've read and re-read this a few times. It seems like there are many different things going on - can you resummarize? Just trying to narrow this down to something actionable.

I think we can start with:

log errors on failure to batchAppend to realtime

yahor-filipchyk · 2019-02-17T18:36:28Z

Hi @scullxbones,

Sorry for shoveling all this into one issue. I think I can identify 3 distinct issues here:

1. log errors on failure to batchAppend to realtime (just as you said)
2. propagate errors from writes to the realtime collection to all active realtime subscriptions if possible (an enhancement to 1)
3. make an atom written to both the journal and realtime collections have the same _id

I wrote this issue up primarily with 3) in mind, because I think it is causing some problems (or potentially can).

To elaborate on 3) a little: when an atom gets serialized to BSON, the serializer does not assign _id, so the mongo client creates the id. This happens asynchronously as writes get submitted to the mongo client. As a result, the same atom written to both the journal and realtime collections gets a different _id value in each. If you have a realtime listener listening to events by tag, it gets the _id from the realtime collection as an offset. When the listener is restarted, it uses that offset from the realtime collection to read events from the journal. Isn't that weird?
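To illustrate the idea behind 3), here is a rough sketch with a deliberately simplified document model. Doc, serialize, writeBoth and the newId parameter are all hypothetical; in the real code the document would be BSON and the id would presumably be an ObjectId created on the client (for example with new org.bson.types.ObjectId()). The point is only that the id is generated once during serialization and the same serialized document is handed to both writes:

```scala
object SharedIdSketch {
  // Hypothetical, simplified document model; the real serializer produces BSON.
  final case class Doc(fields: Map[String, Any])

  // Assign _id once while serializing, instead of leaving it to the client on insert.
  def serialize(payload: Map[String, Any], newId: () => String): Doc =
    Doc(payload + ("_id" -> newId()))

  // Reuse the single serialized document for both collections,
  // so the journal and realtime copies carry the same _id.
  def writeBoth(doc: Doc, journal: Doc => Unit, realtime: Doc => Unit): Unit = {
    journal(doc)
    realtime(doc)
  }
}
```

With this shape, an offset read from the realtime collection would also exist in the journal.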

scullxbones · 2019-02-20T02:57:55Z

Ah ok yep I get it now, thanks for summarizing. Now I totally understand why the _id is a problem, I was struggling a bit with that. This just further convinces me that #214 / #95 are needed to fix this in the ideal way. The _id fix would be good for consistency, and would help in the meantime for sure.

Bullet 2 will probably be a stretch, so I'd think 1 & 3 should take priority. I'm good with this ticket covering both. I'll pull bullet 2 into a separate ticket.

* Generate IDs before sending to mongo, reuse serialized documents
* Log error if realtime write fails
* Clean up some deprecation warnings and other code warnings

Issue #219 - IDs should match between journal & realtime

scullxbones · 2019-03-04T01:42:23Z

Fixed by #222 ... will update with a release version

scullxbones · 2019-03-05T04:29:11Z

Released with 2.2.3

scullxbones mentioned this issue Feb 20, 2019

Propagate errors from writes to realtime collection to all active realtime subscriptions #221

Open

scullxbones self-assigned this Mar 2, 2019

scullxbones added a commit that referenced this issue Mar 4, 2019

Merge pull request #222 from scullxbones/wip-219

7eeea28

Issue #219 - IDs should match between journal & realtime

scullxbones closed this as completed Mar 5, 2019

scullxbones mentioned this issue Jul 21, 2019

Do not manually generate Mongo ObjectIds on the client. Let MongoDB to create them. #238

Closed

scullxbones mentioned this issue Aug 18, 2020

Journal Issues on EventsByTag #370

Open
