Idea for new query: PersistenceQuery "EventsByDate" #66

Open
thjaeckle opened this issue Nov 20, 2015 · 9 comments

Comments

@thjaeckle
Contributor

Hi @scullxbones

I cannot thank you enough for this great library which even gives some joy working with MongoDB :)
We currently face the challenge of retrieving events from the event store that were created in the last "X TimeUnit" (e.g. in the last 5 minutes). We need this to implement a sophisticated sync mechanism for a search service (another microservice), which sometimes does not receive all the events we emit.

What do you think about a PersistenceQuery which retrieves all events before or after a given Date/Timestamp?

I thought about using the _id field for that (which includes a timestamp), but that would have at least 2 drawbacks:

  • MongoDB cannot index the timestamp embedded in the _id, so the query would not perform well
  • One would need to add a JavaScript function executed in Mongo to extract the Date (see: http://stackoverflow.com/a/8753670/5058051 ) - not nice .. :)

So I guess each journal entry would then need a "ts" field to query on (e.g. with an index on that field).
Is that something you would want to have in your library, or do you think it does not fit?

I could see what I could contribute as a pull request then ..

Regards Thomas

@thjaeckle
Contributor Author

Ah, colleagues gave me a hint that the _ts approach should perform quite nicely.

Should even work with the ObjectId type: http://api.mongodb.org/java/current/org/bson/types/ObjectId.html

I will have a deeper look into this :)
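
A minimal sketch of the `_id` range idea, assuming the standard ObjectId layout in which the first four bytes hold a big-endian Unix timestamp in seconds. The class and method names (`ObjectIdTimeBounds`, `lowerBoundHex`, `instantOf`) are hypothetical and belong neither to this plugin nor to the MongoDB driver. The hex string would be turned into an ObjectId and used as the bound of a range query on `_id`, which can use the default `_id` index without any server-side JavaScript.

```java
import java.time.Instant;

// Hypothetical helper, not part of the plugin or the MongoDB driver.
final class ObjectIdTimeBounds {

    // Smallest possible ObjectId (24 hex chars) for the given instant:
    // 4 timestamp bytes (seconds since epoch) followed by 8 zero bytes.
    static String lowerBoundHex(Instant instant) {
        long seconds = instant.getEpochSecond();
        if (seconds < 0 || seconds > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("timestamp does not fit into 4 unsigned bytes");
        }
        return String.format("%08x", seconds) + "0000000000000000";
    }

    // Reads the creation time (second resolution) back out of an ObjectId hex string.
    static Instant instantOf(String objectIdHex) {
        if (objectIdHex.length() != 24) {
            throw new IllegalArgumentException("an ObjectId has 24 hex characters");
        }
        return Instant.ofEpochSecond(Long.parseLong(objectIdHex.substring(0, 8), 16));
    }

    public static void main(String[] args) {
        Instant fiveMinutesAgo = Instant.now().minusSeconds(300);
        // Bound for a query of the form: _id >= ObjectId("<bound>")
        System.out.println(lowerBoundHex(fiveMinutesAgo));
    }
}
```

Note that the embedded timestamp only has second resolution and reflects the clock of whichever process generated the ObjectId.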

@matheuslima
Contributor

The problem I see is that this timestamp needs to be generated against some reference clock. In a distributed system with multiple writer nodes, the nodes' clocks can be out of sync, which can lead to inconsistencies. Can you describe your business requirements in more detail?

@scullxbones
Owner

Yep, the reference clock issue was the immediate question that jumped to my mind as well seeing this ticket. You could maybe use mongo as a reference clock, but I don't know what (if any) assurances are given that a mongo cluster's (e.g. a replica set) clock is synchronized.

It's an interesting concept, and I have thought about this before in the context of whether InfluxDB was viable as a persistence journal. The clock issue was what stopped me from getting too serious about it.

@matheuslima
Contributor

Anyway, this query does not seem generic enough to me to deserve an API-level implementation. It seems to be a business-level feature.

@thjaeckle
Contributor Author

Do the replica set's clocks even need to be in sync? Inserts are always done on the primary, aren't they? I must admit that I'm not that familiar with MongoDB.
Could one get the reference clock by doing an insert on the primary first and retrieving the date from that inserted document as reference?
Just thinking aloud here, sorry if it's nonsense ;-)

The business requirement is that a search service wants to get the last modified persistence ids plus their sequence numbers in order to check if it missed some events and manually do a resync + search index update for those.
It would, for example, poll every five minutes and ask for the modifications of the last 6 minutes (some overlap seems reasonable here).
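
The polling check described above could look roughly like the following in-memory sketch. It assumes each journal entry carries a write timestamp; `JournalEntry`, `lastModified` and `missed` are hypothetical names used for illustration only, and a real implementation would run the time filter as a database query rather than over a list.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not part of the plugin.
final class ResyncCheck {

    static final class JournalEntry {
        final String persistenceId;
        final long sequenceNr;
        final Instant ts; // assumed write timestamp

        JournalEntry(String persistenceId, long sequenceNr, Instant ts) {
            this.persistenceId = persistenceId;
            this.sequenceNr = sequenceNr;
            this.ts = ts;
        }
    }

    // Highest sequence number per persistenceId among entries written at or after `since`.
    static Map<String, Long> lastModified(List<JournalEntry> journal, Instant since) {
        Map<String, Long> latest = new HashMap<>();
        for (JournalEntry e : journal) {
            if (!e.ts.isBefore(since)) {
                latest.merge(e.persistenceId, e.sequenceNr, Math::max);
            }
        }
        return latest;
    }

    // Entries of `journalLatest` for which the search index has seen a lower sequence number.
    static Map<String, Long> missed(Map<String, Long> journalLatest, Map<String, Long> indexed) {
        Map<String, Long> behind = new HashMap<>();
        journalLatest.forEach((pid, seqNr) -> {
            if (indexed.getOrDefault(pid, 0L) < seqNr) {
                behind.put(pid, seqNr);
            }
        });
        return behind;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2015-11-20T12:00:00Z");
        List<JournalEntry> journal = List.of(
            new JournalEntry("thing-1", 7, now.minus(Duration.ofMinutes(10))),
            new JournalEntry("thing-1", 8, now.minus(Duration.ofMinutes(4))),
            new JournalEntry("thing-2", 3, now.minus(Duration.ofMinutes(2))));
        // Poll with a 6-minute window, i.e. one minute of overlap on a 5-minute interval.
        Map<String, Long> recent = lastModified(journal, now.minus(Duration.ofMinutes(6)));
        // The search index already knows thing-1 up to 8 but thing-2 only up to 2.
        System.out.println(missed(recent, Map.of("thing-1", 8L, "thing-2", 2L))); // prints {thing-2=3}
    }
}
```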

@scullxbones
Owner

> Do the replica set's clocks even need to be in sync? Inserts are always done on the primary, aren't they? I must admit that I'm not that familiar with MongoDB.

Yes, for replica sets inserts are done on the primary; it's a single-master system. So everything is fine until there is a primary re-election, for example when the primary crashes. Inserts then proceed on the new primary, which could introduce a consistency issue if the clocks are misaligned.

That's the simple cluster case. If MongoDB sharding is used, different mongod instances handle the inserts for each shard. At that point you're probably guaranteed consistency issues, at least across persistenceIds. If the clocks only need to be consistent within a persistenceId, you're probably OK.
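
The failover case can be illustrated with a small simulation. It assumes, purely for illustration, that each node stamps events from its own wall clock and that the new primary runs 90 seconds behind; `FailoverSkew` and `stamp` are hypothetical names.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical simulation, not part of the plugin.
final class FailoverSkew {

    // Timestamp a node would write at `trueTime` if its clock is off by `clockError`.
    static Instant stamp(Instant trueTime, Duration clockError) {
        return Clock.offset(Clock.fixed(trueTime, ZoneOffset.UTC), clockError).instant();
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2015-11-20T12:00:00Z");
        // Event 1 is written on the old primary, whose clock is accurate.
        Instant first = stamp(t0, Duration.ZERO);
        // Ten seconds later the new primary writes event 2, but its clock lags by 90 s.
        Instant second = stamp(t0.plusSeconds(10), Duration.ofSeconds(-90));
        // The later event carries the earlier timestamp.
        System.out.println(second.isBefore(first)); // prints true
    }
}
```

In this scenario a poller that asks for everything stamped after the first event would never see the second one, unless its overlap window is larger than the clock skew.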

@thjaeckle
Contributor Author

I understand that under these conditions this won't make it into the plugin.
I don't need such strong consistency guarantees, so I'll try writing my own PersistenceQuery and hope that I can access all the traits, classes, etc. that I need.

Thanks for the discussion. :)

@scullxbones
Owner

I think this should be doable, with caveats, as of the merge of #150: adding timestamps to the events will make this kind of query straightforward. The resolution of the query would be limited by the differences in internal clock time of the nodes inserting records. For slower-moving persistent actors (say, fewer than 1 message per 50 ms), I'd guess it would be quite accurate.

@scullxbones reopened this May 13, 2017

@fabiangebert

fabiangebert commented Nov 12, 2017

I would also find this quite useful in combination with MongoDB's auto-expire feature, to get rid of old persistence journals of crashed actors. See https://www.ekito.fr/people/auto-expire-documents-mongodb-collections/
The _id implicitly contains a timestamp anyway, so why not use the insertion time?

4 participants