Scalability/Performance concerns & questions #3
hey @nefilim I would definitely suggest benchmarking and tuning the required throughput for your specific workload. You can get CloudWatch metrics that will tell you what throttling is happening and what throughput is being consumed. At what rate is each actor receiving events? Why wouldn't you have recent snapshots? Can each actor be recovered lazily as needed (as would happen if you are using cluster sharding)?

Where you mention things happening in 1 second, that is per PersistentActor, so theoretically the aggregate consumed throughput could be much higher. DynamoDB will throttle requests over the provisioned limit, and the dynamodb journal will back off and retry operations. So even with thousands of PersistentActors recovering simultaneously, the operations should (eventually) complete. Of course you always have the option of using the AWS APIs to scale up throughput before a restart and back down afterwards, though that's not the cleanest thing.

A big read spike will of course happen if you are recovering such a volume of data from any datastore, whether based on akka-persistence/eventsourcing or not, if you need to get that data into memory, so it comes down to operational considerations. DynamoDB is operated for you and may (will) cost you (much?) more for a given throughput than the raw infrastructure costs of standing up a Cassandra or Kafka cluster and using the appropriate journal impl, but then you are responsible for operating that thing too.

I'm pretty sure applications of the scope you described were definitely in mind when akka-persistence was designed, so I would not want to tell you that yours doesn't fit. Perhaps the cost/throughput/operational characteristics of using DynamoDB won't work for you, but I don't think you can say that either until you measure your anticipated workload. +1 to posting your akka-persistence questions to akka-user too. Thanks!
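If it helps, something along these lines with the AWS SDK for Java (v1) is what I mean by scaling up before a restart and back down afterwards; the table name and capacity numbers are made up purely for illustration:

```scala
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient
import com.amazonaws.services.dynamodbv2.model.{ProvisionedThroughput, UpdateTableRequest}

object JournalThroughputBump extends App {
  // Credentials are assumed to come from the default provider chain (env vars, instance profile, ...).
  val client = new AmazonDynamoDBClient()

  // Table name and capacity values below are illustrative only.
  def setThroughput(table: String, readUnits: Long, writeUnits: Long): Unit =
    client.updateTable(new UpdateTableRequest()
      .withTableName(table)
      .withProvisionedThroughput(new ProvisionedThroughput(readUnits, writeUnits)))

  setThroughput("akka-persistence-journal", 5000L, 500L) // bump reads before the restart / mass recovery
  // ... restart, let PersistentActors recover ...
  setThroughput("akka-persistence-journal", 500L, 500L)  // scale back down afterwards
}
```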
Thanks @sclasen for your thoughts. I'm indeed using akka-persistence with cluster sharding. I hear you on measuring my workload; I just started doing that very modestly with a fairly low-throughput table and immediately ran into throttling, to my surprise, hence digging deeper. I'm modeling connected users as persistent actors. If my calculations are correct, they're sounding alarms for me before I even get down to some serious testing. My concern isn't really DynamoDB per se, but all the seemingly superfluous reads performed when it's used with akka-persistence.
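For context, my setup looks roughly like this: each connected user is one sharded PersistentActor. A minimal sketch, assuming a newer Akka classic ClusterSharding API than the 2.3-era one; the `UserCommand` type, entity name, and shard count are just illustrative, and cluster/journal configuration is omitted:

```scala
import akka.actor.{ActorRef, ActorSystem, Props}
import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings, ShardRegion}
import akka.persistence.PersistentActor

// Hypothetical message type for a connected-user entity.
final case class UserCommand(userId: String, payload: String)

class UserActor extends PersistentActor {
  // With cluster sharding the actor's name is its entity id, so each user gets its own journal stream.
  override def persistenceId: String = "user-" + self.path.name

  private var handled = 0L

  override def receiveRecover: Receive = {
    case _: UserCommand => handled += 1 // rebuild state from replayed events
  }

  override def receiveCommand: Receive = {
    case cmd: UserCommand => persist(cmd) { _ => handled += 1 }
  }
}

object UserSharding {
  val extractEntityId: ShardRegion.ExtractEntityId = {
    case cmd: UserCommand => (cmd.userId, cmd)
  }
  val extractShardId: ShardRegion.ExtractShardId = {
    case cmd: UserCommand => (math.abs(cmd.userId.hashCode) % 100).toString
  }

  // Returns the shard region; cluster and dynamodb-journal configuration is assumed elsewhere.
  def start(system: ActorSystem): ActorRef =
    ClusterSharding(system).start("User", Props[UserActor], ClusterShardingSettings(system),
      extractEntityId, extractShardId)
}
```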
One knob you can turn to speed recovery is the … setting. For actors with no events (or no events since the last snapshot), the replay should end there, so I guess I'm not sure where all the consumption would be coming from. Are you seeing more batch gets happen after the sequence number is read?
Sorry, I'm probably not being very clear. I've added a bit of debug logging to DynamoDBRecovery.{getReplayBatch,asyncReplayMessages} and here's what I see if I send a message to a PersistentActor that hasn't existed before and is not currently running:
only then I see
which is borne out by this bit of code in akka.persistence.ProcessorImpl:
This was one of my original concerns in my first post: akka-persistence just indiscriminately replays events from 0 to Long.MaxValue and only after that loads the highest sequence number, which seems inefficient to me. Why not do a ReadHighestSequenceNr first and then load 0 to HighestSequenceNr? If I understand the akka-persistence-dynamodb code correctly, …
Also, I believe doing a ReadHighestSequenceNr first would let a brand-new PersistentActor skip loading persisted events altogether, since there are none (in the case of akka-persistence-dynamodb, saving the 1000 read units being consumed for no reason).
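To make the ordering I'm suggesting concrete, here's a small self-contained sketch with the journal calls stubbed out (`readHighestSequenceNr` and `replayMessages` are placeholder names, not the actual plugin API):

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

object RecoveryOrdering extends App {
  // Stubbed journal operations; an empty journal here, so the highest sequence number is 0.
  def readHighestSequenceNr(pid: String): Future[Long] = Future.successful(0L)
  def replayMessages(pid: String, from: Long, to: Long): Future[Unit] = {
    println(s"replaying $pid events $from..$to") // in the DynamoDB journal this is where the BatchGetItems happen
    Future.successful(())
  }

  // Observed order: speculatively replay 0..Long.MaxValue, then read the highest sequence number.
  def recoverObserved(pid: String): Future[Long] =
    for {
      _       <- replayMessages(pid, 0L, Long.MaxValue)
      highest <- readHighestSequenceNr(pid)
    } yield highest

  // Suggested order: read the highest sequence number first and skip the replay if it is 0.
  def recoverSuggested(pid: String): Future[Long] =
    readHighestSequenceNr(pid).flatMap { highest =>
      if (highest == 0L) Future.successful(highest)            // brand-new actor: no item reads at all
      else replayMessages(pid, 0L, highest).map(_ => highest)  // bounded replay for existing actors
    }

  Await.result(recoverObserved("user-42"), 5.seconds)   // prints a replay line even though the journal is empty
  Await.result(recoverSuggested("user-42"), 5.seconds)  // prints nothing: the replay is skipped
}
```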
Aha, got it. Yes, it does seem to be the naive approach. Saw your post on akka-user too. The sequence number is per actor (it was global in eventsourced, but is now per actor), so it does seem like a reasonable upper bound to use on replay. Let's see what goes on in the thread...
First, for Akka Persistence in general (I should probably post this on the Akka Users group too): if I'm seeing this correctly, it looks like a PersistentActor's events are replayed when it's created and/or after the last snapshot was loaded successfully.
Only after that is done is the highest sequence number read. This seems backwards and wasteful, although I guess if the sequence number is only globally unique it's useless in determining the number of events to load. Looking at ProcessorImpl (which appears to still be used with PersistentActor??), one gets the impression the sequence number is locally scoped:
As the ReplayMessages operation is performed on Processor startup, this also applies to a brand new actor; if the overhead of this operation is high, the penalty is paid every time, even for new Processors with no events.
What is the overhead? According to the docs:
"If you perform a read operation on an item that does not exist, DynamoDB will still consume provisioned read throughput: A strongly consistent read request consumes one read capacity unit, while an eventually consistent read request consumes 0.5 of a read capacity unit."
It looks like akka-persistence-dynamodb does 10 BatchGetItemRequests in parallel, each for 100 keys, so 1000 items in total? Can one say this will consume 1000 read units, potentially in 1 second?
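Working that out under the quoted rule (1 read unit per strongly consistent read of a missing item, 0.5 for an eventually consistent one), and assuming the 10 × 100-key figure above is right:

```scala
object EmptyReplayCost extends App {
  // Back-of-envelope for recovering a PersistentActor that has no events at all (all reads are misses).
  val parallelBatches = 10   // BatchGetItemRequests issued in parallel, per the observation above
  val keysPerBatch    = 100  // keys per BatchGetItemRequest
  val itemReads       = parallelBatches * keysPerBatch // 1000 item reads

  // Per the quoted docs: 1 read unit per strongly consistent read of a missing item, 0.5 if eventually consistent.
  val strongUnits   = itemReads * 1.0   // 1000 read units
  val eventualUnits = itemReads * 0.5   // 500 read units

  println(s"$itemReads empty reads ~= $strongUnits (strong) / $eventualUnits (eventual) read units")
}
```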
What about the overhead for Processors with existing events? Here's where it gets trickier (I'm probably not smart enough to understand the AWS documentation; either way, I'm losing love for DynamoDB in a hurry with this opaque scheme). From the FAQ http://aws.amazon.com/dynamodb/faqs/:
"Q: What is a read/write capacity unit?
…
Units of Capacity required for writes = Number of item writes per second x item size (rounded up to the nearest KB)
If your items are less than 1KB in size, then each unit of Read Capacity will give you 1 read/second of capacity and each unit of Write Capacity will give you 1 write/second of capacity. For example, if your items are 512 bytes and you need to read 100 items per second from your table, then you need to provision 100 units of Read Capacity.
"
From this section http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithTables.html#CapacityUnitCalculations
"For BatchGetItem, each item in the batch is read separately, so DynamoDB first rounds up the size of each item to the next 4 KB and then calculates the total size. The result is not necessarily the same as the total size of all the items. For example, if BatchGetItem reads a 1.5 KB item and a 6.5 KB item, DynamoDB will calculate the size as 12 KB (4 KB + 8 KB), not 8 KB (1.5 KB + 6.5 KB)."
Seems ambiguous to me, but in the best-case scenario:
In the worst-case scenario (1 KB reads) this swells to 1000 and 2000 read units, I think? What happens if there are more than 1000 events since the last snapshot?
Am I missing something huge here?
My application needs to support millions of Persistent Actors, with tens of thousands of them running at any time (possibly a few hundred thousand). A system restart (after which these actors could be (re)created at a rate of 1000/s or more), in the absence of very recent snapshots, would require incredibly (impossibly?) high DynamoDB provisioned throughput. I wonder if perhaps akka-persistence-dynamodb, or even event sourcing, isn't the best fit for my use case?
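As a rough illustration of why this alarms me, a back-of-envelope using the ~1000 speculative item reads per recovery estimated above (these are my assumed numbers, not measurements):

```scala
object RestartReadCost extends App {
  // Assumed recovery rate after a restart and the per-actor read cost estimated above
  // for an actor without a recent snapshot.
  val recoveriesPerSecond  = 1000   // PersistentActors being (re)created per second
  val readUnitsPerRecovery = 1000   // ~10 x 100-key BatchGetItems of mostly-empty keys

  // Read capacity that would have to be provisioned just to absorb the recovery burst without throttling.
  val requiredReadUnits = recoveriesPerSecond.toLong * readUnitsPerRecovery
  println(s"~$requiredReadUnits provisioned read units during the recovery burst") // ~1,000,000
}
```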