Generate Raptor transfer cache in parallel #6326
base: dev-2.x
Conversation
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

@@             Coverage Diff              @@
##             dev-2.x     #6326    +/-   ##
============================================
  Coverage      69.85%     69.85%
- Complexity     17921      17931    +10
============================================
  Files           2035       2035
  Lines          76495      76520    +25
  Branches        7824       7826     +2
============================================
+ Hits           53434      53454    +20
- Misses         20324      20326     +2
- Partials        2737       2740     +3

View full report in Codecov by Sentry.
I think this is going to reduce the throughput slightly. The issue is that one request that requires new transfers to be generated will steal processor time from other requests. I am not sure how this affects memory fetches, but it might have a negative effect on running trip searches - at least if a planning request is swapped out in favour of calculating transfers. The threads also lose log-trace-parameters propagation and graceful timeout handling. The parallel processing at least needs to be put behind a feature flag.
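One way to keep the log-trace parameters when work is handed off to worker threads is to copy the SLF4J MDC into each task. A minimal sketch, assuming SLF4J is the logging facade in use; the wrapper class itself is hypothetical and not part of this PR:

```java
import java.util.Map;
import java.util.concurrent.Callable;
import org.slf4j.MDC;

/** Copies the submitting thread's MDC (trace parameters) into the worker thread. */
final class MdcPropagatingTask<T> implements Callable<T> {

  private final Callable<T> delegate;
  // Snapshot of the caller's MDC, taken at construction time on the request thread.
  private final Map<String, String> callerContext = MDC.getCopyOfContextMap();

  MdcPropagatingTask(Callable<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public T call() throws Exception {
    Map<String, String> previous = MDC.getCopyOfContextMap();
    if (callerContext != null) {
      MDC.setContextMap(callerContext); // restore the request's trace parameters
    }
    try {
      return delegate.call();
    } finally {
      if (previous != null) {
        MDC.setContextMap(previous); // put the worker thread's MDC back as it was
      } else {
        MDC.clear();
      }
    }
  }
}
```

Tasks submitted to an executor would then be wrapped, e.g. `pool.submit(new MdcPropagatingTask<>(task))`, so log lines emitted during transfer calculation still carry the request's trace parameters.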
There is also the possibility of only computing these in parallel before start-up but not after the server is running. I don't know whether this code is used for both cases or not.
I specifically need it to compute in parallel in order to bring our response time down from 4 minutes to 1 minute.
Only check the feature flag during run time, not during start-up.
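A minimal, self-contained sketch of that gating: the start-up computation stays parallel unconditionally, while at run time a feature flag decides. All names and parameters here are hypothetical stand-ins, not OTP's real API:

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

final class TransferCacheParallelism {

  /** Start-up computation is always parallel; at run time the feature flag decides. */
  static <T> void computeAll(
    List<T> sources,
    Consumer<T> computeTransfers,
    boolean serverIsRunning,
    boolean parallelFlagEnabled
  ) {
    boolean useParallel = !serverIsRunning || parallelFlagEnabled;
    Stream<T> stream = useParallel ? sources.parallelStream() : sources.stream();
    stream.forEach(computeTransfers);
  }
}
```

Gating only the run-time path limits the thread-contention concern raised above to deployments that explicitly opt in, while start-up still benefits from all available cores.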
I tested this in our dev environment (without parallel routing, so just for start-up). I did not witness the transfer cache processing being faster; it was maybe even slightly slower on the machines we use. I benchmarked on D4ads v5 machines (4 vCPUs) in Azure.
How many transfers do you have, and what is the load on your machine? We are running on our own hardware (we placed a new physical server into the data centre just a few weeks ago).
I tried it with a few different instances with slightly different configurations, around 3-7 cache requests each. I'm not sure about the total number of transfers, but I tried with a countrywide deployment, for example. There shouldn't be much other load on the machines, but I tested these in Kubernetes environments, so there are some kube services running on the machines, and some of the deployments also had real-time updaters configured. I wouldn't mind if someone else could also do some benchmarking to see whether these changes have a positive effect or not.
# Conflicts:
#   application/src/main/java/org/opentripplanner/routing/algorithm/raptoradapter/transit/RaptorTransferIndex.java
I deployed this to one of my environments and I did see a slight speed-up. Definitely not as much as @miklcct is seeing, but still a bit faster.
OTP is designed to run one request in one thread - this optimizes throughput, but has the disadvantage that long-lasting requests take more time. When testing this it is important to have enough requests to "activate" all threads. This can be done using a test client which runs N threads and immediately sends a new request when getting a result back. Then run the same test with and without this change and compare the throughput. I am OK with this after the change where we can turn off this feature in REQUEST_SCOPE.
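A minimal sketch of such a test client, using Java's built-in HttpClient; the endpoint URL, query parameters, thread count, and duration are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ThroughputTestClient {

  public static void main(String[] args) throws Exception {
    int nThreads = 8;                              // enough to keep all server threads busy
    Duration testDuration = Duration.ofMinutes(2); // fixed window for the comparison
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder(
      URI.create("http://localhost:8080/otp/routers/default/plan?fromPlace=...&toPlace=...")
    ).GET().build();

    AtomicLong completed = new AtomicLong();
    Instant end = Instant.now().plus(testDuration);
    ExecutorService pool = Executors.newFixedThreadPool(nThreads);
    for (int i = 0; i < nThreads; i++) {
      pool.submit(() -> {
        // Each worker immediately sends the next request as soon as one completes.
        while (Instant.now().isBefore(end)) {
          try {
            client.send(request, HttpResponse.BodyHandlers.discarding());
            completed.incrementAndGet();
          } catch (Exception e) {
            // count failures separately in a real test; ignored in this sketch
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(testDuration.toMinutes() + 1, TimeUnit.MINUTES);
    System.out.println("Requests completed: " + completed.get());
  }
}
```

The completed-request count with and without the change then gives a rough throughput comparison.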
OK, I will retest this in our environment at some point to re-validate how it affects performance in our setup. If we can still observe a slightly negative effect, perhaps the start-up parallelism should be put behind another feature flag.
How many transfers do you have? I now have 5.7M WALK transfers and 16.1M BIKE transfers in my whole graph on the test server, configured with 20-minute transfers for the whole of GB; the graph is 10.3 GB in size.
I don't know the exact number but it's much, much smaller. About 50k stops.
Related to this, at Entur we will roll back PR #6238 - it degrades the performance a lot. The real solution to this problem is to optimize the transfer calculations - they should not touch the AStar paths at all. As part of this I would also like to see what would happen if we calculated the transfers in every request, using a lazy approach. This would possibly have a good effect on large graphs.
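A minimal sketch of that lazy idea: instead of precomputing the whole transfer index for a routing profile, transfers from a stop are computed the first time Raptor asks for them and then memoized. All names are hypothetical; this is not the proposed implementation:

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

/** Lazily computed, memoized transfers per stop index. */
final class LazyTransferIndex<T> {

  private final ConcurrentHashMap<Integer, List<T>> transfersByStop = new ConcurrentHashMap<>();
  private final IntFunction<List<T>> calculator;

  LazyTransferIndex(IntFunction<List<T>> calculator) {
    this.calculator = calculator;
  }

  /** Compute on first use; subsequent requests reuse the cached result. */
  List<T> transfersFrom(int stopIndex) {
    return transfersByStop.computeIfAbsent(stopIndex, calculator::apply);
  }
}
```

With computeIfAbsent, concurrent requests for the same stop block until the first computation finishes, so the cost is spread over the requests that actually need each stop instead of being paid all up front.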
Why aren't the performance problems visible in the speed tests?
I am looking forward to this, and it may well be our long-term solution. If it means a local journey with a new configuration can be done in 15 seconds instead of 2 minutes (even if a long-distance journey still requires 2 minutes), we can more easily explain this to our customers.
Does the speed test have requests which generate the run-time Raptor cache on a large transfer graph? It should show up immediately. This PR, combined with the roll-back of #6238, has finally made reasonable bike transfers on the whole of GB realistic, even for a long ride between London Terminals.
Summary
This makes the Raptor transfer cache generation run in parallel.
Our GB-wide deployment pre-caches 4 configurations on start-up. Before applying this fix, it takes 16 minutes to cache the 4 configurations for the whole of GB on a 16-core machine.
After applying this patch, it only takes 5 minutes.
Also, the journey planning response time for a new configuration has been reduced correspondingly, from more than 4 minutes to around 1.5 minutes.
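For illustration, a minimal sketch of the kind of change this implies (not the actual diff): build each stop's transfer list in a parallel stream, since the per-stop lists are independent of each other. `nStops` and `calculateTransfersFrom` are hypothetical stand-ins for the real index-building code:

```java
import java.util.List;
import java.util.function.IntFunction;
import java.util.stream.IntStream;

final class ParallelTransferIndexBuild {

  /** Builds the per-stop transfer lists concurrently; encounter order is preserved. */
  static <T> List<List<T>> build(int nStops, IntFunction<List<T>> calculateTransfersFrom) {
    // Each stop's transfer list is independent of the others, so the work
    // distributes cleanly across the common fork-join pool.
    return IntStream.range(0, nStops)
      .parallel()
      .mapToObj(calculateTransfersFrom)
      .toList();
  }
}
```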
Issue
#6312
Unit tests
None. This is a performance improvement only, with no externally visible change.
Documentation
N/A
Changelog
Bumping the serialization version id
Not needed