[SCD] Enable USS to propose OVN to increase parallelization #1078
Comments
It seems like there are two features mentioned here:
Number 2 is outside the scope of the DSS (the DSS doesn't do anything differently; only the USSs), so I'll assume it's outside the scope of this request. Number 1 seems like something we can do. I'd expect we would go about this by doing something like:
After these changes, a USS could include a After these changes, a USS could also include

@LeslieW do you think those changes would satisfy this request?

The reason to specify a suffix rather than the full OVN is that this allows the DSS to ensure global uniqueness without maintaining a global OVN database, while still allowing the USS to know what the OVN will be. If the USS specified the entire OVN, the DSS would have to verify the OVN didn't collide with any other OVNs for any operational intent, past or present, and that would be a pretty major change. By prefixing the OVN with the operational intent ID, we can scope the collision check to just that operational intent ID and therefore use the

But, there is no guarantee that an operational intent ID will not be reused in the future (after the first operational intent is out of the system), so including the timestamp of the request as part of the suffix ensures global uniqueness of the OVN even if the operational intent ID is reused in the future. Requiring the USS to specify the timestamp as part of the suffix ensures that clock skew between USS and DSS and/or request latency will not cause problems -- the DSS can simply verify that the requested timestamp is within the accepted clock skew, which could perhaps be tens of seconds.

@barroco and @mickmis, do you see any issues with this approach, see any better ways to accomplish the goal, or have any other thoughts regarding these changes? It seems to me like this would make the system more resilient to DSS latency, and given customer observations of the impact of that, it seems like that additional resiliency would be worth having in the near term.
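To make the suffix idea above concrete, here is a minimal sketch of how a DSS might compose an OVN from the operational intent ID plus a USS-supplied suffix, and reject suffixes whose timestamp falls outside the accepted clock skew. The names `build_ovn` and `suffix_is_acceptable`, the `_` separator, the 30-second skew, and the choice of an ISO 8601 timestamp as the whole suffix are all assumptions for illustration; none of them come from the DSS codebase or the standard.

```python
from datetime import datetime, timedelta, timezone

# Assumed values for illustration; the thread only says "perhaps tens of seconds".
MAX_CLOCK_SKEW = timedelta(seconds=30)
SEPARATOR = "_"  # hypothetical separator between the ID prefix and the suffix


def build_ovn(op_intent_id: str, requested_suffix: str) -> str:
    """Prefix the USS-proposed suffix with the operational intent ID."""
    return f"{op_intent_id}{SEPARATOR}{requested_suffix}"


def suffix_is_acceptable(requested_suffix: str, now: datetime) -> bool:
    """DSS-side check: the suffix must be a timestamp within the allowed skew."""
    try:
        proposed = datetime.fromisoformat(requested_suffix)
    except ValueError:
        return False  # not a parseable timestamp
    if proposed.tzinfo is None:
        return False  # require an explicit offset so the comparison is unambiguous
    return abs(now - proposed) <= MAX_CLOCK_SKEW


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    suffix = now.isoformat()
    if suffix_is_acceptable(suffix, now):
        print(build_ovn("hypothetical-op-intent-id", suffix))
```

Because the operational intent ID is the prefix, the collision check only has to consider OVNs for that one ID, and because the USS chose the suffix, it knows the resulting OVN before the DSS responds.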
About addressing the planning of many flights in succession, maybe batching of operational intent upserts could help? A new endpoint doing so could relax the key checks to ignore the OVNs of the operational intents that are part of the batch. About a USS requesting op intent details before the DSS response has been received/processed by the owning USS, how likely is that in practice? I see that happening only within this timeframe, which should depend only on network latency:
Have you seen that happen a significant amount of the time? About your proposal @BenjaminPelletier, I don't spot any obvious issue. It does seem to be backward compatible with the standard and to have no impact on USSs that do not use this feature. Just some minor feedback:
Our traffic shape unfortunately does not allow for batching, so a batch endpoint would not be particularly useful to us here. However, it's worth noting that our intents are usually not geospatially proximate, so the S2 cell sharding used by the DSS should be sufficient for our throughput needs.

> About a USS requesting op intent details before the DSS response has been received/processed by the owning USS, how likely is that in practice? I see that happening only within this timeframe which should just be dependent on network latency:

This was revealed to us by the latency issue we saw in the DSS with the CRDB indexes not functioning correctly; nevertheless, it is a correctness issue that the latency issue merely exposed. It could also occur if there were a network partition, server crash, etc., where the response is never received by the USS after it has been accepted by the DSS. Our hope is to make the standard more resilient to the wide variety of failure conditions that any distributed system encounters.
Ah, that's probably fair -- while there is no functional change to the DSS to support the second feature, defining how it should work for USSs would be valuable. So yes, in step 5, I think we'd make that API addition to the USS-USS endpoints as well.
This sounds like a good idea to me. @callumdmay or @LeslieW any concerns about having the suffix be a UUIDv7 specifically? The exchange would then look like this:
Another time this would happen is if a USS client gives up due to response time when attempting to perform a DSS operation. For instance, if the USS has a 5-second operation deadline and the DSS takes 6 seconds to respond, the USS (in the 5-second timeframe) cannot distinguish the behavior of the DSS from an unsuccessful request, so it just assumes the operational intent reference doesn't exist (which leaves the operational intent reference in the DSS without the USS acknowledging its existence). With this proposed-OVN change, the USS could instead assume the operational intent reference does exist and put it in the queue for later positive cleanup, in the meantime serving the valid details to anyone who might happen to ask.

I think the big value here is correctness and robustness, as @callumdmay mentions. Proposed OVNs allow a USS to begin serving usually-correct operational intent details prior to making the DSS call, eliminating any problems in the operational intent reference creation process as reasons that might block another USS from planning in the airspace.

@mickmis assuming no issues with UUIDv7 from @callumdmay or @LeslieW, do you think you might have time to work on this in the near term?
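For readers unfamiliar with UUIDv7, the following sketch shows why it fits the suffix role discussed here: the top 48 bits are a Unix timestamp in milliseconds, so a receiver can recover the request time and compare it against an accepted skew, while the remaining random bits keep suffixes distinct. The helper names `uuid7`, `uuid7_unix_ms`, and `within_skew`, and the 30-second default, are hypothetical; whether the DSS would actually perform such a check on a UUIDv7 suffix is an assumption, not something stated in this thread. The generator is hand-rolled from the RFC 9562 bit layout because `uuid.uuid7` is not available in older Python versions.

```python
import secrets
import time
import uuid
from typing import Optional


def uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7: 48-bit ms timestamp, version, 12 random bits, variant, 62 random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = (unix_ms & ((1 << 48) - 1)) << 80  # timestamp in bits 127..80
    value |= 0x7 << 76                         # version 7 in bits 79..76
    value |= secrets.randbits(12) << 64        # rand_a in bits 75..64
    value |= 0b10 << 62                        # RFC variant in bits 63..62
    value |= secrets.randbits(62)              # rand_b in bits 61..0
    return uuid.UUID(int=value)


def uuid7_unix_ms(u: uuid.UUID) -> int:
    """Recover the embedded millisecond timestamp."""
    return u.int >> 80


def within_skew(u: uuid.UUID, now_ms: int, max_skew_ms: int = 30_000) -> bool:
    """Hypothetical receiver-side check on a proposed suffix."""
    return u.version == 7 and abs(now_ms - uuid7_unix_ms(u)) <= max_skew_ms


if __name__ == "__main__":
    suffix = uuid7()
    print(suffix, within_skew(suffix, time.time_ns() // 1_000_000))
```

A USS that times out waiting for the DSS still knows the suffix it proposed, which is what lets it keep serving the proposed details and clean up later.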
I think UUIDv7 is a great choice, +1 to that
Indeed, that did not come to my mind!
Yes, I will start on it after I'm done with a first draft for #1074, which shouldn't take too long, unless you'd rather prioritize differently.
#3) This is part of the effort for interuss/dss#1078 aiming at enabling USSs to request a specific OVN for operational intent. This PR is stacked on top of #2, please review only 297e44a. This adds:
- to the DSS `(create|update)OperationalIntentReference` endpoints the `requested_ovn_suffix` field optionally enabling managing USSs to request a specific OVN.
- to the USS `getOperationalIntentDetails` endpoint the `version` query parameter optionally enabling USSs to request a specific version of the operational intent.
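As a rough illustration of how a client might use these two optional additions, the sketch below adds `requested_ovn_suffix` to an upsert body and `version` to a details URL. Only those two names come from the description above; the helper functions, the other body fields, the URL path, and treating `version` as a string are assumptions made for the example and should be checked against the actual API definition.

```python
import json
from typing import Optional
from urllib.parse import urlencode


def upsert_body(base_params: dict, requested_ovn_suffix: Optional[str] = None) -> str:
    """Serialize an upsert body, adding the optional suffix only when provided."""
    body = dict(base_params)  # copy so the caller's dict is not mutated
    if requested_ovn_suffix is not None:
        body["requested_ovn_suffix"] = requested_ovn_suffix
    return json.dumps(body)


def details_url(uss_base_url: str, op_intent_id: str, version: Optional[str] = None) -> str:
    """Build a details URL; the path used here is an assumed placeholder."""
    url = f"{uss_base_url}/uss/v1/operational_intents/{op_intent_id}"
    if version is not None:
        url += "?" + urlencode({"version": version})
    return url


if __name__ == "__main__":
    print(upsert_body({"state": "Accepted"}, "hypothetical-suffix"))
    print(details_url("https://uss.example", "hypothetical-id", "2"))
```

Since both additions are optional, a USS that omits them sends exactly the same requests as before.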
Implementation tasks
Original issue
Is your feature request related to a problem? Please describe.
Recent load testing for SCD revealed a limitation: a USS may need its OVN in internal storage before the DSS response carrying it has been received and processed. The following scenarios come to mind:
Describe the solution you'd like
It would be helpful if each USS could optionally provide an OVN to the DSS. Perhaps something similar to below:
Each USS would have a current and proposed version of an operational intent. By default the {operational-intent-id}/ endpoint would return the current version. A proposed version becomes the current version when the response is received from the DSS, or if another USS specifically requests that version (meaning the DSS response is still pending or was lost, but the DSS accepted the request). Each USS would additionally have a {operational-intent-id}/{version} endpoint that would return either the current or proposed version. A USS is not required to store previous versions beyond current and proposed.
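The current/proposed behavior described above could be sketched on the USS side roughly as follows. `IntentVersion` and `IntentStore` are hypothetical names, versions are modeled as opaque strings, and the sketch ignores persistence and concurrency; it only illustrates the promotion rules.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IntentVersion:
    version: str   # opaque version identifier (assumed to be a string)
    details: dict  # operational intent details served to other USSs


class IntentStore:
    """Keeps at most one current and one proposed version per operational intent."""

    def __init__(self) -> None:
        self._current: Dict[str, IntentVersion] = {}
        self._proposed: Dict[str, IntentVersion] = {}

    def propose(self, intent_id: str, candidate: IntentVersion) -> None:
        """Store a proposed version before calling the DSS."""
        self._proposed[intent_id] = candidate

    def confirm(self, intent_id: str, version: str) -> bool:
        """Promote the proposed version, e.g. when the DSS response arrives."""
        proposed = self._proposed.get(intent_id)
        if proposed is None or proposed.version != version:
            return False
        self._current[intent_id] = self._proposed.pop(intent_id)
        return True

    def get(self, intent_id: str, version: Optional[str] = None) -> Optional[IntentVersion]:
        """Default: return current. A request for the proposed version promotes it first."""
        if version is None:
            return self._current.get(intent_id)
        self.confirm(intent_id, version)  # no-op unless it matches the proposed version
        current = self._current.get(intent_id)
        if current is not None and current.version == version:
            return current
        return None


if __name__ == "__main__":
    store = IntentStore()
    store.propose("hypothetical-id", IntentVersion("v1", {"note": "example"}))
    print(store.get("hypothetical-id"))        # None: nothing confirmed yet
    print(store.get("hypothetical-id", "v1"))  # promoted because it was requested
```

Promoting on a versioned request reflects the reasoning in the text: another USS can only know the proposed version if the DSS accepted the request, even when the owning USS has not yet seen the response.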
Visual resources
Describe alternatives you've considered
Alternatives seem limited as this is a distributed system -- without this distributed transaction a USS can only do a best effort reconciliation which increases load on the DSS or latency when responding to other USSs.
Additional context
@BenjaminPelletier and @callumdmay may have additional context (much of the above was pulled from a conversation we had).