proposal: Resolving the SVC resolution issue #4388
Comments
Thanks for the write-up. I do not quite understand how proposal 2 addresses the titular "Resolving the svc resolution issue". Aren't the two topics distinct? Give or take one enabling the other, maybe? Regarding your first proposal.
Otherwise, I agree that the number of moving parts involved in resolving exactly one service is ludicrous. So, I'm all for removing some of the redundancy. From what I understand, there are currently two layers of indirection (even 3 if counting the dispatcher):
Now, I think you're suggesting to remove the second resolution mechanism and let the router do the mapping for every packet. Given the simplicity of performing the mapping compared to everything else, that seems reasonable. On the other hand, I don't understand how the mapping is kept stable between packets in the presence of more than one instance of the service. It is not, right? The QUIC connection ID is enough for the server to [de]multiplex its clients, but there still has to be a single server. If we want to solve that later, it might not actually be possible without a visible inter-AS protocol change.
So, maybe we should consider taking a slightly higher road? How about actually solving the client-side destination-address update? Is that outright infeasible? Quic-LB looks a bit Rube-Goldbergesque and maybe was not designed to solve that problem. It is meant to be handled by load balancers two layers above the router, so I'm not sure we can leverage it.
I have more than one reason to consider this... I'd like to see the day when ordinary, unmodified applications can use SCION without relying on a gateway. There are many necessary conditions, but one of them is that TCP connections, or at least UDP connections, can map to some underlying SCION protocol. Because the control service is an application, if there's ever more than one instance, we'd have to be able to support anycast properly. Therefore QUIC, even with its connection ID, isn't enough unless the client side updates its destination address.
Regarding Proposal 1, Supporting Multiple SVC Destinations:
As a simple, layer-violation-free and transport-agnostic alternative, a stateful load balancer can use the tuple (SourceISD, SrcAS, SrcHostAddr, FlowID) as connection identifier to consistently route packets to an anycast destination. (I'm aware that the flow ID is currently not initialized appropriately in the |
Alternative Proposal (copying from Slack so that we do not lose it: https://scionproto.slack.com/archives/C8ADA9CEP/p1696322961523659)
(will be further fleshed out)
Status: Draft
Note that this proposal is not finished yet. There are some TODOs, and we probably want some PoCs to validate the claims.
Resolving the SVC resolution issue
Currently, we have a process that is called SVC (service) resolution in the
SCION control plane exchanges. Its purpose is to resolve the address of a
SCION control plane service. This is done by sending a packet with an SVC
destination address to the target AS. The response contains the address where
the service is reachable (technically a map from protocol to address, but so
far we have only ever supported one protocol).
TODO: Why SVC resolution is an issue
Uses
SVC resolution currently has the following two uses:
Bootstrapping communication to remote AS
Given we are at a very deep layer in the networking stack, we cannot rely on
many other systems during bootstrapping. To establish SCION control plane
connections, we need to talk to services whose addresses we do not know
beforehand. The connection is stream oriented, thus we need packets to be
consistently delivered to the same server. SVC resolution is a way to achieve
this. However, currently, I don't see the use case.
Bootstrapping SCION path from One-Hop paths
Currently, during beaconing, One-Hop paths are used to send the SVC resolution
requests. The responses are sent using a full SCION path, which allows the SVC
resolution client to bootstrap a valid SCION dataplane path.
Both of these uses can be achieved in a different manner. Use 1 can be
solved in various ways, so we do not necessarily need SVC resolution.
Use 2 is not obviously necessary. If we still require it, we could also decouple
it from SVC resolution.
Background
TODO: Add more background
Proposal 1 - SVC resolution free communication
One key observation is that we are using QUIC in our SCION control plane
protocol. During the design of QUIC, a lot of thought went into connection
migration and resumption. Every QUIC packet carries a connection ID. Different
connections are de-multiplexed based on their connection ID, and not based on
the addresses of the packets. The connection ID can also be used by load
balancers to infer where the packet should be sent to. E.g., quic-lb attempts
to standardize a scheme to encode routing data in the connection ID.
We can leverage this fact in our SCION control plane too. In the world of QUIC
connections, resolving an address first to establish the connection is
unnecessary. As long as the packets get routed to the same server, it will
manage to identify the connection by the connection ID. The destination address
is irrelevant. Thus, the client can simply establish a QUIC connection with an
SVC destination address. This has been shown to work in #4387, which simply
drops SVC resolution when dialing gRPC connections for the SCION control plane.
All packets from the client contain an SVC address as the destination. The reply
packets contain the real server address.
:::{note}
In theory, the client could change the destination address after it has received
the first response. However, in the past this has been proven to be hard in
practice because various parts of the quic-go implementation have assumptions
that the destination address does not change. It is also not required, thus,
I would advocate for not doing it.
:::
Keeping Backwards Compatibility - De-multiplexing
Simply dropping SVC resolution is not an AS-internal change; the whole network
needs to adapt. Naively switching on this new behavior would lead to
interruptions, which is not acceptable given our production deployments.
However, there are two key observations here:
change at all.
payload of a SVC resolution request is empty (0 bytes), every QUIC packet
carries some information and the UDP payload is never 0 bytes.
To keep backwards compatibility, we can use these observations. We initialize a
packet connection that passes all packets with a non-empty payload further up
the stack to the QUIC server, and treats all packets with an empty payload as
SVC resolution requests. Luckily, we have implemented something like this in the
past, which can be used as inspiration: svc.ResolverPacketDispatcher.
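The de-multiplexing idea can be sketched as a thin wrapper around a packet connection. This is only an illustration under assumptions: it uses the standard library `net.PacketConn` rather than the SCION connection types, and the names `demuxConn`, `SVCHandler` and `isSVCResolution` are hypothetical. It is not the code of `svc.ResolverPacketDispatcher`.

```go
package main

import "net"

// SVCHandler is a hypothetical callback that answers an SVC resolution
// request received from the given remote address.
type SVCHandler func(remote net.Addr)

// isSVCResolution applies the observation from the text: an SVC resolution
// request has an empty payload, a QUIC packet never does.
func isSVCResolution(payload []byte) bool {
	return len(payload) == 0
}

// demuxConn wraps a packet connection. Empty datagrams are handed to onSVC,
// everything else is returned to the caller (for example a QUIC server).
// All other PacketConn methods are passed through unchanged.
type demuxConn struct {
	net.PacketConn
	onSVC SVCHandler
}

// ReadFrom blocks until a datagram with a non-empty payload arrives.
func (c *demuxConn) ReadFrom(p []byte) (int, net.Addr, error) {
	for {
		n, addr, err := c.PacketConn.ReadFrom(p)
		if err != nil {
			return n, addr, err
		}
		if isSVCResolution(p[:n]) {
			// Legacy client: answer the resolution request and keep reading.
			c.onSVC(addr)
			continue
		}
		return n, addr, nil
	}
}
```

A QUIC listener would then be started on a `demuxConn` instead of the raw socket, so that legacy clients and new clients are served from the same port.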
With this change, the control server will open a single UDP/SCION socket and
handle SVC resolution and regular QUIC connections on the same socket. This allows
a two-phase rollout plan: first upgrade the whole network with the server-side
changes, then enable the client side. We could even do a "happy eyeballs" approach
and try both at the same time.
Supporting Multiple SVC Destinations
In current deployments, there is usually only one control/discovery service
reachable from any given border router. However, in the future, we might want to
support setups where multiple instances can be reached via a border router. This
is still possible. In such a setup, the connection ID will encode the target
instance (e.g., with the quic-lb scheme). This will allow for consistent routing
across the different QUIC packets of the same connection. Given we do not have
such a use case yet, we do not need to handle it right now. Implementations may
vary, but they are AS-internal implementation details. Every AS can decide how
to do this without affecting any other AS in the network.
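To illustrate how an AS-internal component could route on the connection ID, here is a rough sketch. `destConnID` relies only on the version-independent QUIC header layout (RFC 8999). `pickInstance` uses a made-up encoding in which the second byte of the connection ID is an index into the instance list; this is an assumption for illustration and not the quic-lb wire format. Note also that the first Initial packets of a connection carry a destination connection ID chosen by the client, so a real deployment needs a separate rule for those (not shown here).

```go
package main

import "errors"

// destConnID extracts the destination connection ID from a QUIC packet.
// shortLen is the connection ID length issued by this deployment; it is
// needed because short-header packets do not encode the length.
func destConnID(pkt []byte, shortLen int) ([]byte, error) {
	if len(pkt) == 0 {
		return nil, errors.New("empty packet")
	}
	if pkt[0]&0x80 != 0 {
		// Long header: flags (1 byte), version (4 bytes), DCID length (1 byte).
		if len(pkt) < 6 {
			return nil, errors.New("truncated long header")
		}
		n := int(pkt[5])
		if len(pkt) < 6+n {
			return nil, errors.New("truncated connection ID")
		}
		return pkt[6 : 6+n], nil
	}
	// Short header: flags (1 byte) followed directly by the DCID.
	if len(pkt) < 1+shortLen {
		return nil, errors.New("truncated short header")
	}
	return pkt[1 : 1+shortLen], nil
}

// pickInstance maps a connection ID to a backend using the hypothetical
// encoding described above (second byte = instance index).
func pickInstance(cid []byte, instances []string) (string, error) {
	if len(cid) < 2 {
		return "", errors.New("connection ID too short to carry an instance")
	}
	idx := int(cid[1])
	if idx >= len(instances) {
		return "", errors.New("unknown instance index")
	}
	return instances[idx], nil
}
```

Since the encoding is chosen by the servers of the AS that issue the connection IDs, it can differ per AS without other ASes noticing.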
Drawbacks
This proposal relies on the fact that we are using QUIC. It is crucial that we
have connection IDs, such that connections can be identified and routed
consistently. This is a slight layer violation.
Alternatives Considered
TODO