DRKey #4039

Closed
JordiSubira opened this issue May 10, 2021 · 37 comments

@JordiSubira
Contributor

Extending DRKey derivation scheme

The main goal of this issue is to enable the discussion about the DRKey derivation scheme, particularly what is desirable in order to merge it upstream.

The current version (in scionlab) implements the derivation scheme described in the documentation. This proposal includes two extensions to this derivation scheme: Protocol-specific derivation and Host-specific Delegation Secret.

Protocol-specific derivation (SV_A^{proto})

The current derivation scheme does not support protocol-specific secret values (SV_A^{p}). This extended key hierarchy was discussed in the PISKES paper and it might turn out to be useful for use cases which require sharing a protocol-specific SV to enable fast derivation (e.g. Lightning Filter).

Nonetheless, the derivation scheme in the PISKES paper does not consider specifying different epoch durations for each SV_A^{p}, i.e. each SV_A^{p} inherits its duration from SV_A. However, we might want to define the epoch duration with finer granularity depending on the protocol. For instance, AS A might share SV_A^{'ntp'} with a handful of servers (which increases the risk of this SV being leaked) and thus want a short epoch duration, whereas SV_A^{dns} would be shared with only one trustworthy server, allowing a longer duration. Another aspect to consider is the trade-off between epoch duration and communication overhead.

In addition, after some discussions, there does not seem to be a reasonable use case in which SV_A should be shared outside the Key Server. In that case, we could derive SV_A^{p} directly from the master secret (in the PISKES paper it is derived from SV_A), which would completely decouple the epoch durations.

                            SV_A                                    SV_A^{p}         (0th level)
                            |                                       |
                        +------+-----...                            +------+-----... 
                        |                                           |
                    K_{AB}                                     K_{AB}^{p}           (1st level)
                        |                                           |
    +----------------+-------------------+---...           +----------------+--...       
    |                |                   |                 |
K_{AB}^{p}    K_{AB:H_B}^{p}    K_{A:H_AB:H_B}^{p}     K_{A:H_AB:H_B}^{p} ...      (2nd level)

Host-specific Delegation Secret (HK_{A:H_A→B}^{prot})

This key would be similar to the Delegation Secret, but it is bound to a specific host in the issuer AS (the AS on the fast side) rather than allowing derivation for any host, which further constrains the derivation scope.

SV_A
    |
K_{AB}
    |
HK_{A:H_AB}^{prot}
    |
K_{A:H_AB:H_B}^{prot}

As with the Delegation Secret, the CS on the slow side is required to carry out an extra derivation step.

@matzf
Contributor

matzf commented May 14, 2021

Thanks for writing this up.

+1 for the "protocol-specific derivation" scheme, I think this is a good idea. It seems very nice how this would make the delegation secret a completely normal level-1 key, where it was previously a fairly confusing special case.
This also addresses a concern we had for authenticated SCMPs. The secret value needs to be shared with all the routers of the AS, so that they can derive keys for any destination host, but we only need the scmp protocol. If we use the same value also for other protocols, handling the keys in the routers would have to comply with the security requirements for the most sensitive of these protocols. It could be problematic if the routers were required to have and use a security enclave to derive the keys. If we can limit the scope of the secret value to the scmp protocol, this seems less likely to become a concern.

I think we could go a bit further than what you and the PISKES paper suggest and only support the protocol-specific key hierarchies, that is, drop support for the current hierarchy (i.e. the left hierarchy in your illustration). It depends on how many protocols we expect to be used, obviously -- my gut feeling is that as long as there are fewer than, say, a dozen protocols, the overhead would be manageable.

+1 from me also for the "host-specific delegation secret". In the context of authenticated SCMPs, this would allow end hosts to request and cache keys per remote AS instead of per remote host. This only applies to the SCMP messages using host-to-host keys, i.e. Echo Request/Reply and error messages like Port Closed. Although this does not allow the end hosts to be less cautious about the vulnerability to "MAC flooding" attacks in general (an attacker could simply send traffic from many different ASes, real or fraud), this could be a practical performance benefit.
For the routers, this would not change anything, as they use the unaffected AS-to-host keys.

In order to keep the derivation schemes as simple as possible, I would suggest to always enable this four-tiered derivation scheme with the host-specific delegation key -- not entirely sure whether you meant this, but your reference to the Delegation Secret made me think that you would perhaps make this optional. Consequently, I wouldn't use a special terminology and notation for this key either, just "level 2 key" K_{A:H_A->B}, and the host-to-host key will then be called level-3 key.

One small point that we need to be careful about is: we have to ensure that this new, intermediate host-specific delegation key is not accidentally identical to an AS-to-host key, i.e. that K_{A:H_A->B} != K_{A->B:H_B} even if H_A and H_B are identical addresses. This can be achieved e.g. by adding a key direction identifier to the PRF inputs. In the table below I described this by prepending a 0 or a 1 byte.


Putting all this together, i.e. only keeping the protocol specific hierarchy and adding the new host-specific intermediate key, the key hierarchy would become:

       SV_A^{proto}
           |
        K_{A->B}
        /       \
K_{A:H_A->B}      K_{A->B:H_B}
     |
K_{A:H_A->B:H_B}

All keys here are to be understood specific for each protocol proto, the corresponding superscript is just omitted for brevity.

| Identifier | Derivation | Name | Description |
|---|---|---|---|
| SV_A^{proto} | | Level 0 key, secret value | |
| K_{A->B} | PRF_{SV_A^{proto}}(B) | Level 1 key, AS-to-AS | Derived in A, shared between key servers. |
| K_{A->B:H_B} | PRF_{K_{A->B}}(0\|H_B) | Level 2 "to" key, AS-to-host | Derived in each key server. Only available to B:H_B and trusted infrastructure. |
| K_{A:H_A->B} | PRF_{K_{A->B}}(1\|H_A) | Level 2 "from" key, host-to-AS | Derived in each key server. Only available to A:H_A and trusted infrastructure. Typically only used as intermediate step in derivation, not for communication. |
| K_{A:H_A->B:H_B} | PRF_{K_{A:H_A->B}}(H_B) | Level 3 key, host-to-host | Derived in H_A, derived in key server B for H_B. Available to A:H_A and B:H_B. |
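
To make the direction byte concrete, here is a minimal Go sketch of the level-2 and level-3 steps from the table. The PRF is not fixed anywhere in this discussion, so HMAC-SHA256 is only a stand-in, and the names `prf`, `dirTo`, `dirFrom`, `deriveLevel2` and `deriveLevel3` are made up for illustration.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
)

// prf is a stand-in PRF (HMAC-SHA256). This is an assumption of the sketch;
// the thread does not prescribe a concrete PRF.
func prf(key, input []byte) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write(input)
	return mac.Sum(nil)
}

// Direction identifiers prepended to the level-2 PRF input (hypothetical names).
const (
	dirTo   byte = 0 // K_{A->B:H_B}, AS-to-host
	dirFrom byte = 1 // K_{A:H_A->B}, host-to-AS
)

// deriveLevel2 computes PRF_{K_{A->B}}(dir | host).
func deriveLevel2(lvl1 []byte, dir byte, host []byte) []byte {
	input := append([]byte{dir}, host...)
	return prf(lvl1, input)
}

// deriveLevel3 computes PRF_{K_{A:H_A->B}}(H_B) from the "from" key.
func deriveLevel3(fromKey, dstHost []byte) []byte {
	return prf(fromKey, dstHost)
}
```

With the same address passed as H_A and H_B, `deriveLevel2(k, dirTo, h)` and `deriveLevel2(k, dirFrom, h)` feed different inputs to the PRF, which is exactly the separation asked for above.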

@oncilla oncilla self-assigned this May 14, 2021
@JordiSubira
Contributor Author

JordiSubira commented May 14, 2021

Thank you for the great contribution!

Indeed, I agree that we could go further and only support the protocol-specific key hierarchy provided that there are a handful of protocols.

I also agree with keeping it as simple as possible, i.e. only supporting this "four-tier" derivation scheme. I used the DS to draw a comparison with the current scheme (as both act as an intermediate step, as you mentioned).

Good point, encoding the derivation direction is necessary to avoid the situation you described. Intuitively, prepending this byte seems a valid approach to me.

@mlegner
Contributor

mlegner commented May 17, 2021

Thanks for this writeup, @JordiSubira. In general, I would support both protocol-specific SVs and another level for host-to-host keys. I'm not so sure about the independent validity periods though.

> However, we might want to define the epoch duration with finer granularity depending on the protocol.

> Another aspect to consider might be the trade-off between epoch duration and communication overhead.

If we have synchronized validity periods of different SVs, then a single DRKey exchange request would be sufficient to fetch all of them. If we completely decouple them, we may need a much larger number of requests.

> In that case, we could consider directly deriving SV_A^{p} from the master secret (in the PISKES paper it is derived using SV_A), which would untie completely epoch durations.

Makes sense.

> I think we could go a bit further than what you and the PISKES paper suggest and only support the protocol-specific key hierarchies, that is, drop support for the current hierarchy (i.e. the left hierarchy in your illustration).

I'm not really convinced by this: There may be relatively obscure protocols that are used by a few hosts but don't warrant their own SVs. These could be derived from the general-purpose DRKeys on demand without the ASes having to prepare for them.

This has the downside that we would have two different types of protocol-specific keys: (i) those derived from their own SVs and (ii) those derived from the general-purpose 1st-level keys.

> I would suggest to always enable this four-tiered derivation scheme with the host-specific delegation key

I agree, that makes sense. The single additional derivation step at key servers is probably negligible.

> One small point that we need to be careful about is: we have to ensure that this new, intermediate host-specific delegation key is not accidentally identical to an AS-to-host key, i.e. that K_{A:H_A->B} != K_{A->B:H_B} even if H_A and H_B are identical addresses. This can be achieved e.g. by adding a key direction identifier to the PRF inputs. In the table below I described this by prepending a 0 or a 1 byte.

I would suggest to add 1-byte prefixes to all derivations to achieve domain separation and prevent any accidental clashes.

@oncilla
Contributor

oncilla commented May 17, 2021

I would generally advise keeping the system as simple as possible. The more configurability there is in the system, the more branches there are and, as a consequence, the more room for errors during implementation. But what we should also think about is: what types of protocols do we want to support?

Do we want to only support a curated list of protocols, or do we want to allow ASes to negotiate protocols that are not necessarily known to the whole SCION world?

To give some historical context:
In the beginning we envisioned the protocol identifier to be a simple string, such that there is no hard limit on the protocols that are supported. This would even allow two ASes to decide to use a custom protocol that is not known to the outside SCION world. The protocol string was simply an input to the first-level key derivation. We also envisioned that protocols can have protocol-specific derivations, at least at the second level of DRKey, where other inputs could be taken into account.

Regarding the derivation direction. This is a very important point that was covered in the original python implementation by including a byte that defines the "type" of derivation as the input to the PRF.

Regarding the 4 level approach: I like it very much. We tried to keep the derivation count as small as possible for the sake of the router. But since that is not affected, and this allows the intermediate key to be cached at the hosts, this is a neat change.

@JordiSubira
Contributor Author

Thank you for your contribution to the discussion @mlegner and @oncilla.

> I would support both protocol-specific SVs and another level for host-to-host keys. I'm not so sure about the independent validity periods though.

The problem with the current design is that you cannot define different validity periods (depending on the protocol) within one AS. Conceptually, I do not see why they must be dependent (provided that the SV_A will not leave the Key Server), but I might be missing something.

> If we have synchronized validity periods of different SVs, then a single DRKey exchange request would be sufficient to fetch all of them. If we completely decouple them, we may need a much larger number of requests.

@mlegner, could you elaborate on this a little bit further? How would you fetch all of them, and how does having synchronized validity periods help?
I was also assuming a reasonable number of protocols, which would imply not that many additional requests.

> Do we want to only support a curated list of protocols, or do we want to allow ASes to negotiate protocols that are not necessarily known to the whole SCION world?

Indeed, that's the main point. From the PISKES paper (and also some offline discussions), there seem to be some important use cases for which having the protocol-specific SV^{p} is useful (e.g. Lightning Filter or SCMP). I am not completely ruling out the general derivation scheme (which would allow negotiating keys for any protocol); however, as mentioned above, it is always preferable to keep things as simple as possible.

> Regarding the derivation direction. This is a very important point that was covered in the original python implementation by including a byte that defines the "type" of derivation as the input to the PRF.

Yes, this "type" is also in the current Lvl2Keys Go implementation (being a 1-byte value) and it is used as input in the derivation. Considering the 4-levels, it should be adapted so that "types" are consistent, but I guess the idea would be the same.

@matzf
Contributor

matzf commented May 18, 2021

@mlegner

> I'm not really convinced by this: There may be relatively obscure protocols that are used by a few hosts but don't warrant their own SVs. These could be derived from the general-purpose DRKeys on demand without the ASes having to prepare for them.
>
> This has the downside that we would have two different types of protocol-specific keys: (i) those derived from their own SVs and (ii) those derived from the general-purpose 1st-level keys.

I understand your point. My doubt is that I cannot really think of a way to reasonably pick one of the two options when designing a protocol based on DRKey. Somebody would have to say "this is going to be niche, not worth a separate SV" and, as the keys between the different hierarchies are incompatible, accept being locked into this choice (unless the protocol in question has some form of negotiation or compatibility built in).

Perhaps a compromise could be to only implement the per-protocol SVs for now, but designate a suitable protocol identifier (e.g. empty string, or 0) for use as the general-purpose protocol with the "classic hierarchy" derivation.

@oncilla

> But what we should also think about is, what types of protocols do we want to support?
>
> Do we want to only support a curated list of protocols, or do we want to allow ASes to negotiate protocols that are not necessarily known to the whole SCION world?
>
> To give some historical context:
> In the beginning we envisioned the protocol identifier to be a simple string, such that there is no hard limit on the protocols that are supported. This would even allow two ASes to decide to use a custom protocol that is not known to the outside SCION world. The protocol string was simply an input to the first order key derivation. We also envisioned that protocols can have protocol specific derivations, at least at the second level of DRKey, where other inputs could be taken into account.

Maybe port numbers could be a reasonable analogy; there are some well known, standardized port numbers, but a lot of unassigned space too, and in the end you can pretty much make up whatever you want. For some use-cases for DRKey only a well-known identifier makes sense, but other usages might be able to make use of ad hoc protocol identifiers.
This analogy also suggests that a two byte identifier might be enough to cover all the protocols we will ever care about. 😉

One related point on string vs. fixed length identifier for protocols: it may be advantageous to have fixed length inputs to the PRF to allow using a simple CBC-MAC, or even just a single block cipher invocation, for the PRF. This is less relevant, though, when only the per-protocol SVs are used, as this is a local choice of each keyserver in this case.
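
As a rough illustration of the "single block cipher invocation" variant, the Go sketch below packs a type byte, a length byte and the data into exactly one AES block and encrypts it once. The layout (`packInput`) and the function names are assumptions for this example only. Note that with this layout only 14 data bytes fit, so a 16-byte IPv6 host address would need a second block (i.e. CBC-MAC) or a different packing.

```go
package main

import (
	"crypto/aes"
	"errors"
)

// prfAES evaluates the PRF as one AES block encryption. The key must be
// 16, 24 or 32 bytes long; the input is exactly one block by construction.
func prfAES(key []byte, input [aes.BlockSize]byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	out := make([]byte, aes.BlockSize)
	block.Encrypt(out, input[:])
	return out, nil
}

// packInput builds a fixed-length PRF input (hypothetical layout):
// byte 0 is the key type, byte 1 the data length, the rest is the
// zero-padded data. The length byte keeps the padding unambiguous.
func packInput(keyType byte, data []byte) ([aes.BlockSize]byte, error) {
	var in [aes.BlockSize]byte
	if len(data) > aes.BlockSize-2 {
		return in, errors.New("input does not fit into a single AES block")
	}
	in[0] = keyType
	in[1] = byte(len(data))
	copy(in[2:], data)
	return in, nil
}
```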

@mlegner
Contributor

mlegner commented Jun 1, 2021

> Perhaps a compromise could be to only implement the per-protocol SVs for now, but designate a suitable protocol identifier (e.g. empty string, or 0) for use as the general-purpose protocol with the "classic hierarchy" derivation.

I think that might be a good solution.

@mlegner
Contributor

mlegner commented Jun 1, 2021

> > If we have synchronized validity periods of different SVs, then a single DRKey exchange request would be sufficient to fetch all of them. If we completely decouple them, we may need a much larger number of requests.
>
> Could you @mlegner elaborate this a little bit further? How would you fetch all of them and how does the fact of having synchronized validity periods help?
> I was also assuming a reasonable amount of protocols, which would imply not that many additional requests.

What I was thinking about is to have an API call requesting all (standardized) 1st-level keys from a remote CS. This works well, if the validity periods of them are equal (or at least coordinated) as it would allow a CS to make a single key request per day to proactively fetch all 1st-level keys before the old ones expire.

If all protocol-specific secret values and 1st-level keys have uncoordinated validity periods, each of them needs to be fetched individually. I know that this is simply a constant multiplicative factor for the number of requests, but it still makes a difference as each of them involves a TLS handshake / signature check.

@JordiSubira
Contributor Author

@matzf

> Perhaps a compromise could be to only implement the per-protocol SVs for now, but designate a suitable protocol identifier (e.g. empty string, or 0) for use as the general-purpose protocol with the "classic hierarchy" derivation.

Do you mean that for this empty string or 0 identifier protocol, Lvl2Keys are derived by using another protocol string as input? What is the benefit with respect to keeping the two hierarchies separated?

@mlegner

> What I was thinking about is to have an API call requesting all (standardized) 1st-level keys from a remote CS. This works well, if the validity periods of them are equal (or at least coordinated) as it would allow a CS to make a single key request per day to proactively fetch all 1st-level keys before the old ones expire.

IMO, it seems more natural that the AS can define different durations depending on the protocol, which would provide flexibility for the AS (as I tried to exemplify at the beginning of the discussion). However, if you think that it is preferable to be able to fetch all Lvl1Keys at once, we can stick to one epoch duration per AS. I do not have a really strong opinion about which feature would apply to more use cases, so I think that the discussion is useful in any case.

@mlegner
Contributor

mlegner commented Jun 8, 2021

I do believe that the possibility of streamlining the 1st-level key exchanges would be worth sacrificing some flexibility, but that's not a hill I'd be willing to die on.

@mlegner
Contributor

mlegner commented Jun 8, 2021

@matzf Following up on this:

> [...] but designate a suitable protocol identifier (e.g. empty string, or 0) for use as the general-purpose protocol with the "classic hierarchy" derivation.

Where do we even have the protocol identifiers when we have separate SVs? And how are they encoded in general?

@matzf
Contributor

matzf commented Jun 9, 2021

> Where do we even have the protocol identifiers when we have separate SVs? And how are they encoded in general?

Good point. They only exist in the key requests then, so I guess they will best be represented as protobuf enums.

@shitz
Contributor

shitz commented Jun 28, 2021

Hi all,

Thanks a lot @JordiSubira @matzf and @mlegner for fleshing out and discussing this proposal. Let me also add my 2 cents to various points raised.

Protocol-specific derivation
There has been plenty of discussion about the upsides of having them and iirc they were always intended to exist from the very inception of DRKey, thus 👍 from my side.

Regarding the question about having only protocol-specific key hierarchies vs also having a general one, in principle, I lean towards having protocol-specific only. From an implementation perspective it's one case less to consider. I actually like @matzf's port number analogy - there are some well-defined ones and then there is a large range that can be used by any kind of application. The same would apply here. Now, the only difference is that applications need to rely on a third-party (the key server) to exchange the level 1 keys, which makes it slightly awkward if that third-party has to fetch keys for a protocol someone is prototyping (or has very niche use). The option of having a "generic"-protocol key hierarchy thus seems like a good compromise to me.

Host-specific delegation secret
Fully on-board with the proposed 4-level key hierarchy of @matzf 👍

Protocol-specific epoch lengths
This is a tricky one. I agree that different protocols/use-cases could benefit from different epoch lengths, however, it also adds complexity to the implementation. I'm not overly concerned with the messaging overhead - even if there are dozens of different protocols, all with different epoch lengths, the overhead of fetching the lvl1 keys should be negligible if we make sure there is a reasonable lower limit on the allowable epoch length. In a degenerate case with an epoch length of 100ms (instead of ~hours), the impact would actually be much higher than having dozens of different protocols with "sane" epoch lengths.

I'd like to allow protocol-specific epoch lengths in the design. For the implementation, I'm fine with supporting a single one for now.

@JordiSubira
Contributor Author

Thank you for the comment @shitz!

I agree on defining a lower bound for the epoch length to avoid unreasonable cases. This would also help to simplify the design/implementation of some other parts (e.g. key offset). I asked about this offline some time ago, but there was no strong opinion on what this lower bound should be. Aligning it to the minimum path lifetime (6 minutes) was mentioned, but I think this was kind of arbitrary. I would also lean towards something on the order of minutes. Do you have any opinion on that?

@shitz
Contributor

shitz commented Jun 29, 2021

I think this should be in the order of hours. By default, I'd make it one day.

NIST (link, page 46) recommends a lifetime of up to one year for derivation keys used with approved algorithms (e.g., AES, PBKDF2 etc.). Of course, this is very use case dependent, but I don't see a reason why we'd go too low here. It only adds overhead. A protocol that - for whatever reason - requires ultra short-lived keys could always use an additional time component during key derivation.

Now that I think of it I wonder if we should just define the epoch length to be 1 day and have protocols that require shorter-lived keys have protocol-specific derivations that could take this into account.

Thoughts?

@JordiSubira
Contributor Author

I have thought a little bit about this and there are several points to consider regarding the derivation keys in DRKey:

  • The key server might distribute SV_A^{proto} to remote nodes (Lightning Filter, BRs, etc.). I assume not all of them are equally trustworthy. In addition, the more nodes this SV is distributed to, the higher the risk of leaking it.
  • There's no revocation mechanism for DRKey, i.e. once the key server distributes an SV_A^{proto} (or any lower key in the hierarchy), the protocol is "committed" to that key. The only parameter the AS can tweak to adapt to different use cases is the epoch duration.
  • We should also consider the prefetching and how many epochs in the future we allow requests/responses. For instance, if we allow requests for the next valid epoch i+1 and the epoch is one day, some remote host could use a given key K (derived from SV_A{i+1}).
  • If SV_A^{proto} gets compromised, DRKey will not be able to recover, for that protocol, until the end of the next validity epoch (two days, in the previous example).

One might think that this is not such an issue, since we can always assume that upon compromise the AS should fall back to traditional defense mechanisms to filter traffic, if possible.

On the other hand, the derivation scheme uses the epoch_begin and epoch_end as part of the input to derive different SVs:

input = "len(master_secret) || master_secret || proto || epoch_begin || epoch_end"
SV_A^{proto} = KDF(input, salt)
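
To make the input encoding above concrete, a minimal Go sketch. The field widths (4-byte length, 2-byte proto, 4-byte Unix timestamps) and the use of HMAC-SHA256 keyed with the salt as a stand-in KDF are my assumptions here, not something we have agreed on; a real implementation would likely use a proper KDF such as PBKDF2 or HKDF.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
)

// deriveSV computes SV_A^{proto} for one epoch from the master secret.
// Input layout: len(master_secret) || master_secret || proto ||
// epoch_begin || epoch_end, all integers big-endian.
func deriveSV(masterSecret, salt []byte, proto uint16, epochBegin, epochEnd uint32) []byte {
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(masterSecret)))

	var tail [10]byte
	binary.BigEndian.PutUint16(tail[0:2], proto)
	binary.BigEndian.PutUint32(tail[2:6], epochBegin)
	binary.BigEndian.PutUint32(tail[6:10], epochEnd)

	// Stand-in KDF: HMAC-SHA256 keyed with the salt (assumption).
	mac := hmac.New(sha256.New, salt)
	mac.Write(hdr[:])
	mac.Write(masterSecret)
	mac.Write(tail[:])
	return mac.Sum(nil)
}
```

Since the epoch bounds are part of the input, two protocols with different epoch lengths simply call the same function with different `epochBegin`/`epochEnd` values.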

Hence, I think we can define a unique protocol-specific derivation scheme, even if the epoch lengths vary. Then, depending on the use case the AS could configure different epoch duration per protocol using the same derivation scheme.

TL;DR: I find it difficult to foresee what epoch duration should be defined since it seems to be use case dependent. IMO, we could use a single protocol-specific derivation scheme, even if we allow different epoch durations by design. However, I think setting this lower bound is still needed to avoid going unreasonably low. This lower bound may be different from the default duration (e.g. one day).

@matzf
Contributor

matzf commented Jul 7, 2021

@JordiSubira a practical request: once this discussion is finalized, can you please update the DRKey documentation (doc/cryptography/DRKeyInfra.md) accordingly?
The existing document is still in markdown; it would be great if you could convert it to RST, and add anchors for the various terms such that they can be referenced in other documents (e.g. in the specification/documentation for authenticated SCMP, #3861).

@matzf
Contributor

matzf commented Jul 7, 2021

>   • We should also consider the prefetching and how many epochs in the future we allow requests/responses. For instance, if we allow requests for the next valid epoch i+1 and the epoch is one day, some remote host could use a given key K (derived from SV_A{i+1}).
>   • If SV_A{proto} gets compromised, DRKey will not be able to recover, for that protocol, until the end of the next validation epoch (two days, in the previous example).

The allowed prefetch period doesn't need to be directly related to the epoch length. For the suggested epoch duration of 1 day, an appropriate prefetch period could be on the order of 15 minutes; this would be ample for everybody to fetch the updated keys in time (even with failures/retries and allowing time to distribute the updated derived keys internally).

A different but related point is the grace period. The current description in the docs sets this to a fixed 0.1 hours (6 minutes). An application using DRKey might have to check a message using either of the keys to determine which one was used to create a message. Therefore, it might be desirable to keep this interval as short as possible.
I think the grace period only needs to be long enough to ensure that for use cases with single packet request/response, we can always use the same key for a reply. This would be on the order of a few seconds. Protocols with longer conversations will have to deal with key roll over anyway, so we might as well force them to always roll over immediately.

Btw. in the discussion on the packet authentication extension-header option (#4062), I've made the assumption that the grace period for DRKeys is such that at any time there are at most two active key epochs (the "new" one and one that is about to expire). Does this make sense?

@JordiSubira
Contributor Author

> The allowed prefetch period doesn't need to be directly related to the epoch length. For the suggested epoch duration of 1 day, an appropriate prefetch period could be on the order of 15 minutes; this would be ample for everybody to fetch the updated keys in time (even with failures/retries and allowing to distribute the updated derived keys internally).

Yes indeed, it does not need to relate to the epoch length. However, as you said, the prefetching period should allow every remote AS to prefetch the key even in the presence of failures/retries. Taking 15 minutes as a fixed period, we might end up allowing fetches more than one epoch ahead if the epoch length for the protocol is short (e.g. <15 minutes). I guess this shouldn't be a problem as long as we keep track of all keys that have been served (in case we want to rotate the master secret or change the epoch duration for subsequent epochs). That said, I am happy to configure an independent prefetching period during which requests will be served (outside of this period we can return an error).

Related to that, we might want to discuss if we support fetching past keys. It might be useful for some use case in which we need to authenticate past packets, however, it would add complexity in the roll-over and key management.

> I think the grace period only needs to be long enough to ensure that for use cases with single packet request/response, we can always use the same key for a reply. This would be on the order of a few seconds. Protocols with longer conversations will have to deal with key roll over anyway, so we might as well force them to always roll over immediately.

I agree with that, I do not find any reason for which we should make it longer than the case you mentioned.

> Btw. in the discussion on the packet authentication extension-header option (#4062), I've made the assumption that the grace period for DRKeys is such that at any time there are at most two active key epochs (the "new" one and one that is about to expire). Does this make sense?

I would say this relates to what I mentioned about use cases having to validate past packets. I am not completely sure if in the use case you mentioned it is needed to authenticate past packets. If that is not the case, then I would say that it makes sense to only consider the two active epochs.

@JordiSubira
Contributor Author

JordiSubira commented Jul 13, 2021

I will try to summarize the discussion so far so that you can follow up on that. If everyone is happy with this, I will update the DRKey documentation afterward.


DRKey derivation scheme

We define two types of derivation: the protocol-specific derivation and the generic-protocol derivation. Both of them leverage the 4-level derivation scheme.

4-level derivation scheme

| Identifier | Derivation | Name | Description |
|---|---|---|---|
| SV_A^{proto} | | Level 0 key, secret value | |
| K_{A->B} | PRF_{SV_A^{proto}}(B) | Level 1 key, AS-to-AS | Derived in A, shared between key servers. |
| K_{A->B:H_B} | PRF_{K_{A->B}}(H_B) | Level 2 "to" key, AS-to-host | Derived in each key server. Only available to B:H_B and trusted infrastructure. |
| K_{A:H_A->B} | PRF_{K_{A->B}}(H_A) | Level 2 "from" key, host-to-AS | Derived in each key server. Only available to A:H_A and trusted infrastructure. Typically only used as intermediate step in derivation, not for communication. |
| K_{A:H_A->B:H_B} | PRF_{K_{A:H_A->B}}(H_B) | Level 3 key, host-to-host | Derived in H_A, derived in key server B for H_B. Available to A:H_A and B:H_B. |

proto can be defined as a fixed-size value, e.g. a 2-byte identifier.

The PRF derivation for every key includes the "type" ("AS-to-AS", "AS-to-host", "host-to-AS" and "host-to-host"). This prevents deriving the same value for keys that are meant to be different. For instance, it yields K_{A:H_A->B} != K_{A->B:H_B} even when H_A == H_B.
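To illustrate the role of the "type" in the PRF input, here is a minimal sketch of the 4-level derivation. It is only an illustration under assumptions that are not part of this proposal: HMAC-SHA256 truncated to 16 bytes stands in for the PRF, the one-byte type encodings are hypothetical, and AS and host identifiers are passed as opaque byte strings.

```python
import hashlib
import hmac

KEY_LEN = 16  # assumed key length in bytes

# Hypothetical one-byte encodings of the key "type".
AS_TO_AS = b"\x00"
AS_TO_HOST = b"\x01"
HOST_TO_AS = b"\x02"
HOST_TO_HOST = b"\x03"


def prf(key: bytes, key_type: bytes, data: bytes) -> bytes:
    """Stand-in PRF: HMAC-SHA256 over type || data, truncated to KEY_LEN."""
    return hmac.new(key, key_type + data, hashlib.sha256).digest()[:KEY_LEN]


def level1(sv_a: bytes, as_b: bytes) -> bytes:
    """K_{A->B}, derived from the secret value SV_A^{proto}."""
    return prf(sv_a, AS_TO_AS, as_b)


def level2_to(k_ab: bytes, host_b: bytes) -> bytes:
    """K_{A->B:H_B}, the AS-to-host key."""
    return prf(k_ab, AS_TO_HOST, host_b)


def level2_from(k_ab: bytes, host_a: bytes) -> bytes:
    """K_{A:H_A->B}, the host-to-AS key."""
    return prf(k_ab, HOST_TO_AS, host_a)


def level3(k_from: bytes, host_b: bytes) -> bytes:
    """K_{A:H_A->B:H_B}, the host-to-host key."""
    return prf(k_from, HOST_TO_HOST, host_b)


# Same host address on both sides, yet the two Level 2 keys differ
# because the type is part of the PRF input.
k_ab = level1(bytes(KEY_LEN), b"1-ff00:0:112")
same_host = b"10.0.0.1"
assert level2_from(k_ab, same_host) != level2_to(k_ab, same_host)
```

Without the type byte, both Level 2 derivations would feed identical input to the PRF under the same key and would therefore collide whenever H_A == H_B.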

Protocol-specific derivation

                            SV_A^{proto}                        (0th level)
                                  |
                                  |
                          K_{A->B}^{proto}                      (1st level)
                                  |
                    +-------------+-------------+
                    |                           |
          K_{A:H_A->B}^{proto}        K_{A->B:H_B}^{proto}      (2nd level)
                    |
        K_{A:H_A->B:H_B}^{proto}                                (3rd level)

As mentioned above, proto can be defined as a fixed-size value, e.g. a 2-byte identifier.

The 0th level key would be derived as follows (please, speak up if you think this might be buggy):

input = "len(master_secret) || master_secret || proto || epoch_begin || epoch_end"
SV_A^{proto} = KDF(input)
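A minimal sketch of this 0th level derivation follows. The field widths (4-byte length, 2-byte proto, 4-byte epoch timestamps), the choice of PBKDF2-HMAC-SHA256 as the KDF, the empty salt, the iteration count and the 16-byte output are all assumptions made for illustration; the proposal above only fixes the order of the concatenated fields.

```python
import hashlib
import struct


def derive_sv(master_secret: bytes, proto: int,
              epoch_begin: int, epoch_end: int) -> bytes:
    """Derive SV_A^{proto} for one epoch from the master secret."""
    kdf_input = (
        struct.pack(">I", len(master_secret))          # len(master_secret)
        + master_secret                                # master_secret
        + struct.pack(">H", proto)                     # proto, assumed 2 bytes
        + struct.pack(">II", epoch_begin, epoch_end)   # epoch bounds, Unix seconds
    )
    # Assumed KDF: PBKDF2-HMAC-SHA256, empty salt, 1000 iterations, 16-byte key.
    return hashlib.pbkdf2_hmac("sha256", kdf_input, b"", 1000, dklen=16)
```

The length prefix keeps the encoding unambiguous if master secrets of different sizes are ever used. Since proto and the epoch bounds are part of the input, each protocol and each epoch gets an independent secret value.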

Generic-protocol derivation

                                  SV_A                              (0th level)
                                    |
                                    |
                                K_{A->B}                            (1st level)
                                    |
                      +-------------+-------------+
                      |                           |
           K_{A:H_A->B}^{protocol}     K_{A->B:H_B}^{protocol}      (2nd level)
                      |
         K_{A:H_A->B:H_B}^{protocol}                                (3rd level)

The generic-protocol derivation can be thought of as a special case of the protocol-specific derivation for the 0th and 1st level keys, for instance by using a special proto value in the protocol-specific scheme, e.g. 0 or "".

This derivation scheme allows applications to define "niche" protocols. protocol is a (variable or fixed size) value identifying this "niche" protocol.

Epoch lengths

In the design, every AS can define different epoch lengths for each protocol-specific 0th level key. In the implementation, a single epoch for every 0th level key issued by the "issuer"-AS is acceptable for now.

We should define a reasonable lower bound for the epoch length used in DRKey to avoid nonsensical scenarios.

Grace period

We define a short overlapping period in which we accept authenticated packets with the key for the previous epoch i-1 and also for the current one i. This period should ideally be as short as possible, although long enough to allow using the same key for single packet request/response use cases (e.g. a few seconds).
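The acceptance rule can be sketched as follows. This assumes, purely for illustration, that epochs are aligned to multiples of a fixed epoch length starting at time 0 and that the grace period is shorter than one epoch; the function name is hypothetical.

```python
def acceptable_epochs(now: int, epoch_len: int, grace: int) -> list:
    """Return the indices of the epochs whose keys are accepted at time `now`.

    Epoch i is assumed to cover [i * epoch_len, (i + 1) * epoch_len).
    The key of epoch i-1 remains acceptable for `grace` seconds into epoch i.
    """
    current = now // epoch_len
    epochs = [current]
    if current > 0 and now - current * epoch_len < grace:
        epochs.insert(0, current - 1)
    return epochs
```

With a grace period shorter than the epoch length, this never returns more than two epochs, which matches the "at most two active key epochs" assumption discussed above.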

Valid epochs & prefetching

Related to the previous part, we should define what epochs are valid. More specifically:

  • How many epochs/time ahead keys can be requested?
  • Whether or not we accept requests for past keys, to validate old packets.

For the former, I like what @matzf suggested, i.e. allowing a globally fixed time (e.g. 15 minutes) to fetch keys beforehand.
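As a rough sketch of how a key server could apply such a rule: the function name and the allow_past flag are hypothetical, and whether past keys are served at all is still the open question listed above, so it is modelled here as a switch rather than decided.

```python
PREFETCH_WINDOW = 15 * 60  # globally fixed prefetch time in seconds (example value)


def may_fetch(epoch_begin: int, epoch_end: int, now: int,
              allow_past: bool = False) -> bool:
    """Decide whether a key for the epoch [epoch_begin, epoch_end) may be served."""
    if epoch_end <= now:
        # The epoch is already over: only served if past keys are allowed.
        return allow_past
    # Current epoch, or a future epoch that starts within the prefetch window.
    return epoch_begin <= now + PREFETCH_WINDOW
```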

@mlegner
Contributor

mlegner commented Jul 15, 2021

@JordiSubira: Thanks for summarizing this.

One question: Why do we need the salt in the KDF?

@JordiSubira
Contributor Author

One question: Why do we need the salt in the KDF?

That's a good observation and I think in this case it is not strictly necessary. However, there's one aspect to consider: in the current implementation we are using the same master_secret as the one used for the forwarding secret. IMO, it is cleaner to have two independent master secrets. However, if we use the same secret, the same output might be computed for both (which does not look ideal for different use cases). It is true that, since the inputs are concatenated with different values, it is highly unlikely that this happens.

I will simplify it and leave out the salt from the design.

@JordiSubira
Contributor Author

I have a question regarding the encoding of the protocol for the Lvl2/3 request. I am not completely sure how we should define the protobuf Lvl2/3 request so that hosts can ask for both protocol-specific keys (using a value of the protobuf enum) and also, generic-protocol keys with any "niche" protocol. Should we encode the "niche" protocol as an optional string/bytes or do you have a better idea?

Maybe @matzf or @mlegner, you are interested in commenting on this.

@mlegner
Contributor

mlegner commented Jul 15, 2021

I am not completely sure how we should define the protobuf Lvl2/3 request so that hosts can ask for both protocol-specific keys (using a value of the protobuf enum) and also, generic-protocol keys with any "niche" protocol. Should we encode the "niche" protocol as an optional string/bytes or do you have a better idea?

For the "niche" protocols, we have a special ProtocolID of (let's say) 0, right? Can we add a field ProtocolString, which must be present if and only if ProtocolID is 0? Maybe we could even make this fixed-length (let's say 8 bytes) to simplify key derivation and still support up to 8 ASCII characters.
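A small sketch of what this could look like as a derivation input. The constant names, the 2-byte encoding of ProtocolID, the zero padding and the example string "myproto" are assumptions for illustration; this is not a protobuf definition.

```python
from typing import Optional

GENERIC_PROTOCOL_ID = 0   # assumed reserved value for "niche" protocols
PROTOCOL_STRING_LEN = 8   # fixed length in bytes, as suggested above


def encode_protocol(protocol_id: int,
                    protocol_string: Optional[str] = None) -> bytes:
    """Encode the protocol identifier as input for the key derivation."""
    # ProtocolString must be present if and only if ProtocolID is 0.
    if (protocol_id == GENERIC_PROTOCOL_ID) != (protocol_string is not None):
        raise ValueError("ProtocolString must be set iff ProtocolID is 0")
    out = protocol_id.to_bytes(2, "big")
    if protocol_string is not None:
        raw = protocol_string.encode("ascii")  # fails on non-ASCII input
        if not 0 < len(raw) <= PROTOCOL_STRING_LEN:
            raise ValueError("ProtocolString must be 1 to 8 ASCII characters")
        # Zero-pad so that the derivation input always has the same length.
        out += raw.ljust(PROTOCOL_STRING_LEN, b"\x00")
    return out
```

For example, encode_protocol(0, "myproto") yields a 10-byte value, whereas encode_protocol(5) yields only the 2-byte identifier.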

@matzf
Contributor

matzf commented Jul 16, 2021

I am not completely sure how we should define the protobuf Lvl2/3 request so that hosts can ask for both protocol-specific keys (using a value of the protobuf enum) and also, generic-protocol keys with any "niche" protocol. Should we encode the "niche" protocol as an optional string/bytes or do you have a better idea?

For the "niche" protocols, we have a special ProtocolID of (let's say) 0, right? Can we add a field ProtocolString, which must be present if and only if ProtocolID is 0? Maybe we could even make this fixed-length (let's say 8 bytes) to simplify key derivation and still support up to 8 ASCII characters.

Couldn't we make this transparent for the Level 2 / 3 key requests instead? As long as the key servers agree on how to derive the key, it shouldn't matter for the requester whether it was derived from the generic or protocol specific hierarchy.
This could be e.g. based on a global registration for protocol identifiers (e.g. use the generic hierarchy for protocol numbers > 42), or could in principle even be configurable per AS/key server. I imagine something like this: during level 1 key exchange, a key server tells its peers the level 1 keys for each supported protocol identifier. From this, the peer key server immediately knows that for protocols for which no key was shared, it will have to resort to the generic hierarchy.

This would allow protocols to evolve from "niche" to first-class support without having to adapt most of the existing clients. Only the instances that want/need to use the protocol SV directly would need to be touched during this evolution, and a setup cost for these, to allow them access to the SV, will be unavoidable anyway. Notably, the clients on the slow side do not have to be changed.

Furthermore, this would allow us to defer the implementation of the generic hierarchy for now without having to worry about extending the key request protocol later on or having secondary protocol identifiers.

@JordiSubira
Contributor Author

Thank you @matzf for the interesting comment.

I also think that using this port analogy (i.e. having the protocol-specific protocols defined in some list and using the rest of the numbers for "niche" protocols) can work as well. However, I am not completely convinced it is in practice transparent for the Lvl2/Lvl3 key requester. If we use a 2-byte value, for example, I guess H_A (requester) and H_B (responder) must agree on providing some semantics to a certain protocol number (e.g. they use protocol_id=500 for protocol X). If this protocol is then upgraded to protocol-specific it will be likely assigned a different protocol number (e.g. 25). Assuming that H_B only wants to use the protocol-specific derivation, H_A should change the protocol_id in the request anyway, is that right?

I agree that it would be only required that KS_A and KS_B agree on how to derive the keys. If we assume a global list this is trivial, but the configurable case seems to add more complexity. Why should one AS configure this list differently? (aside from not having updated the protocol-specific list for example). I thought the idea of having protocol-specific derivation was only having a handful of them defined globally in the whole protocol.

@matzf
Contributor

matzf commented Jul 23, 2021

If this protocol is then upgraded to protocol-specific it will be likely assigned a different protocol number (...)

Why should one AS configure this list differently?

The protocol specific hierarchies are only useful if the AS actually makes use of it by providing services that have direct access to SV^{proto}. If it doesn't make use of this, using the generic hierarchy is completely equivalent.
This would provide a natural upgrade path for protocols from "niche" to "established"; as soon as an AS starts running services with access to SV^{proto}, it unilaterally configures this for the protocol specific derivation.
Note that this does not require changing the protocol number.

I don't think that this would add very much complexity. If we structure the Level1 key requests such that the full list of explicitly supported protocol identifiers is directly available (i.e. we do not request Level1 keys for each protocol number separately), supporting this configurability does not change much; it turns an error case (not all expected protocols are included, fail) to a fallback (protocol is not in the list, fall back to generic hierarchy).
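A rough sketch of that fallback on the peer key server. The function name, the dictionary layout and the use of identifier 0 for the generic Level 1 key are assumptions for illustration only.

```python
GENERIC = 0  # assumed identifier under which the generic Level 1 key is stored


def select_level1_key(proto: int, level1_keys: dict) -> tuple:
    """Pick the Level 1 key to derive from for a Level 2/3 request.

    `level1_keys` maps protocol identifiers to the Level 1 keys shared by
    the remote key server; it is assumed to always contain GENERIC.
    """
    if proto != GENERIC and proto in level1_keys:
        # The remote AS explicitly supports this protocol.
        return "protocol-specific", level1_keys[proto]
    # Protocol is not in the list: fall back to the generic hierarchy.
    return "generic", level1_keys[GENERIC]
```

The requester only ever names the protocol number, so which hierarchy is used stays a matter between the two key servers.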

Either way, perhaps I put too much emphasis on this "upgrade path" point. The other relevant point is to have a simple "flat" identifier for keys; making the choice for generic or protocol specific hierarchy "transparent" in the Level2/3 key means that keys can be identified by only the protocol number and nothing else. If we include an additional flag or even string identifier here, this will need to be replicated anywhere where DRKey keys need to be identified (wire protocols, configuration files, code, ...). This is relevant in the context of the Packet Authentication Option (see #4062 (comment)).

@mlegner
Contributor

mlegner commented Jul 27, 2021

The other relevant point is to have a simple "flat" identifier for keys; making the choice for generic or protocol specific hierarchy "transparent" in the Level2/3 key means that keys can be identified by only the protocol number and nothing else. If we include an additional flag or even string identifier here, this will need to be replicated anywhere where DRKey keys need to be identified (wire protocols, configuration files, code, ...). This is relevant in the context of the Packet Authentication Option (see #4062 (comment)).

Yes, that's a very good point. Being able to identify Level2/Level3 keys without having to know the precise derivation would be really useful for SPAO.

Another point: I noticed that I have been writing "Level2/Level3" or similar quite often. Maybe we should introduce a name for Level2 "to" keys and Level3 keys, which are the only ones that are actually used for authenticating messages, right?

@JordiSubira
Contributor Author

I also agree that having this "flat" identifier simplifies the requests, so I will define the Lvl2/Lvl3 requests accordingly. However, I don't quite see how much neater the "upgrade path" is, but this is something we can discuss further in the future.

Related to that, I was considering "unary" Lvl1 requests/responses in the new design documentation and the implementation (instead of the "n-ary" responses you mentioned). I think we can stick to the former for the moment and revisit this in the future. What do you think?

Maybe we should introduce a name for Level2 "to" keys and Level3 keys, which are the only ones that are actually used for authenticating messages, right?

In the discussion, I was also referring to the Lvl2 "from" key, but if it is useful we can also give some name to these "final" keys.

@matzf
Contributor

matzf commented Aug 10, 2021

Another point: I noticed that I have been writing "Level2/Level3" or similar quite often. Maybe we should introduce a name for Level2 "to" keys and Level3 keys, which are the only ones that are actually used for authenticating messages, right?

+1

Maybe we could use Communication Keys for the "final" keys and Intermediate Keys for the, well, intermediate keys in the derivation hierarchy.

Btw., for usages of DRKey, it would also be helpful to define some standard terminology both for the "type" (AS-to-Host vs. Host-to-Host communication key) and for the "directionality" (which side is the fast / slow side), so that we can refer to the definitions of these terms wherever we might need them.

@fstreun
Contributor

fstreun commented Aug 30, 2021

I've just come across this issue/PR and am mainly looking into how it affects possible Lightning Filter deployments.
Further, I am also interested in the usage of DRKey outside SCION.

I think naming the keys Communication Keys and Intermediate Keys would make it much easier to understand their usage.
Speaking of usage: why would someone use the AS-host key (Level 2), i.e., K_{A,B:H_B}, for communication?

@JordiSubira
Contributor Author

JordiSubira commented Aug 30, 2021

why would someone use the AS-host key (Level 2), i.e., K_{A,B:H_B}, for communication?

AS-host keys are intended to be used for communication between infrastructure nodes (which need rapid derivation) and end-hosts. One example is routers authenticating SCMP error messages, which is discussed above in this thread.

@fstreun
Contributor

fstreun commented Aug 31, 2021

So to summarize:

In a setup using AS-host keys, the sender, i.e., the slow side, requires only one key per AS and not per host.
This allows for better caching of the keys and reduces requests sent to the certificate server.
This improves performance on the slow side.
However, since all receivers on the fast side require the secret value for the fast key derivation, they must be trustworthy.

Host-host keys seem to be mainly useful if the receiver side (i.e., the fast side) is not completely trustworthy. By sharing only intermediate keys instead of the secret value with a receiver, the receiver can generate keys for itself on the fast side but not for others. Since the intermediate keys are different for each sender's AS, the receiver has to perform an additional lookup for the key generation, though.

Are there other good reasons to use the one or the other?

@JordiSubira
Contributor Author

Firstly, I would leave out sender and receiver from the discussion (if you are referring to the communication direction), since this is specific for every use case, meaning every protocol using DRKey. I will only use fast side and slow side.

In a setup using AS-host keys, the sender, i.e., the slow side, requires only one key per AS and not per host.
This allows for better caching of the keys and reduces requests sent to the certificate server.
This improves performance on the slow side.
However, since all receivers on the fast side require the secret value for the fast key derivation, they must be trustworthy.

I would add here that routers or infra nodes (on the fast side) are not affected by the extra-derivation step (i.e. lvl1Key -> host-AS -> host-host).

Since the intermediate keys are different for each sender's AS, the receiver has to perform an additional lookup for the key generation, though.

Actually, there's no additional lookup for the host-host key. When the end-host (on the slow side) requests a host-host key from the CS, the CS carries out the extra derivation step but this is transparent to the host.

@fstreun
Contributor

fstreun commented Aug 31, 2021

I would add here that routers or infra nodes (on the fast side) are not affected by the extra-derivation step (i.e. lvl1Key -> host-AS -> host-host).

Yes, avoiding one derivation step on the fast side might also be a good reason.

Actually, there's no additional lookup for the host-host key. When the end-host (on the slow side) requests a host-host key from the CS, the CS carries out the extra derivation step but this is transparent to the host.

I was referring to the (untrusted) receiver side, i.e., the fast side, which has only access to the host-AS keys (K_{A:H_1,B}, K_{A:H_1,C}, etc.).
When receiving a packet, the host on the fast side seems to be required to perform a lookup to obtain the intermediate key for the particular AS.

@JordiSubira
Contributor Author

When receiving a packet, the host on the fast side seems to be required to perform a lookup to obtain the intermediate key for the particular AS.

If you refer to A:H_1, then there's no additional lookup either. For example, any (or several) B:H_* sends a message to A:H_1, upon receiving the message A:H_1 looks K_{A:H_1,B} up, then it derives K_{A:H_1,B:H_*}. Without the intermediate key (host-AS key), A:H_1 would directly look up the host-host key (e.g. K_{A:H_1,B:H_2}). As you can see, there's an additional derivation step in exchange for fewer lookups (for the same slow-side AS).
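To make the trade-off concrete, here is a minimal sketch of A:H_1 caching one host-AS key per remote AS and deriving host-host keys locally. The class, the fetch callback and the HMAC-SHA256 stand-in for the PRF are hypothetical, and epochs and key expiry are ignored for brevity.

```python
import hashlib
import hmac


class FastSideHost:
    """A:H_1 without access to SV_A: caches K_{A:H_1,B} per remote AS B."""

    def __init__(self, fetch_host_as_key):
        # Hypothetical callback that asks the key server for K_{A:H_1,B}.
        self._fetch = fetch_host_as_key
        self._cache = {}
        self.lookups = 0  # number of requests sent to the key server

    def host_host_key(self, remote_as: bytes, remote_host: bytes) -> bytes:
        """Return K_{A:H_1,B:H_*} for the given remote AS and host."""
        if remote_as not in self._cache:
            self._cache[remote_as] = self._fetch(remote_as)
            self.lookups += 1
        k_host_as = self._cache[remote_as]
        # One extra local derivation step instead of one lookup per remote host.
        return hmac.new(k_host_as, remote_host, hashlib.sha256).digest()[:16]
```

Messages from any number of hosts in B trigger a single lookup for K_{A:H_1,B}; only a message from a new AS causes another request to the key server.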

@fstreun
Contributor

fstreun commented Aug 31, 2021

If you refer to A:H_1, then there's no additional lookup either. For example, any (or several) B:H_* sends a message to A:H_1, upon receiving the message A:H_1 looks K_{A:H_1,B} up, then it derives K_{A:H_1,B:H_*}.

This is the lookup I meant; it is not required if A:H_1 is trustworthy and in possession of SV_A.

It seems we were talking about the same thing.

lukedirtwalker pushed a commit to lukedirtwalker/scion that referenced this issue Nov 18, 2021
This PR contains the updated documentation for DRKey based on the discussion scionproto#4039.

[doc]

Closes scionproto#4102

GitOrigin-RevId: 18d2a8d7beb87b0ace36ade0424d830ff265dd01
@matzf matzf closed this as completed May 31, 2022