Scalability of path discovery #42

matzf · 2024-06-21T13:23:58Z

Scalability of path discovery.

Explain quality/quantity vs resource overhead
Resource cost in terms of number and length of discovered paths
Exploration time in terms of path length / network diameter
Separate analysis for inter/intra ISD beaconing:
- Typical / expected properties of the network
- Example numbers to give impression for order of magnitude of overhead

Fixes #8

jiceatscion

I won't make pronouncements on the math accuracy... I find it convincing enough. Regarding the breadth and depth, I think it's good. I imagine that this is what the reviewer was asking for.

Possibly, we could add a summary, with a few key scaling estimates in O() form, e.g. PCBs received per second: O(N^2). Although that'll be concerning in the mind of your average reviewer, btw, so maybe no need to rub it in.

draft-dekater-scion-controlplane.md

matzf · 2024-07-03T13:28:50Z

Thanks for fixing my typos, @nicorusti.

nicorusti

Thanks a lot for putting this together! Most of my comments are small language stuff. Besides that, I feel that some of the rough calculations could be framed slightly differently.

draft-dekater-scion-controlplane.md

nicorusti · 2024-07-04T08:35:03Z

draft-dekater-scion-controlplane.md

+With N the number of participating core ASes, an AS receives up to 5 * N PCBs per propagation interval per core link interface.
+For highly connected ASes, the number of PCBs received thus becomes rather large. In a network of 1000 ASes, a highly connected AS with 300 core links receives up to 1.5 million PCBs per propagation interval.
+Assuming an average PCB length of 6 and the shortest propagation interval of 60 seconds, this corresponds to roughly 150 thousand signature validations per second. This throughput can be achieved on a single core of a present day small server or desktop machine.
+In terms of bandwidth, this corresponds to very roughly 38MB/s.
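The arithmetic behind the quoted figures can be checked with a short sketch. All input values are taken from the quoted draft text; the variable names are illustrative only, and the bandwidth figure is left out because the draft lines above do not state a per-entry size.

```python
# Worked example of the core-beaconing overhead estimate quoted above.
# Input values come from the quoted draft text; names are illustrative.

PCBS_PER_ORIGIN_PER_LINK = 5   # "up to 5 * N PCBs per propagation interval per core link interface"
num_core_ases = 1000           # N, number of participating core ASes
core_links = 300               # core links of the highly connected AS
avg_pcb_length = 6             # average number of AS entries per PCB
propagation_interval_s = 60    # shortest core propagation interval, in seconds

# PCBs received per propagation interval across all core links.
pcbs_per_interval = PCBS_PER_ORIGIN_PER_LINK * num_core_ases * core_links

# One signature validation per AS entry in each received PCB.
validations_per_interval = pcbs_per_interval * avg_pcb_length
validations_per_second = validations_per_interval / propagation_interval_s

print(pcbs_per_interval)       # 1500000
print(validations_per_second)  # 150000.0
```

The result matches the "1.5 million PCBs" and "roughly 150 thousand signature validations per second" in the quoted lines.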


Maybe here we could summarize by saying that the overall message complexity for an AS is linear in the number of core ASes N.

But it's not, it's N times the path length. That's the whole buildup of this section:

[Resource costs] depend on the number and length of the discovered path segments, that is, on the total number of AS entries of the discovered path segments.

Then we say that in the core network, PCBs are roughly log(N) long.

With N the number of participating core ASes, an AS receives up to 5 * N PCBs per propagation interval per core link interface.

Thanks for the clarification. As far as I understand, the message complexity in terms of the number of signature validations per AS can then be approximated with O(N*log(N)), while the number of propagated PCBs per AS is O(N), correct?
If you agree, I still think it might be more understandable to mention it directly.

tzaeschke · 2024-07-05T15:39:07Z

General Note:

The notes contain several references to "immediate cold-start PCB forwarding" which in turn refers to the section #selection where it says
under "Storing and Selecting Candidate PCBs":
"Note that during bootstrapping and if the AS obtains a PCB containing a previously unknown path, the AS SHOULD forward the PCB immediately, [...]."
The notes contain several references to "beacon origination interval", i.e. the interval at which new beacons are created. I couldn't find any description
of this interval, maybe I overlooked it? Is it the same as the propagation interval?

Some points I found could be useful to add to the doc:

I couldn't find a discussion of beacon origination interval (or the RegistrationInterval), see "DefaultOriginationInterval" (or the DefaultRegistrationInterval) in the code.
Maybe I overlooked it?
Load balancing? E.g. everybody picks the shortest paths, all other paths remain unused...? How is this handled?

Introduction

Avoiding Circular Dependencies and Partitioning

Does this section (title) make sense as it is? The two topics (circular dependencies and partitioning) appear unrelated. Partitioning is discussed in a separate subsection
Also, this section claims to contain a list that explains how circular dependencies are avoided.
However, I am not sure how anything in the list explains anything about circular dependencies?

Partition and Healing

ASes could always switch to otherwise unused links.
- What are unused links? Unused indicates that they are not in the "Best PCB" set.
  If that is the case, then we cannot simply "switch" to use them, we first need to have them discovered by beacons. This takes time, see propagation interval (unless PCB are forwarded immediately) and beacon origination interval.
- Also: Does "Healing" include "adding new links"? In this case we also need to wait for the propagation interval and the origination interval.
  As I understand, with the propagation interval being set to e.g. 10 minutes, this adds up to 5 hops * 1/2 * 10 min = 25 minutes for 5 hops.
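The estimate above can be written as a small function. This is a minimal sketch, assuming each AS buffers a PCB for half a propagation interval on average; it ignores the origination interval raised in the same comment, and the function name is hypothetical. The 5-second and 60-second values are the minimum intra-ISD and core intervals quoted later in this review.

```python
def mean_discovery_time_s(hops: int, propagation_interval_s: float) -> float:
    """Mean time for a PCB to cross `hops` ASes.

    Assumes each AS holds the PCB for half a propagation interval on average.
    The beacon origination interval is not modelled.
    """
    return hops * propagation_interval_s / 2


# Compare the 10-minute interval from the comment with the minimum intervals.
for label, interval_s in [("10 min", 600), ("60 s", 60), ("5 s", 5)]:
    minutes = mean_discovery_time_s(5, interval_s) / 60
    print(f"interval {label}: {minutes:.2f} min for 5 hops")
```

With a 10-minute interval this gives the 25 minutes stated in the comment.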

Path Exploration or Beaconing {#beaconing}

Introduction and Overview

Intra-ISD beaconing: Isn't this incomplete? How are DOWN segments created? I think leaf ASes need to propagate paths back to the COREs? -> Link to #intra-reg ?
Inter-ISD beaconing: Similarly, I think there is a step where CORE beacons share their path DB with other CORE ASes in the same ISD?

Extending a PCB

selects the best combinations: Maybe link to a section that explains how this works? And what "best" means?

Path-Segment Construction Beacons

PCB Validity

For the purpose of validation, a timestamp is considered "future" if it is later than the current time at the point of validation plus the minimum expiration time of a hop field (337.5 seconds, see ).
Maybe add an explanation why we add the minimum expiration time here?
Shouldn't "future" simply be timestamp + some_delta_to_account_for_server_time_inaccuracies, where the delta is maybe a few seconds rather than 5.5 minutes?
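The two readings of "future" being contrasted here can be sketched side by side. The 337.5-second constant is from the quoted draft sentence; the function names and the 5-second skew allowance are assumptions made for illustration, not values from the draft.

```python
MIN_HOP_EXPIRATION_S = 337.5  # minimum hop field expiration time, from the quoted draft text


def is_future_as_drafted(timestamp_s: float, now_s: float) -> bool:
    """Rule as quoted: later than the current time plus the minimum hop expiration time."""
    return timestamp_s > now_s + MIN_HOP_EXPIRATION_S


def is_future_with_skew(timestamp_s: float, now_s: float, max_clock_skew_s: float = 5.0) -> bool:
    """Alternative suggested in the comment: allow only a small clock-skew delta.

    The default of 5 seconds is a hypothetical value ("maybe a few seconds").
    """
    return timestamp_s > now_s + max_clock_skew_s


now = 1_000_000.0
one_minute_ahead = now + 60.0
print(is_future_as_drafted(one_minute_ahead, now))  # False: within the 337.5 s allowance
print(is_future_with_skew(one_minute_ahead, now))   # True: beyond the 5 s skew allowance
```

A timestamp one minute ahead of the validator's clock is accepted under the drafted rule but rejected under the tighter one, which is the difference the question is asking the draft to justify.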

Propagation of PCBs {#path-prop}

Selection of PCBs to Propagate {#selection}

Storing and Selecting Candidate PCBs

temporary storage: Maybe clarify what "temporary" means.
- How long are PCBs stored?
- Under what circumstances are they removed, e.g. when they expire?
- Or possibly replaced with a new version? What if the new version has an earlier expiration date?
At each propagation event, each AS selects a set of the best PCBs from the candidates in the beacon store
The best PCBs set size SHOULD be at most "50" (PCBs) for intra-ISD beaconing and at most "5" (PCBs) for core beaconing.
- I found this a bit confusing: the "50" appears to be the total number of beacons forwarded for non-core ASes, whereas the "5" refers to the number of
  PCBs per remote CORE AS. Maybe clarify this?
Note that during bootstrapping and if the AS obtains a PCB containing a previously unknown path, the AS SHOULD forward the PCB immediately, [...].
- Is this true? It appears to conflict with many other parts of the document that talk about propagation intervals in the context of cold-start.
- Is there a difference between "bootstrapping" (used here) and "cold-start" (used in other places)? Maybe stick to one term or explicitly declare equality?
- Is this subject to the "Best 5 PCB" rule? -> If a new path is immediately forwarded, does it count towards the "best 5"? If not, then we are effectively
  forwarding >5 paths, correct? If yes, then the first 5 paths are always the best until they expire and can be replaced with other paths?
- What does "unknown path" mean? Does it refer to all links in the segment or just the remote AS? If it is all links in the segment, then there may be many new paths
  coming in all the time that need to be forwarded immediately, or not?

Effects of Clock Inaccuracy

PCBs are propagated at a configurable interval (typically, one minute).
- Unless they are new, then the interval is ignored, see "immediate cold-start PCB forwarding".
- Maybe rephrase: "(typically, one minute)" ---> "immediately for new beacons, minimum 5secs for intra-ISD, minimum one minute for inter-ISD"?
  See #path-prop: "The propagation interval SHOULD be at least "5" (seconds) for intra-ISD beaconing and at least "60" (seconds) for core beaconing.".
- Also, AFAIK, the current configured real-world interval is more like 10-15 minutes...?
PCBs with N hops may be validated up to N intervals (so typically N minutes) after origination
- I think the word "typically" is misleading here, it can be understood as "PCBs [...] are validated typically after N minutes", whereas it actually means that the maximum is typically N minutes.
  Rephrase to "(maximally N minutes)" or "(amounting to N minutes, assuming the minimum inter-ISD propagation interval)"
Rephrase "The norm is 6 hours." to "... SHOULD be 6 hours"? What does "norm" mean?
In comparison to these time scales, clock offsets in the order of minutes are immaterial.
- This relates only to the previous paragraph about certificates; I guess it should be attached to the previous paragraph?

Path Discovery Time and Scalability {#scalability}

~~balances _OF_ the number of discovered paths -> remove OF~~ Already fixed
Generally, the time until a specific PCB is built depends on its length and the propagation interval.?
- I think in the context of "cold boot", the propagation delay is "0", see "immediate cold-start PCB forwarding".
PCB arrives at a random point in time during the interval and is buffered before potentially being propagated
- see "immediate cold-start PCB forwarding"
- Also, I think the calculation needs to take into account the "beacon origination interval".
As will become apparent, the inter-ISD beaconing results in excessive overhead with very large numbers of participating core ASes.
- Does this need to be in the IETF spec?
The ideal topology for SCION is to keep the inter-ISD core network to a moderate size, to benefit from the divide-and-conquer partitioning of ASes into ISDs and the efficiency of the intra-ISD beaconing.
- What is done to ensure this? What happens if the size is not moderate? What is "moderate"?

Intra-ISD Beaconing

Otherwise, child ASes at distance D below the new link, learn of the new link after D further propagation steps
- New path: "immediate cold-start PCB forwarding"

Inter-ISD Beaconing

On a cold start of the network, [...]. With a 5 second propagation period [...]
- Above it says that bootstrapping results in immediate forwarding, see "immediate cold-start PCB forwarding".
When a new link is added to the network, it will be available to connect two ASes at distances from the link D1 and D2 from the link, respectively, after a mean time (D1+D2)*T/2.
- Typo: duplicated "from the link";
- Also: see previous point about "immediate cold-start PCB forwarding"
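The mean time (D1+D2)*T/2 in the quoted sentence can be sanity-checked with a small simulation. This is a sketch under the assumption stated in the scalability section (a PCB arrives at a random point in the interval and is buffered until the next propagation event), with no immediate forwarding; the distances, interval, trial count and function name are hypothetical.

```python
import random


def simulate_mean_availability_s(d1: int, d2: int, interval_s: float,
                                 trials: int = 20000, seed: int = 1) -> float:
    """Estimate the mean time until a new link connects ASes at distances d1 and d2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Each of the d1 + d2 propagation steps waits a uniformly random
        # fraction of one propagation interval before the PCB moves on.
        total += sum(rng.uniform(0, interval_s) for _ in range(d1 + d2))
    return total / trials


d1, d2, interval_s = 2, 3, 60
expected = (d1 + d2) * interval_s / 2  # closed form from the quoted text
simulated = simulate_mean_availability_s(d1, d2, interval_s)
print(f"closed form: {expected} s, simulated: {simulated:.1f} s")
```

The simulated mean lands close to the closed-form value; if new PCBs were forwarded immediately, as the comment suggests, the per-step wait would drop towards zero and the formula would not apply.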

nicorusti · 2024-07-07T21:02:39Z

Thank you for your feedback @tzaeschke! I respond here regarding the scalability and clock inaccuracy sections. For other sections, and for points that we don't have time to address in this revision, I opened separate issues:

Regarding Effects of Clock Inaccuracy

I think the word "typically" is misleading here, it can be understood as "PCBs [...] are validated typically after N minutes", whereas it actually means that the maximum is typically N minutes.
Rephrase to "(maximally N minutes)"

Done, maximally N minutes sounds good.

Also, AFAIK, the current configured real-world interval is more like 10-15 minutes...?

@matzf is it 1 min as in the draft, or 10-15? 10-15 feels a bit high to me.

Rephrase The norm is 6 hours. to ... SHOULD be 6 hours ? What does 'norm' mean?

I am a bit reluctant to use RFC2119 language (uppercase SHOULD) for exactly 6 hours. This is a value that overall depends on the maximum AS path expected in the network, and it might as well be a different value. I therefore rephrased like this:
For this reason, it is unadvisable to create hops with a short expiration time, that should be around 6 hours.

In comparison to these time scales, clock offsets in the order of minutes are immaterial.
This relates only to the previous paragraph about certificates; I guess it should be attached to the previous paragraph?

Done.

Regarding Path Discovery Time and Scalability {#scalability}

The notes contain several references to "immediate cold-start PCB forwarding"

@matzf clarified here that this is not the case. I removed that note; this should also solve many of the consistency issues you reported.

Also, I think the calculation needs to take into account the "beacon origination interval".

To be handled in #45

As will become apparent, the inter-ISD beaconing results in excessive overhead with very large numbers of participating core ASes.
Does this need to be in the IETF spec?

Good point, I rephrased this section to:
To achieve scalability in its routing process, SCION uses a divide-and-conquer approach, partitioning ASes into ISDs. In order to benefit from this, an ideal SCION topology should keep the inter-ISD core network to a moderate size. For more specific observations, we distinguish between intra- and inter-ISD beaconing.

What is done to ensure this? What happens if the size is not moderate? What is "moderate"?

We give some numbers in the Inter-ISD Beaconing section, with an example of 1000 core ASes; this gives a rough figure. The bandwidth and computation overhead figures there should also give a rough hint of what happens if the network grows too much: the overhead becomes considerable.
What is done to ensure this depends, IMHO, on how the network is deployed. I think this topic would be a better fit for the new deployment Internet Draft, so I opened an issue there: scionassociation/scion-deployment_I-D#1

Typo: duplicated "from the link";

Fixed

matzf · 2024-07-08T14:58:51Z

Commented on the related issues for the other sections.

Also, AFAIK, the current configured real-world interval is more like 10-15 minutes...?

@matzf is it 1 min as in the draft, or 10-15? 10-15 feels a bit high to me.

The 1 minute value seems realistic. SCIONLab uses 5 seconds for non-core beaconing and 1 minute for core beaconing. Anapaya's infrastructure reportedly runs with 30s.

matzf requested review from nicorusti and jiceatscion June 21, 2024 13:24

Scalability of path discovery

4c7fc0d

matzf force-pushed the scalability branch from 27428e7 to 4c7fc0d Compare June 24, 2024 08:05

jiceatscion reviewed Jun 24, 2024

matzf added 2 commits June 24, 2024 17:20

scalability analysis polish

760c0e2

scalability: use different numbers

17ad81b

matzf marked this pull request as ready for review June 25, 2024 06:59

jiceatscion approved these changes Jun 25, 2024

matzf and others added 2 commits July 1, 2024 08:28

clarify limit of no PCBs per link in example calculation

3d16011

typos

011ef7c

nicorusti requested changes Jul 3, 2024

nicorusti reviewed Jul 4, 2024

add reference to scalability section in 2.3.1.1. Storing and Selectin…

5f7a89e

…g Candidate PCBs

jiceatscion approved these changes Jul 4, 2024

matzf and others added 8 commits July 5, 2024 12:45

Update draft-dekater-scion-controlplane.md

aa890b1

Co-authored-by: Nicola Rustignoli <[email protected]>

Update draft-dekater-scion-controlplane.md

8e7ca52

Co-authored-by: Nicola Rustignoli <[email protected]>

fixup down path -> down-path segment and phrasing

1a6084f

Update draft-dekater-scion-controlplane.md

0c05132

Co-authored-by: Nicola Rustignoli <[email protected]>

Update draft-dekater-scion-controlplane.md

7bb67e3

Co-authored-by: Nicola Rustignoli <[email protected]>

refer to *recommended* best PCBs set size 50

8292805

scaling mention distributed CS

d8bff0f

split unrelated lines

c6f8e1f

Remove note about immediate PCB propagation, fix typos

966445b

This was referenced Jul 7, 2024

Discuss managing the number of ISDs and core ASes scionassociation/scion-deployment_I-D#1

Open

Clarify origination v.s. propagation interval #45

Closed

Clarifications in introduction and beaconing #46

Open

nicorusti added 2 commits July 7, 2024 23:03

scalability: implement feedback Tilman

2b2b768

fix lint

4e2e591

nicorusti approved these changes Jul 7, 2024

Merge remote-tracking branch 'origin/main' into scalability

60ec8c6

mention propagation time is around one minute

8b8d3dd

nicorusti merged commit 70f77f0 into main Jul 8, 2024
2 checks passed

nicorusti mentioned this pull request Jul 8, 2024

Clarify beaconing fast retry at bootstrapping #48

Closed

nicorusti deleted the scalability branch July 8, 2024 16:12
