Scalability of path discovery #42

Merged: 19 commits, merged Jul 8, 2024

@matzf (Contributor) commented Jun 21, 2024

Scalability of path discovery.

  • Explain quality/quantity vs resource overhead
  • Resource cost in terms of number and length of discovered paths
  • Exploration time in terms of path length / network diameter
  • Separate analysis for inter/intra ISD beaconing:
    • Typical / expected properties of the network
    • Example numbers to give impression for order of magnitude of overhead

Fixes #8

@matzf requested review from nicorusti and jiceatscion June 21, 2024 13:24

@jiceatscion (Contributor) left a comment

I won't make pronouncements on the math accuracy... I find it convincing enough. Regarding the breadth and depth, I think it's good. I imagine that this is what the reviewer was asking for.

Possibly, we could add a summary with a few key scaling estimates in O() form, e.g. PCBs received per second: O(N^2). Although that'll be concerning in the mind of your average reviewer, btw, so maybe there's no need to rub it in.

Six resolved review threads on draft-dekater-scion-controlplane.md (outdated)
@matzf marked this pull request as ready for review June 25, 2024 06:59

@matzf (Contributor, Author) commented Jul 3, 2024

Thanks for fixing my typos, @nicorusti.

@nicorusti (Member) left a comment

Thanks a lot for putting this together! Most of my comments are small language fixes; besides that, I feel that some of the rough calculations could be framed slightly differently.

Ten resolved review threads on draft-dekater-scion-controlplane.md (seven outdated)
Comment on lines 1353 to 1356
With N the number of participating core ASes, an AS receives up to 5 * N PCBs per propagation interval per core link interface.
For highly connected ASes, the number of PCBs received thus becomes rather large. In a network of 1000 ASes, a highly connected AS with 300 core links receives up to 1.5 million PCBs per propagation interval.
Assuming an average PCB length of 6 and the shortest propagation interval of 60 seconds, this corresponds to roughly 150 thousand signature validations per second. This throughput can be achieved on a single core of a present day small server or desktop machine.
In terms of bandwidth, this corresponds to very roughly 38MB/s.
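
The arithmetic behind the quoted estimate can be reproduced as a short sketch. The helper `core_beaconing_load` is hypothetical; it assumes, as the quoted text implies, one signature validation per AS entry. The size of an AS entry is not stated in the draft; the last lines only back-compute what the quoted ~38MB/s figure implies.

```python
def core_beaconing_load(n_core_ases, core_links, avg_pcb_len, interval_s,
                        pcbs_per_origin=5):
    """Back-of-the-envelope PCB load for one core AS (hypothetical helper).

    Assumes every core link interface receives up to `pcbs_per_origin`
    PCBs per originating core AS per propagation interval, and that each
    AS entry in a PCB carries one signature to validate.
    """
    pcbs_per_interval = pcbs_per_origin * n_core_ases * core_links
    signatures_per_interval = pcbs_per_interval * avg_pcb_len
    return {
        "pcbs_per_interval": pcbs_per_interval,
        "signatures_per_second": signatures_per_interval / interval_s,
    }


load = core_beaconing_load(n_core_ases=1000, core_links=300,
                           avg_pcb_len=6, interval_s=60)
print(load["pcbs_per_interval"])       # 1500000
print(load["signatures_per_second"])   # 150000.0

# Back-compute the AS-entry size implied by the quoted ~38MB/s figure
# (an inference, not a value stated in the draft).
implied_bytes_per_entry = 38e6 / load["signatures_per_second"]
print(round(implied_bytes_per_entry))  # 253
```

In other words, the 38MB/s figure corresponds to roughly 250 bytes per AS entry on the wire.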

Review comment (Member):

Maybe here we could summarize by saying that the overall message complexity for an AS is linear in the number of core ASes N.

Reply (Contributor Author):

But it's not, it's N times the path length. That's the whole buildup of this section:

  • [Resource costs] depend on the number and length of the discovered path segments, that is, on the total number of AS entries of the discovered path segments.

  • Then we say that in the core network, PCBs are roughly log(N) long.

  • With N the number of participating core ASes, an AS receives up to 5 * N PCBs per propagation interval per core link interface.

@nicorusti (Member) replied Jul 7, 2024:

Thanks for the clarification. As far as I understand, then, the message complexity in terms of the number of signature validations per AS can be approximated as O(N*log(N)), while the number of propagated PCBs per AS is O(N), correct?
If you agree, I still think it would be more understandable to mention this directly.
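
The distinction drawn in this thread (PCB count linear in N, signature validations roughly N·log(N)) can be illustrated numerically. This is a sketch under the thread's assumptions: up to 5 PCBs per originating core AS per core link interface, and a PCB length of about log(N) AS entries. The base-2 logarithm is an arbitrary choice that only changes a constant factor, and `per_interface_costs` is a hypothetical helper.

```python
import math


def per_interface_costs(n_core_ases, pcbs_per_origin=5):
    """Per-interface, per-interval costs under the thread's assumptions.

    Assumes up to `pcbs_per_origin` PCBs per originating core AS (O(N))
    and a PCB length of roughly log2(N) AS entries (base 2 is an
    assumption; the thread only says "log(N)").
    """
    pcbs = pcbs_per_origin * n_core_ases
    signatures = pcbs * math.log2(n_core_ases)
    return pcbs, signatures


for n in (250, 500, 1000, 2000):
    pcbs, sigs = per_interface_costs(n)
    print(f"N={n:5d}  PCBs={pcbs:6d}  signatures~{sigs:9.0f}")

# Doubling N exactly doubles the PCB count (O(N)) but more than doubles
# the signature count (O(N log N)).
p1, s1 = per_interface_costs(1000)
p2, s2 = per_interface_costs(2000)
print(p2 / p1)      # 2.0
print(s2 / s1 > 2)  # True
```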

One resolved review thread on draft-dekater-scion-controlplane.md (outdated)

@tzaeschke commented:

General Note:

  • The notes contain several references to "immediate cold-start PCB forwarding" which in turn refers to the section #selection where it says
    under "Storing and Selecting Candidate PCBs":
    "Note that during bootstrapping and if the AS obtains a PCB containing a previously unknown path, the AS SHOULD forward the PCB immediately, [...]."
  • The notes contain several references to "beacon origination interval", i.e. the interval at which new beacons are created. I couldn't find any description
    of this interval; maybe I overlooked it? Is it the same as the propagation interval?

Some points that I think could be useful to add to the doc:

  • I couldn't find a discussion of beacon origination interval (or the RegistrationInterval), see "DefaultOriginationInterval" (or the DefaultRegistrationInterval) in the code.
    Maybe I overlooked it?
  • Load balancing? E.g. everybody picks the shortest paths, all other paths remain unused...? How is this handled?

Introduction

Avoiding Circular Dependencies and Partitioning

  • Does this section (title) make sense as it is? The two topics (circular dependencies and partitioning) appear unrelated. Partitioning is discussed in a separate subsection
  • Also, this section claims to contain a list that explains how circular dependencies are avoided.
    However, I am not sure how any item in the list explains how circular dependencies are avoided.

Partition and Healing

  • ASes could always switch to otherwise unused links.

    • What are unused links? Unused indicates that they are not in the "Best PCB" set.
      If that is the case, then we cannot simply "switch" to them; they first need to be discovered by beacons. This takes time, see the propagation interval (unless PCBs are forwarded immediately) and the beacon origination interval.
    • Also: Does "Healing" include "adding new links"? In this case we also need to wait for the propagation interval and the origination interval.
      As I understand it, with the propagation interval set to e.g. 10 minutes, this adds up to 5 hops × ½ × 10 min = 25 minutes for 5 hops.
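
The 25-minute figure follows from the buffering model used in the Scalability section: a PCB arrives at a random point in the propagation interval and waits, on average, half an interval per hop before being propagated. A minimal sketch of that arithmetic, with hypothetical helper names; it ignores the beacon origination interval and any immediate forwarding of new paths:

```python
def mean_discovery_time(hops, interval_min):
    """Mean time for a new link to propagate over `hops` hops.

    Assumes a PCB arrives at a uniformly random point within the
    propagation interval and is buffered until the next propagation
    event, i.e. it waits interval/2 per hop on average.
    """
    return hops * interval_min / 2


def worst_case_discovery_time(hops, interval_min):
    """Upper bound under the same model: a full interval per hop."""
    return hops * interval_min


print(mean_discovery_time(hops=5, interval_min=10))        # 25.0
print(worst_case_discovery_time(hops=5, interval_min=10))  # 50
```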

Path Exploration or Beaconing {#beaconing}

Introduction and Overview

  • Intra-ISD beaconing: Isn't this incomplete? How are DOWN segments created? I think leaf ASes need to propagate paths back to the COREs? -> Link to #intra-reg ?
  • Inter-ISD beaconing: Similarly, I think there is a step where CORE beacons share their path DB with other CORE ASes in the same ISD?

Extending a PCB

  • selects the best combinations: Maybe link to a section that explains how this works? And what "best" means?

Path-Segment Construction Beacons

PCB Validity

  • For the purpose of validation, a timestamp is considered "future" if it is later than the current time at the point of validation plus the minimum expiration time of a hop field (337.5 seconds, see ).
    Maybe add an explanation why we add the minimum expiration time here?
    Shouldn't "future" simply be timestamp + some_delta_to_account_for_server_time_inaccuracies, where the delta is maybe a few seconds rather than 5.5 minutes?
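
The quoted validity rule can be written as a one-line predicate. This is a sketch with hypothetical names (`is_future_timestamp`, `tolerance_s`); making the tolerance a parameter lets the draft's 337.5 s margin be compared directly with the suggestion of a few seconds of clock skew:

```python
import time

# Minimum hop-field expiration time quoted in the draft (seconds).
MIN_HOP_EXPIRATION_S = 337.5


def is_future_timestamp(pcb_timestamp_s, now_s=None,
                        tolerance_s=MIN_HOP_EXPIRATION_S):
    """Return True if a PCB timestamp counts as "future" for validation.

    Mirrors the quoted rule: later than the current time plus the minimum
    hop-field expiration time. `tolerance_s` is a hypothetical knob so the
    alternative (a few seconds of clock skew) can be compared.
    """
    if now_s is None:
        now_s = time.time()
    return pcb_timestamp_s > now_s + tolerance_s


now = 1_000_000.0
print(is_future_timestamp(now + 300, now_s=now))                 # False
print(is_future_timestamp(now + 300, now_s=now, tolerance_s=5))  # True
print(is_future_timestamp(now + 400, now_s=now))                 # True
```

A PCB timestamped five minutes ahead is accepted under the draft's rule but rejected with a few-second tolerance.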

Propagation of PCBs {#path-prop}

Selection of PCBs to Propagate {#selection}

Storing and Selecting Candidate PCBs

  • temporary storage: Maybe clarify what "temporary" means.

    • How long are PCBs stored?
    • Under what circumstances are they removed, e.g. when they expire?
    • Or possibly replaced with a new version? What if the new version has an earlier expiration date?
  • At each propagation event, each AS selects a set of the best PCBs from the candidates in the beacon store

  • The best PCBs set size SHOULD be at most "50" (PCBs) for intra-ISD beaconing and at most "5" (PCBs) for core beaconing.

    • I found this a bit confusing: the "50" appears to be the total number of beacons forwarded for non-core ASes, whereas the "5" refers to the number of
      PCBs per remote CORE AS. Maybe clarify this?
  • Note that during bootstrapping and if the AS obtains a PCB containing a previously unknown path, the AS SHOULD forward the PCB immediately, [...].

    • Is this true? It appears to conflict with many other parts of the document that talk about propagation intervals in the context of cold-start.
    • Is there a difference between "bootstrapping" (used here) and "cold-start" (used in other places)? Maybe stick to one term or explicitly declare equality?
    • Is this subject to the "Best 5 PCB" rule? -> If a new path is immediately forwarded, does it count towards the "best 5"? If not, then we are effectively
      forwarding >5 paths, correct? If yes, then the first 5 paths are always the best until they expire and can be replaced with other paths?
    • What does "unknown path" mean? Does it refer to all links in the segment or just the remote AS? If it is all links in the segment, then there may be many new paths
      coming in all the time that need to be forwarded immediately, or not?
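
The two readings of the set-size limits contrasted in the comment on the "50" and "5" values (a total cap of 50 for intra-ISD beaconing versus a cap of 5 per originating core AS for core beaconing) can be sketched as follows. This is illustrative only: the `Pcb` type and the helper names are hypothetical, and ranking by path length merely stands in for whatever "best" means, which this review also asks about.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pcb:
    origin: str  # originating core AS (hypothetical identifier format)
    length: int  # number of AS entries


def select_core(candidates, per_origin=5):
    """Reviewer's reading for core beaconing: up to 5 PCBs per origin core AS."""
    by_origin = defaultdict(list)
    for pcb in candidates:
        by_origin[pcb.origin].append(pcb)
    selected = []
    for pcbs in by_origin.values():
        # "Best" is assumed to mean "shortest" here, for illustration only.
        selected.extend(sorted(pcbs, key=lambda p: p.length)[:per_origin])
    return selected


def select_intra(candidates, total=50):
    """Reviewer's reading for intra-ISD beaconing: at most 50 PCBs in total."""
    return sorted(candidates, key=lambda p: p.length)[:total]


# 60 candidates spread evenly over 3 origin core ASes (20 each).
cands = [Pcb(origin=f"core-{i % 3}", length=1 + i % 7) for i in range(60)]
print(len(select_core(cands)))   # 15
print(len(select_intra(cands)))  # 50
```

Under the per-origin reading, the total set size grows with the number of core ASes, which is why the two limits are not directly comparable.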

Effects of Clock Inaccuracy

  • PCBs are propagated at a configurable interval (typically, one minute).

    • Unless they are new, then the interval is ignored, see "immediate cold-start PCB forwarding".
    • Maybe rephrase: "(typically, one minute)" ---> "immediately for new beacons, minimum 5secs for intra-ISD, minimum one minute for inter-ISD"?
      See #path-prop: "The propagation interval SHOULD be at least "5" (seconds) for intra-ISD beaconing and at least "60" (seconds) for core beaconing.".
    • Also, AFAIK, the current configured real-world interval is more like 10-15 minutes...?
  • PCBs with N hops may be validated up to N intervals (so typically N minutes) after origination

    • I think the word "typically" is misleading here, it can be understood as "PCBs [...] are validated typically after N minutes", whereas it actually means that the maximum is typically N minutes.
      Rephrase to "(maximally N minutes)" or "(amounting to N minutes, assuming the minimum inter-ISD propagation interval)"
  • Rephrase "The norm is 6 hours." to "... SHOULD be 6 hours"? What does "norm" mean?

  • In comparison to these time scales, clock offsets in the order of minutes are immaterial.

    • This relates only to the previous paragraph about certificates; I guess it should be attached to the previous paragraph?
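
To put the "maximally N minutes" bound in perspective, the sketch below compares the worst-case PCB age at validation (one full propagation interval per hop) with a hop expiration of about 6 hours. The 6-hop path length and the 10-minute interval (the real-world estimate mentioned above) are illustrative assumptions; the 1-minute interval is the draft's minimum for core beaconing. `max_pcb_age_min` is a hypothetical helper.

```python
def max_pcb_age_min(hops, interval_min):
    """Upper bound on PCB age at validation: at most one full propagation
    interval per hop (the "maximally N minutes" wording, for T = 1 min)."""
    return hops * interval_min


HOP_EXPIRATION_MIN = 6 * 60  # ~6 hours, as discussed in this section

for interval_min in (1, 10):  # 1 min: draft's core minimum; 10 min: reviewer's figure
    age = max_pcb_age_min(hops=6, interval_min=interval_min)
    share = age / HOP_EXPIRATION_MIN
    print(f"T={interval_min:2d} min: max age {age} min, {share:.1%} of hop lifetime")
```

Even with 10-minute intervals, a 6-hop PCB uses at most about a sixth of a 6-hour hop lifetime before it is validated.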

Path Discovery Time and Scalability {#scalability}

  • balances _OF_ the number of discovered paths -> remove "of". (Already fixed.)

  • Generally, the time until a specific PCB is built depends on its length and the propagation interval.?

    • I think in the context of "cold boot", the propagation delay is "0", see "immediate cold-start PCB forwarding".
  • PCB arrives at a random point in time during the interval and is buffered before potentially being propagated

    • see "immediate cold-start PCB forwarding"
    • Also, I think the calculation needs to take into account the "beacon origination interval".
  • As will become apparent, the inter-ISD beaconing results in excessive overhead with very large numbers of participating core ASes.

    • Does this need to be in the IETF spec?
  • The ideal topology for SCION is to keep the inter-ISD core network to a moderate size, to benefit from the divide-and-conquer partitioning of ASes into ISDs and the efficiency of the intra-ISD beaconing.

    • What is done to ensure this? What happens if the size is not moderate? What is "moderate"?
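
The buffering assumption behind these estimates (a PCB arrives at a random point in the interval and waits for the next propagation event) implies an average wait of T/2 per hop. A quick Monte Carlo check, as a sketch: it assumes independent, uniformly distributed arrival phases at each AS, ignores link latency and processing time, and, like the calculation it checks, ignores the beacon origination interval. `simulate_propagation_delay` is a hypothetical helper.

```python
import random


def simulate_propagation_delay(hops, interval_s, trials=50_000, seed=1):
    """Monte Carlo estimate of mean end-to-end delay over `hops` hops.

    Assumes each AS propagates on its own fixed schedule with an
    independent random phase, so a PCB arriving at a uniformly random
    point waits Uniform(0, interval_s) before being forwarded. Processing
    and link latency are ignored.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += sum(rng.uniform(0, interval_s) for _ in range(hops))
    return total / trials


mean = simulate_propagation_delay(hops=4, interval_s=60)
print(round(mean))  # close to the analytic value hops * interval_s / 2 = 120
```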

Intra-ISD Beaconing

  • Otherwise, child ASes at distance D below the new link, learn of the new link after D further propagation steps
    • New path: "immediate cold-start PCB forwarding"

Inter-ISD Beaconing

  • On a cold start of the network, [...]. With a 5 second propagation period [...]

    • Above it says that bootstrapping results in immediate forwarding, see "immediate cold-start PCB forwarding".
  • When a new link is added to the network, it will be available to connect two ASes at distances from the link D1 and D2 from the link, respectively, after a mean time (D1+D2)*T/2.

    • Typo: duplicated "from the link";
    • Also: see previous point about "immediate cold-start PCB forwarding"
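
The quoted estimate for new links, (D1+D2)*T/2, adds up the half-interval average wait per hop on both sides of the new link. A one-function sketch with a hypothetical helper name; the distances are illustrative, and the 60 s interval is the draft's minimum for core beaconing:

```python
def new_link_mean_availability(d1, d2, interval_s):
    """Mean time until a new link connects ASes at distances d1 and d2 from it.

    Implements the quoted estimate (D1 + D2) * T / 2, i.e. an average wait
    of half a propagation interval per hop on both sides of the link.
    """
    return (d1 + d2) * interval_s / 2


# Illustrative values: ASes 2 and 3 hops from the new link, 60 s interval.
print(new_link_mean_availability(2, 3, 60))  # 150.0
```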

@nicorusti (Member) commented Jul 7, 2024

Thank you for your feedback, @tzaeschke! I respond here regarding the scalability and clock inaccuracy sections. For the other sections, and for points that we don't have time to address in this revision, I opened separate issues:

Regarding Effects of Clock Inaccuracy

I think the word "typically" is misleading here, it can be understood as "PCBs [...] are validated typically after N minutes", whereas it actually means that the maximum is typically N minutes.
Rephrase to "(maximally N minutes)"

Done, maximally N minutes sounds good.

Also, AFAIK, the current configured real-world interval is more like 10-15 minutes...?

@matzf, is it 1 min as in the draft, or 10-15? 10-15 feels a bit high to me.

Rephrase "The norm is 6 hours." to "... SHOULD be 6 hours"? What does "norm" mean?

I am a bit reluctant to use RFC2119 language (uppercase SHOULD) for exactly 6 hours. This value ultimately depends on the maximum AS path length expected in the network, and a different value might be just as appropriate. I therefore rephrased it like this:
For this reason, it is unadvisable to create hops with a short expiration time, that should be around 6 hours.

In comparison to these time scales, clock offsets in the order of minutes are immaterial.
This relates only to the previous paragraph about certificates; I guess it should be attached to the previous paragraph?

Done.

Regarding Path Discovery Time and Scalability {#scalability}

The notes contain several references to "immediate cold-start PCB forwarding"

@matzf clarified here that this is not the case, so I removed that note. This should also solve many of the consistency issues you reported.

Also, I think the calculation needs to take into account the "beacon origination interval".

To be handled in #45

As will become apparent, the inter-ISD beaconing results in excessive overhead with very large numbers of participating core ASes.
Does this need to be in the IETF spec?

Good point, I rephrased this section to:
To achieve scalability in its routing process, SCION uses a divide-and-conquer approach, partitioning ASes into ISDs. In order to benefit from this, an ideal topology SCION should keep the inter-ISD core network to a moderate size. For more specific observations, we distinguish between intra- and inter-ISD beaconing.

What is done to ensure this? What happens if the size is not moderate? What is "moderate"?

We give some numbers in the Inter-ISD Beaconing section, using an example with 1000 core ASes, which gives a rough figure. The bandwidth and computation overhead figures there should also give a rough hint of what happens if the network grows too much: the overhead becomes considerable.
What is done to ensure this IMHO depends on how the network is deployed. I think this topic would be a better fit for the new deployment Internet Draft, so I opened an issue there: scionassociation/scion-deployment_I-D#1

Typo: duplicated "from the link";

Fixed

@matzf (Contributor, Author) commented Jul 8, 2024

Commented on the related issues for the other sections.

Also, AFAIK, the current configured real-world interval is more like 10-15 minutes...?

@matzf, is it 1 min as in the draft, or 10-15? 10-15 feels a bit high to me.

The 1 minute value seems realistic. SCIONLab uses 5 seconds for non-core beaconing and 1 minute for core beaconing. Anapaya's infrastructure reportedly runs with 30s.

@nicorusti merged commit 70f77f0 into main Jul 8, 2024
2 checks passed
@nicorusti deleted the scalability branch July 8, 2024 16:12

Successfully merging this pull request may close: Add section on scalability and convergence time
4 participants