From d7ee98995f832ddf4ae97b7188be5699737e099e Mon Sep 17 00:00:00 2001
From: Aditya Hase
Date: Fri, 6 Sep 2024 16:01:35 +0530
Subject: [PATCH] chore: Add incomplete docs

Just some scratch work. Don't take it very seriously.
---
 pilot/pilot/design-wip/cluster.md    | 28 ++++++++++
 pilot/pilot/design-wip/containers.md |  9 +++
 pilot/pilot/design-wip/networking.md | 82 ++++++++++++++++++++++++++++
 pilot/pilot/design-wip/problems.md   | 50 +++++++++++++++++
 pilot/pilot/design-wip/quick.md      | 65 ++++++++++++++++++++++
 5 files changed, 234 insertions(+)
 create mode 100644 pilot/pilot/design-wip/cluster.md
 create mode 100644 pilot/pilot/design-wip/containers.md
 create mode 100644 pilot/pilot/design-wip/networking.md
 create mode 100644 pilot/pilot/design-wip/problems.md
 create mode 100644 pilot/pilot/design-wip/quick.md

diff --git a/pilot/pilot/design-wip/cluster.md b/pilot/pilot/design-wip/cluster.md
new file mode 100644
index 0000000..a0db6ca
--- /dev/null
+++ b/pilot/pilot/design-wip/cluster.md
@@ -0,0 +1,28 @@
+# Cluster
+
+## Current
+Current Press clusters are simply 2m + 1 servers
+
+1 (or 2) x Proxy
+m x App
+n x DB
+
+All services are external
+- Log
+- Metrics
+- Registry
+
+
+## New
+
+### Move Log, Metrics to inside the cluster to reduce bandwidth
+
+For metrics this means more frequent scrape intervals would be possible
+
+### Dynamic Ingress
+
+Allow multiple VMs to handle Ingress simultaneously
+
+#### S
+
+All Non-Host Data
\ No newline at end of file
diff --git a/pilot/pilot/design-wip/containers.md b/pilot/pilot/design-wip/containers.md
new file mode 100644
index 0000000..c6baf1c
--- /dev/null
+++ b/pilot/pilot/design-wip/containers.md
@@ -0,0 +1,9 @@
+# Container Runtime
+
+| Runtime   | User Namespaces   | Daemon    |
+|-----------|-------------------|-----------|
+| Docker    | Fixed. Up to 5?   | Yes       |
+| Podman    | Arbitrary         | No        |
+| Containerd| Fixed             | Yes       |
+| CRI-O     | Fixed             | Yes       |
+
diff --git a/pilot/pilot/design-wip/networking.md b/pilot/pilot/design-wip/networking.md
new file mode 100644
index 0000000..5e93a91
--- /dev/null
+++ b/pilot/pilot/design-wip/networking.md
@@ -0,0 +1,82 @@
+## IPv4?
+### 10.0.0.0/8
+Prevents multiple VPCs from communicating, since they share the same address space.
+
+### 10.x.0.0/16
+This is what we currently use.
+This gives us 256 VPCs. Seems plenty at this point. Unless we go all out on On-Premise VPCs.
+
+
+That's 65k addresses per VPC. More than enough for host routing.
+
+Not enough to give each customer / project a segment.
+
+Let's say we cap each project to 128 addresses. That gives us 512 projects. We're already far beyond this.
+
+Doesn't work
+
+### Isolated
+If all projects are isolated then we can assign them the same range over and over and still be fine.
+
+
+What if we assign 10.xx.xx.0/24 to each project? That gives them 256 addresses
+
+#### VXLAN
+
+##### Static IP - Dynamic Routing
+The discovery based VXLAN was complicated since you needed to communicate node-reallocation to all neighbours.
+
+e.g.
+
+nginx 10.0.0.1
+app 10.0.0.2
+are on 172.196.1.1
+
+and db 10.0.0.3 is on 172.196.1.2
+
+then we need to know
+10.0.0.1 -> 172.196.1.1
+10.0.0.2 -> 172.196.1.1
+10.0.0.3 -> 172.196.1.2
+
+When this information changes because of node failure/reallocation/update, we need to update all the neighbours.
+
+
+##### Dynamic IP - Static Routing
+The other option is that if
+
+nginx 10.0.1.1 is on 172.196.1.1
+app 10.0.1.2 is on 172.196.1.1
+and
+db 10.0.2.1 is on 172.196.1.2
+
+then we always know
+10.0.x.y is on 172.196.1.x
+
+So our routing becomes insanely simple. The downside is that now we have to involve DNS.
+
+
+
+### DNS Based Updates
+
+On these two
+
+Node Failure -> Reallocation
+Rebalancing -> Reallocation
+Deploy / Update -> Allocation / Deallocation
+
+
+## IPv6
+
+This needs eBPF to get right.
+Let's not deal with it right away.
+ULA?
+https://en.wikipedia.org/wiki/Unique_local_address
+
+
+https://community.fly.io/t/6pn-addressing-clarification/1362
+https://cloud.google.com/blog/products/networking/using-ipv6-unique-local-addresses-or-ula-in-google-cloud
+
+
+
+ULA fd00::/8
+Google VPC splits this 8/40/16/32/32. Each VM gets a /96 address
\ No newline at end of file
diff --git a/pilot/pilot/design-wip/problems.md b/pilot/pilot/design-wip/problems.md
new file mode 100644
index 0000000..cdbead2
--- /dev/null
+++ b/pilot/pilot/design-wip/problems.md
@@ -0,0 +1,50 @@
+# Problems
+A rough list of problems I need to solve
+
+## Declarative Provisioning
+### Scaffolding
+- [x] Minimal wrapper around OpenTofu / CDKTF
+- [x] Example usage for DigitalOcean
+
+### State Management
+- [x] Doctypes to track updates to each provision request
+State / Plan / Action etc.
+
+
+### Networking
+- [ ] Elastic IPs for Ingress
+- [ ] DNS for Ingress
+- [ ] Subnets / Routing Rules
+- [ ] Track Firewall Rules / Security Groups
+
+### Nodes
+- [ ] Base Images
+- [ ] Bootstrapping
+
+## Bootstrapping
+- [ ] Base Images
+
+### Scaffolding
+- [ ]
+
+## Scheduling
+### Scaffolding
+### Allocation
+### Drain
+
+## Orchestration
+### Scaffolding
+### Updates
+
+
+## Networking
+### Overlay
+### Ingress
+
+### Egress
+
+## Workflows
+###
+
+## Storage
+- [ ] Provision Attached Block Storage with each Node
\ No newline at end of file
diff --git a/pilot/pilot/design-wip/quick.md b/pilot/pilot/design-wip/quick.md
new file mode 100644
index 0000000..c2849d7
--- /dev/null
+++ b/pilot/pilot/design-wip/quick.md
@@ -0,0 +1,65 @@
+## Cluster Architecture
+### Press
+Currently Press is aware of all Nodes and expects them to be available and responsive at all times
+
+In situations when the Nodes don't respond, we must bring the node back up and retry the request.
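A minimal sketch of this bring-the-node-back-up-and-retry loop. All names here (`run_service`, `boot`) are hypothetical placeholders for whatever RPC Press actually uses; only the control flow is the point:

```python
import time

def run_with_retry(node, service, attempts=3, delay=1.0):
    """Retry a service request, rebooting the node between failed attempts.

    `node.run_service` and `node.boot` are invented names, not Press APIs;
    this only sketches the retry shape described above.
    """
    for attempt in range(attempts):
        try:
            return node.run_service(service)  # Press ->> Node : Run Service-X
        except ConnectionError:
            node.boot()  # bring the unresponsive node back up
            time.sleep(delay * 2 ** attempt)  # back off before retrying
    raise RuntimeError(f"{service!r} failed after {attempts} attempts")
```
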
+
+```mermaid
+sequenceDiagram
+    Press -x Node : Run Service-X
+    Press ->> Node : Run Service-X
+    Node ->> Press : ACK
+    destroy Node
+    destroy Press
+
+```
+### Alternative - 1
+
+Press only needs to be aware of some Nodes. Let's call them Control Nodes.
+
+Control Nodes acknowledge all Press requests and then
+
+This looks a lot like Nomad comms: You send jobs to the Servers and that's it, you don't talk to the Clients and don't need to be aware of them.
+
+This means too many messages back and forth and multiple sources of truth?
+
+But Press only needs to be aware of the Control Nodes, which we can cap at, say, 3. With strong replication these can be made ephemeral. So failure to communicate with one of them isn't the end of the world.
+
+Also local comms, e.g. when Node-X has failed, rescheduling its work to Node-Y doesn't need to involve Press.
+
+With Global deployments Press <<->> Control comms have ~100ms latency. This seems way too high for fast comms.
+Also global communications don't always work.
+
+Too many times we've seen a region completely inaccessible to Press. Though this is inevitable, it ideally shouldn't affect normal operations (I want a Node failure to be treated as a normal operation. It shouldn't require our intervention)
+
+
+```mermaid
+sequenceDiagram
+    Press ->> Control : Run Service-X
+    Control ->> Press : ACK
+    Node ->> Control : Any Work for me?
+    Control ->> Node : Run Worker-X-1
+    Node ->> Control : ACK
+    destroy Node
+    destroy Control
+    destroy Press
+
+```
+
+---
+
+# Kinds of Communication
+
+## Provision
+We've already taken this out of scope with OpenTofu
+
+## Orchestration
+
+Deploying Containers on Nodes
+
+Some auxiliary work. Setting up Network, Volumes etc.
+
+## Services?
+
+Create/Migrate/Drop Sites
+Transfer Volumes between Nodes
\ No newline at end of file
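The Control Node flow in the second diagram is pull-based: Press hands a job to a Control Node and gets an immediate ACK, while Nodes ask for work on their own schedule. A rough sketch of that shape, with invented names throughout (this is not Press or Nomad code):

```python
from queue import Empty, Queue

class ControlNode:
    """Toy pull-based dispatcher: Press pushes jobs, Nodes poll for them."""

    def __init__(self):
        self.pending = Queue()

    def submit(self, job):
        # Press ->> Control : Run Service-X
        self.pending.put(job)
        return "ACK"  # Control ->> Press : ACK

    def poll(self):
        # Node ->> Control : Any Work for me?
        try:
            return self.pending.get_nowait()  # Control ->> Node : Run Worker-X-1
        except Empty:
            return None  # nothing to do; the Node polls again later
```

Because Nodes pull work, rescheduling after a Node failure is just a matter of re-queueing its jobs for the next poller; Press never has to be in that loop.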