chore: Add incomplete docs
Just some scratch work. Don't take it very seriously.
adityahase committed Sep 6, 2024
1 parent 69a87a6 commit d7ee989
Showing 5 changed files with 234 additions and 0 deletions.
28 changes: 28 additions & 0 deletions pilot/pilot/design-wip/cluster.md
@@ -0,0 +1,28 @@
# Cluster

## Current
Current Press clusters are simply m + n + 1 (or 2) servers:

- 1 (or 2) x Proxy
- m x App
- n x DB

All services are external:
- Log
- Metrics
- Registry


## New

### Move Log and Metrics inside the cluster to reduce bandwidth

For metrics, this means more frequent scrape intervals become possible.

### Dynamic Ingress

Allow multiple VMs to handle Ingress simultaneously

#### S

All Non-Host Data
9 changes: 9 additions & 0 deletions pilot/pilot/design-wip/containers.md
@@ -0,0 +1,9 @@
# Container Runtime

| Runtime | User Namespaces | Daemon |
|-----------|-------------------|-----------|
| Docker | Fixed. Up to 5? | Yes |
| Podman | Arbitrary | No |
| Containerd| Fixed | Yes |
| CRI-O | Fixed | Yes |
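
Context for the "User Namespaces" column: rootless Podman takes each user's mappable UID ranges from /etc/subuid (standard `user:start:count` lines), which is what makes arbitrary per-container mappings possible. A small sketch of reading that file:

```python
from pathlib import Path

def subuid_ranges(path: str = "/etc/subuid") -> dict[str, range]:
    """Parse user:start:count lines into the UID range each user may map."""
    ranges: dict[str, range] = {}
    for line in Path(path).read_text().splitlines():
        if line and not line.startswith("#"):
            user, start, count = line.split(":")
            ranges[user] = range(int(start), int(start) + int(count))
    return ranges

# e.g. {"deploy": range(100000, 165536)} for a "deploy:100000:65536" line
```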

82 changes: 82 additions & 0 deletions pilot/pilot/design-wip/networking.md
@@ -0,0 +1,82 @@
## IPv4?
### 10.0.0.0/8
Prevents multiple VPCs from communicating with each other, since they would all share the same address space.

### 10.x.0.0/16
This is what we currently use.
This gives us 256 VPCs, which seems plenty at this point, unless we go all out on on-premise VPCs.


That's 65k addresses per VPC. More than enough for host routing.

Not enough to give each customer / project a segment.

Let's say we cap each project at 128 addresses. That gives us 512 projects per VPC, and we're already far beyond that.

This doesn't work.
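
Sanity-checking the arithmetic above (a quick sketch using Python's stdlib `ipaddress`; the /16-per-VPC and 128-address cap are the figures from this doc):

```python
import ipaddress

ten_slash_eight = ipaddress.ip_network("10.0.0.0/8")

# Carving 10.0.0.0/8 into /16 VPCs gives 2^(16-8) = 256 VPCs.
vpcs = list(ten_slash_eight.subnets(new_prefix=16))
print(len(vpcs))                 # 256

# Each /16 VPC holds 2^(32-16) = 65536 (~65k) addresses.
vpc = vpcs[0]
print(vpc.num_addresses)         # 65536

# Capping each project at 128 addresses (a /25) leaves room for
# only 65536 / 128 = 512 projects per VPC -- hence "doesn't work".
print(vpc.num_addresses // 128)  # 512
```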

### Isolated
If all projects are isolated, then we can assign them the same range over and over and still be fine.


What if we assign 10.x.y.0/24 to each project? That gives each project 256 addresses.

#### VXLAN

##### Static IP - Dynamic Routing
The discovery-based VXLAN was complicated, since you needed to communicate node reallocations to all neighbours.

e.g. if

- nginx 10.0.0.1 and app 10.0.0.2 are on 172.196.1.1
- db 10.0.0.3 is on 172.196.1.2

then every neighbour needs to know:

- 10.0.0.1 -> 172.196.1.1
- 10.0.0.2 -> 172.196.1.1
- 10.0.0.3 -> 172.196.1.2

When this information changes because of node failure, reallocation, or updates, we need to update all the neighbours.
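
A sketch of what that fan-out implies, assuming every host keeps a full container-IP -> host-IP table (the names and transport here are hypothetical, not anything Press actually has):

```python
# Hypothetical sketch: static container IPs force every host to hold a
# full container-IP -> host-IP table.
fdb = {
    "10.0.0.1": "172.196.1.1",  # nginx
    "10.0.0.2": "172.196.1.1",  # app
    "10.0.0.3": "172.196.1.2",  # db
}

def push_fdb_update(neighbour: str, container_ip: str, host_ip: str) -> None:
    # Stand-in for whatever transport carries the update (hypothetical).
    print(f"tell {neighbour}: {container_ip} is now on {host_ip}")

def reallocate(container_ip: str, new_host: str, neighbours: list[str]) -> None:
    fdb[container_ip] = new_host
    # O(cluster) fan-out on every node failure / rebalance / deploy.
    for neighbour in neighbours:
        push_fdb_update(neighbour, container_ip, new_host)

reallocate("10.0.0.3", "172.196.1.3", ["172.196.1.1", "172.196.1.2"])
```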


##### Dynamic IP - Static Routing
The other option: if

- nginx 10.0.1.1 and app 10.0.1.2 are on 172.196.1.1
- db 10.0.2.1 is on 172.196.1.2

then we always know that 10.0.x.y is on 172.196.1.x.

So our routing becomes insanely simple. The downside is that now we have to involve DNS.
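
By contrast, the 10.0.x.y -> 172.196.1.x rule needs no shared state at all; a minimal sketch of the lookup (using the addresses from the example above):

```python
import ipaddress

def host_for(container_ip: str) -> str:
    """10.0.x.y lives on 172.196.1.x: the route is derived from the IP itself."""
    third_octet = ipaddress.ip_address(container_ip).packed[2]
    return f"172.196.1.{third_octet}"

assert host_for("10.0.1.1") == "172.196.1.1"  # nginx
assert host_for("10.0.1.2") == "172.196.1.1"  # app
assert host_for("10.0.2.1") == "172.196.1.2"  # db
```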



### DNS-Based Updates

DNS records only need to change on these events:

- Node Failure -> Reallocation
- Rebalancing -> Reallocation
- Deploy / Update -> Allocation / Deallocation
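
A sketch of the bookkeeping this implies, assuming a simple name -> IP record store (all names hypothetical): a reallocation rewrites one DNS record instead of touching every neighbour.

```python
# Hypothetical sketch: with dynamic IPs, reallocation only touches DNS.
records: dict[str, str] = {"db.project-1.internal": "10.0.2.1"}

def on_reallocated(name: str, new_ip: str) -> None:
    # Fired on node failure, rebalancing, or deploy/update.
    records[name] = new_ip

on_reallocated("db.project-1.internal", "10.0.3.1")  # db landed on host .3
print(records)
```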


## IPv6

This needs eBPF to get right. Let's not deal with it right away.
ULA?
https://en.wikipedia.org/wiki/Unique_local_address


https://community.fly.io/t/6pn-addressing-clarification/1362
https://cloud.google.com/blog/products/networking/using-ipv6-unique-local-addresses-or-ula-in-google-cloud



ULA is fd00::/8. Google VPC splits this as 8/40/16/32/32; each VM gets a /96 address.
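
A sketch of assembling an address under that split, assuming the fields are (ULA prefix / global ID / subnet / host / per-VM) = 8/40/16/32/32 — that field reading is my assumption, not confirmed here:

```python
import ipaddress

def vm_prefix(global_id: int, subnet: int, host: int) -> ipaddress.IPv6Network:
    # Assumed field layout: 8-bit fd prefix, 40-bit global ID, 16-bit
    # subnet, 32-bit host; the remaining 32 bits are the VM's own space,
    # which is why each VM ends up with a /96.
    assert global_id < 2**40 and subnet < 2**16 and host < 2**32
    value = (0xFD << 120) | (global_id << 80) | (subnet << 64) | (host << 32)
    return ipaddress.IPv6Network((value, 96))

print(vm_prefix(global_id=0xABCDE00001, subnet=1, host=42))
# fdab:cde0:1:1:0:2a::/96
```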
50 changes: 50 additions & 0 deletions pilot/pilot/design-wip/problems.md
@@ -0,0 +1,50 @@
# Problems
A rough list of problems I need to solve.

## Declarative Provisioning
### Scaffolding
- [x] Minimal wrapper around OpenTofu / CDKTF
- [x] Example usage for DigitalOcean

### State Management
- [x] Doctypes to track updates to each provision request (State / Plan / Action, etc.)


### Networking
- [ ] Elastic IPs for Ingress
- [ ] DNS for Ingress
- [ ] Subnets / Routing Rules
- [ ] Track Firewall Rules / Security Groups

### Nodes
- [ ] Base Images
- [ ] Bootstrapping

## Bootstrapping
- [ ] Base Images

### Scaffolding
- [ ]

## Scheduling
### Scaffolding
### Allocation
### Drain

## Orchestration
### Scaffolding
### Updates


## Networking
### Overlay
### Ingress

### Egress

## Workflows
###

## Storage
- [ ] Provision Attached Block Storage with each Node
65 changes: 65 additions & 0 deletions pilot/pilot/design-wip/quick.md
@@ -0,0 +1,65 @@
## Cluster Architecture
### Press
Currently Press is aware of all Nodes and expects them to be available and responsive at all times.

When a Node doesn't respond, we must bring the Node back up and retry the request.

```mermaid
sequenceDiagram
    %% First attempt fails: the Node is unresponsive
    Press -x Node : Run Service-X
    %% Press brings the Node back up and retries
    Press ->> Node : Run Service-X
    Node ->> Press : ACK
```
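
Roughly what "bring the Node back up and retry" means on Press's side; a sketch with stand-in transport and recovery functions (nothing here is Press's actual API):

```python
import random
import time

def send(node: str, message: str) -> None:
    # Stand-in transport; fails randomly to simulate an unresponsive Node.
    if random.random() < 0.5:
        raise TimeoutError(node)

def revive(node: str) -> None:
    print(f"bringing {node} back up")  # stand-in for reboot / replace

def run_service(node: str, service: str, attempts: int = 3) -> None:
    # Press owns recovery: an unresponsive Node blocks the request itself.
    for attempt in range(attempts):
        try:
            send(node, f"Run {service}")
            return
        except TimeoutError:
            revive(node)
            time.sleep(0.1 * 2 ** attempt)  # back off before retrying
    raise RuntimeError(f"{node} never acknowledged {service}")

run_service("node-1", "Service-X")
```
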
### Alternative - 1

Press only needs to be aware of some Nodes. Let's call them Control Nodes.

Control Nodes acknowledge all Press requests and then hand the work to the remaining Nodes.

This looks a lot like Nomad comms: you send jobs to the Servers and that's it; you don't talk to the Clients and don't need to be aware of them.

Doesn't this mean too many messages back and forth, and multiple sources of truth?

But we can keep the number of Control Nodes Press needs to be aware of small, say 3. With strong replication these can be made ephemeral, so failing to communicate with one of them isn't the end of the world.

This also allows local comms: e.g. Node-X failing and its work being rescheduled to Node-Y doesn't need to involve Press.

With global deployments, Press <-> Control comms have ~100 ms latency. This seems way too high for fast comms.
Global communications also don't always work.

Too many times we've seen a region completely inaccessible to Press. Though this is inevitable, it ideally shouldn't affect normal operations. (I want a Node failure to be treated as a normal operation; it shouldn't require our intervention.)


```mermaid
sequenceDiagram
    Press ->> Control : Run Service-X
    Control ->> Press : ACK
    %% Nodes pull work from Control; Press never talks to them directly
    Node ->> Control : Any work for me?
    Control ->> Node : Run Worker-X-1
    Node ->> Control : ACK
```
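
A sketch of the pull half of this, assuming Nodes poll their Control Node for work rather than anything pushing to them (all names hypothetical; a real Control Node would be a replicated service, not an in-process queue):

```python
import queue

work_queue: "queue.Queue[str]" = queue.Queue()  # stand-in for a Control Node

def control_submit(job: str) -> str:
    # Press's request is ACKed as soon as the job is queued durably.
    work_queue.put(job)
    return "ACK"

def node_poll_once(timeout: float = 1.0) -> None:
    # The Node pulls: Control never reaches out to it, and Press never
    # needs to know it exists.
    try:
        job = work_queue.get(timeout=timeout)
    except queue.Empty:
        return
    print(f"running {job}")
    work_queue.task_done()  # stands in for the Node -> Control ACK

control_submit("Worker-X-1")
node_poll_once()
```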

---

# Kinds of Communication

## Provision
We've already taken this out of scope with OpenTofu.

## Orchestration

Deploying Containers on Nodes

Plus some auxiliary work: setting up Networks, Volumes, etc.

## Services?

Create/Migrate/Drop Sites
Transfer Volumes between Nodes
