Add a 2i2c federation member on Hetzner #3169
Conversation
https://2i2c.mybinder.org/ is up now! https://github.com/2i2c-org/2i2c-org.github.io/pull/356/files#diff-7244b57e647732dd6a8f006bdf63943e1dcb813fa1a085073522ccf40e2cdfc6 has more context - that's also an announcement blog post. It came together quickly.

This is a single-node k3s cluster running on Hetzner. It's not yet as large as we'd like it to be - the target is a CCX63 on https://www.hetzner.com/cloud, which has 48 vCPUs and 192GB of RAM. And with k3s, we can override the maximum number of pods on a node. Given the current memory guarantee of 450M per user, we can put approximately 400 pods on this one node! That works out to less than $1 / month per user of capacity, which is pretty good.

Still need to figure out:
1. Access for everyone else on the team
2. Resizing the server to be big, and setting up k3s again there from scratch + documenting it (I simply followed the quickstart with traefik disabled)
3. Testing prometheus and grafana
4. Adding 2i2c to the list of supporters

Am excited to try this out and see how it goes. Thanks to @choldgraf, @colliand, @jmunroe and others at 2i2c for supporting me through this.
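The ~400 pods estimate above is just memory arithmetic; a quick sanity check in shell, using the CCX63 numbers quoted above:

```shell
# 192GB of RAM divided by the 450M per-user memory guarantee gives the
# theoretical pod ceiling for one CCX63 node.
pods=$(( 192 * 1024 / 450 ))
echo "$pods"   # 436; leaving headroom for system pods lands at roughly 400
```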
It was so fast to spin this up! (Just tried binder-examples/requirements)
Am going to try to put the registry in-cluster as well, let's see.
This is awesome! I won't have time to monitor a deployment rollout until Monday, but this looks great.
This is great! Do you want to merge this now?
Is your plan to figure out the automated deployment in a future PR, after you work out how to set up external K8s API access?
> This is awesome! I won't have time to monitor a deployment rollout until Monday, but this looks great.
I think it's fine to merge now and revert if necessary - this doesn't add a Hetzner GitHub deployment workflow, it only modifies the redirector to direct builds there and makes Hetzner the prime host.
    claimName: registry
    containers:
    - name: registry
      image: registry:2.8.3
In a future PR we should add this to the watch-dependencies workflow
mybinder.org-deploy/.github/workflows/watch-dependencies.yaml
Lines 33 to 39 in b9439b5
    strategy:
      fail-fast: false
      matrix:
        include:
          - name: repo2docker
            registry: quay.io
            repository: jupyterhub/repo2docker
Can you check the websocket timeout?
@manics I already sorted out external access (see https://github.com/jupyterhub/mybinder.org-deploy/pull/3169/files#diff-7bd97b74c45fda1d974a0f752288c28f7f38eaf85005a1953adb7dfc6e34df9e)! No GitHub deploy yet.
I've a tattoo appt all day, I'll get back to this once that's done!
@manics I wonder if that's an nginx timeout that needs tuning
Co-authored-by: Simon Li <[email protected]>
@manics hmm, I can't seem to reproduce the timeout! There's no external load balancer here, just our nginx. Is it still happening to you?
The current node is really small - only a
This looks good to me. Thanks 2i2c! <3
I've added an encrypted ssh key to give other team members ssh access!
Poking to see if I can switch the registry to object storage already, while we wait for the quota increase. This would bring the instance much closer to being fully stateless.
Also actually make the registry read the config file - it was not doing that before.
Alright, now we use the Hetzner object storage as the storage backend for the registry! And thus we run 2 replicas of the registry as well :) I'm also leaving, in a comment, the small bit of config change that's required to continue using the filesystem backend. The goal here is to make it as easy as possible for people to join the federation - and with this, that's down to just '1 VM'.
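For readers following along, the backend switch described here corresponds to CNCF Distribution's storage configuration. A hedged sketch (driver key names are from the Distribution docs; the endpoint, bucket, and credentials below are placeholders, not the real deployment's values):

```shell
# Write a minimal Distribution (docker registry) config using the s3 driver.
# Everything under "s3:" is a placeholder for illustration only.
cat > registry-config.yml <<'EOF'
storage:
  s3:
    regionendpoint: https://fsn1.your-objectstorage.com   # placeholder endpoint
    region: fsn1                                          # placeholder region
    bucket: binder-registry                               # placeholder bucket
    accesskey: PLACEHOLDER
    secretkey: PLACEHOLDER
  # To keep using the node's local disk instead, swap the s3 block for:
  # filesystem:
  #   rootdirectory: /var/lib/registry
EOF
```

With images in object storage the registry pods hold no local state, which is what makes running 2 replicas behind one Service safe.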
I started adding k3s docs!
Are you missing a commit?
I've got SSH access; copying the k3s.yaml kubeconfig file and editing the server IP gives me k8s access!
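The recipe above, sketched against a stand-in file (on the node, k3s writes its kubeconfig to /etc/rancher/k3s/k3s.yaml; the SERVER_IP below is a documentation placeholder, not the real node's address):

```shell
SERVER_IP=203.0.113.10   # placeholder TEST-NET address, not the actual node
# Stand-in for fetching the real file, e.g.:
#   scp root@$SERVER_IP:/etc/rancher/k3s/k3s.yaml .
cat > k3s.yaml <<'EOF'
clusters:
- cluster:
    server: https://127.0.0.1:6443
EOF
# k3s records the API endpoint as 127.0.0.1; point it at the node instead:
sed "s/127.0.0.1/$SERVER_IP/" k3s.yaml > hetzner-kubeconfig.yaml
grep 'server:' hetzner-kubeconfig.yaml
# then: KUBECONFIG=./hetzner-kubeconfig.yaml kubectl get nodes
```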
@manics ah yes - I was missing a commit. Added now.
I validated that we can change the number of pods on a node by following https://stackoverflow.com/a/65899273. This node is currently set to a max of 250 pods, although it can't support that many (it is smol). This shall be added to the documentation on how to set up k3s.
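A hedged sketch of how that override can be passed at k3s install time (the --kubelet-arg mechanism and the kubelet's max-pods flag are documented by k3s and Kubernetes; the exact invocation used on this node may differ):

```shell
# Install k3s with the bundled traefik disabled and the kubelet's default
# 110-pod ceiling raised to 250. Only meaningful on the actual server.
curl -sfL https://get.k3s.io | \
  INSTALL_K3S_EXEC="server --disable traefik --kubelet-arg=max-pods=250" sh -

# Afterwards, the node should report the new ceiling:
kubectl get nodes -o jsonpath='{.items[0].status.capacity.pods}'
```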
secrets/hetzner-2i2c.yml
From the changes in deploy.py, I think this wants to be secrets/hetzner-2i2c-kubeconfig.yml
@yuvipanda you can send the invitation to
The quota increase was approved! I've taken the existing node offline and am bringing up a new node. Will be done tonight.
@rgaiacs done!
Was crashlooping on hetzner
Created a new server from scratch, rebuilt it, and it's all good to go! I'm going to sleep though, so I'm happy for someone else to merge it (ssh keys are updated so you can debug if necessary); if not, I'll try to find time.
Awesome! Dealing with a plumber now, but I'll give it a go when I'm free in an hour or two unless someone else is ready first.
/test-this-pr just to make sure the registry validates and doesn't deploy to staging
This Pull Request is now being tested 🎉 See the test progress in GitHub Actions.
Job status: success
Giving this a try!
It currently uses quay.io for image storage, but we can move to a local docker registry backed by Hetzner's new S3 service https://www.hetzner.com/storage/object-storage/ eventually.

The registry is now using a local setup of CNCF Distribution (aka the docker registry), deployed via the chart in here. It's exposed as an Ingress for HTTPS (otherwise everyone complains), but only accessible with a strong password. We can try to figure out whether we can restrict it to only being pulled from the local network at the ingress level, or set up a custom cert - although that cert would need to be trusted both by binderhub (for push) and k8s (for pull), and it's nice to let Let's Encrypt handle that. The images are stored on disk currently, which is fine to start because the Hetzner instance we will end up using has about 960 GB of fast SSD space. Unfortunately the images are 'doubled' anyway as we push and pull from the same disk (lol), but that's better than pushing to quay and then pulling. We can move this to the Hetzner S3 storage when it gets a little bigger.