Week 47 2024 routine #306

Closed
20 of 21 tasks
kiwixbot opened this issue Nov 18, 2024 · 1 comment
@kiwixbot

kiwixbot commented Nov 18, 2024

Check nodes free space

df -h / && df -h /data
  • create a report in an issue comment
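
The report rows can be assembled from the `df` output directly. A minimal sketch, assuming a hypothetical helper named `df_report_rows` (not part of the routine) and `df -hP` rather than `df -h`, so that long device names do not wrap across lines:

```shell
# Hypothetical helper: convert `df -hP` output read on stdin into report rows.
# $1 is the machine name to print in the first column.
# Assumes mount points contain no spaces.
df_report_rows() {
    # df columns: Filesystem Size Used Avail Use% Mounted-on; skip the header.
    awk -v machine="$1" 'NR > 1 { print machine, $6, $2, $3, $4, $5 }'
}
```

On a node it could be called as `df -hP / /data | df_report_rows "$(uname -n)"`, dropping `/data` on machines without a data volume.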

Nodes system upgrades

apt update && apt upgrade
  • systematically run the upgrade on the bastion, stats, services, storage, demo and mirrors-qa nodes
  • check for and apply important security upgrades on worker nodes asap (imager-worker, ondemand, sisyphus)

(regular worker node updates are done separately, on a monthly basis, so as not to impact production)
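
To spot pending security upgrades on worker nodes without applying anything, the upgradable package list can be filtered. This is a sketch only: it assumes the nodes run Debian or Ubuntu with the default archive naming, where security updates come from a suite whose name ends in `-security`, and the helper name `security_upgrades` is hypothetical.

```shell
# Hypothetical helper: keep only upgradable packages coming from a
# "*-security" suite. Reads `apt list --upgradable` output on stdin.
# Lines look like: name/suite[,suite...] version arch [upgradable from: old]
security_upgrades() {
    # `|| true` keeps the exit status at 0 when nothing matches.
    grep -E '^[^/ ]+/[^ ]*-security' || true
}
```

After `apt update`, it could be used as `apt list --upgradable 2>/dev/null | security_upgrades`; an empty result suggests no security upgrade is pending under the naming assumption above.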

Backups

k8s cluster

  • Check for Pods in Error or CrashLoopBackOff
k get pods -A -o wide | grep -E 'Error|Crash'
  • Check Pod restarts
k get pods -A -o wide | pyp -i 'print("\n".join([line for line in l if re.split(r"\s+", line)[4] != "0"]))'
  • Check if k8s should/could be upgraded
curl -s -H "X-Auth-Token: $SCW_SECRET_KEY" https://api.scaleway.com/k8s/v1/regions/fr-par/clusters/$KIWIX_PROD_CLUSTER | jq ".version,.upgrade_available"
curl -s -H "X-Auth-Token: $SCW_SECRET_KEY" https://api.scaleway.com/k8s/v1/regions/fr-par/versions | jq ".versions[].name"
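
The Pod restart check in the list above relies on `pyp`. Where that tool is not installed, a plain `awk` filter gives a similar result. This is a swapped-in alternative, not the command used in the routine; it assumes the usual `kubectl get pods -A` column order (NAMESPACE, NAME, READY, STATUS, RESTARTS, ...), and the helper name `pods_with_restarts` is hypothetical.

```shell
# Hypothetical helper: filter `kubectl get pods -A -o wide` output on stdin.
pods_with_restarts() {
    # Keep the header line plus every Pod whose RESTARTS count
    # (5th whitespace-separated field) is not the string "0".
    awk 'NR == 1 || $5 != "0"'
}
```

It could be used as `k get pods -A -o wide | pods_with_restarts`. Like the `pyp` version, it keeps the header line, since the header is the first line of output.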

Stats

matomo - stats.kiwix.org

  • Ensure download.kiwix.org stats are being recorded
  • Check whether matomo should be upgraded

Grafana

Projects

Security

Note: this is an automatic reminder intended for the assignee(s).

@kiwixbot kiwixbot added the maint Maintenance tasks label Nov 18, 2024
@rgaudin
Member

rgaudin commented Nov 18, 2024

Storage

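| Machine | Filesystem | Size | Used | Avail | Use% | Use change |
|---|---|---|---|---|---|---|
| bastion | / | 37G | 16G | 20G | 44% | - |
| stats | / | 233G | 117G | 105G | 53% | -2G |
| services | / | 456G | 335G | 98G | 78% | +1G |
| storage | / | 147G | 69G | 71G | 50% | +4G |
| storage | /data | 30T | 14T | 15T | 49% | - |
| imager-worker | / | 1.9T | 287G | 1.5T | 17% | don't care |
| sisyphus | / | 233G | 11G | 210G | - | don't care |
| ondemand | / | 25G | 9.7G | 14G | 42% | - |
| ondemand | /data | 216G | 199M | 205G | 1% | don't care |
| mirrors-qa | / | 38G | 4.6G | 32G | 13% | - |
| demo | / | 40G | 9.3G | 28G | 26% | - |
| demo | /data | 1.8T | 974G | 690G | 59% | don't care |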

Misc

  • Cloud Signings: Unused Signings: 899
  • Zimit pipe is empty

zimit

The webrecorder/browsertrix-crawler#719 code issue is so prominent because every 40x (WAF or auth) now fails like this.

@rgaudin rgaudin closed this as completed Nov 18, 2024
3 participants