Weekly routine Issue #139
name: Weekly routine Issue
on:
  schedule:
    - cron: "0 8 * * 1"
jobs:
  job:
    runs-on: ubuntu-24.04
    steps:
      - name: Create routine issue
        shell: python
        run: |
          import datetime
          import os
          import re
          import sys
          import requests

          def csv_env(name):
              """Return a comma-separated env var as a list, or None if it is empty."""
              raw = re.sub(r"\s", "", os.getenv(name, ""))
              # "".split(",") yields [""], so drop empty entries before testing
              return [item for item in raw.split(",") if item] or None

          payload = {
              "title": f"Week {datetime.datetime.now().strftime('%W %Y')} routine",
              "body": os.getenv("ISSUE_BODY"),
              "labels": csv_env("LABELS"),
              "assignees": csv_env("ASSIGNEES"),
          }
          api_url = f"https://api.github.com/repos/{os.getenv('REPO')}/issues"
          url = f"https://github.com/{os.getenv('REPO')}/issues"
          headers = {
              "Accept": "application/vnd.github.v3+json",
              "Authorization": f"token {os.getenv('ACCESS_TOKEN')}",
          }
          # Listing issues first confirms that the token can access the repository
          resp = requests.get(api_url, headers=headers)
          if resp.status_code != 200:
              print(f"❌ Couldn't retrieve issues for {url} using {api_url}.")
              print(f"HTTP {resp.status_code} {resp.reason} - {resp.text}")
              print("Check your `ACCESS_TOKEN` secret.")
              sys.exit(1)
          # Create the issue; the API answers 201 Created on success
          resp = requests.post(api_url, headers=headers, json=payload)
          if resp.status_code != 201:
              print(f"❌ Couldn't create issue for {url}")
              print(f"HTTP {resp.status_code} {resp.reason} - {resp.text}")
              sys.exit(1)
          print(f"✅ Issue successfully created at {url}/{resp.json().get('number')}")
          sys.exit(0)
        env:
          ACCESS_TOKEN: ${{ secrets.ACCESS_TOKEN }}
          REPO: kiwix/k8s
          LABELS: maint
          ASSIGNEES: rgaudin,benoit74
          ISSUE_BODY: |
            ## Check nodes free space
            ```sh
            df -h / && df -h /data
            ```
            - [ ] create a report in an issue comment
            ## Nodes system upgrades
            ```sh
            apt update && apt upgrade
            ```
            - [ ] systematically run the upgrade on bastion, stats, services, storage, demo, mirrors-qa nodes
            - [ ] check for and apply important security upgrades on worker nodes asap (imager-worker, ondemand, sisyphus)
              (regular worker node updates are done separately, on a monthly basis, so as not to impact production)
            ## Backups
            - [ ] Ensure [all borg repositories](https://www.borgbase.com/) are being updated
            ## k8s cluster
            - [ ] Check for Pods in Error or CrashLoopBackOff
            ```sh
            k get pods -A -o wide|grep -E 'Error|Crash'
            ```
            - [ ] Check Pod restarts
            ```sh
            k get pods -A -o wide | pyp -i 'print("\n".join([line for line in l if re.split(r"\s+", line)[4] != "0"]))'
            ```
            - [ ] Check if k8s should/could be upgraded
            ```sh
            curl -s -H "X-Auth-Token: $SCW_SECRET_KEY" https://api.scaleway.com/k8s/v1/regions/fr-par/clusters/$KIWIX_PROD_CLUSTER | jq ".version,.upgrade_available"
            curl -s -H "X-Auth-Token: $SCW_SECRET_KEY" https://api.scaleway.com/k8s/v1/regions/fr-par/versions | jq ".versions[].name"
            ```
            - [ ] [Upgrade k8s](https://github.com/kiwix/k8s/wiki/Cluster-Setup#upgrading-kubernetes) if applicable/possible
            ## Stats
            [matomo - stats.kiwix.org](https://stats.kiwix.org)
            - [ ] Ensure download.kiwix.org stats are being recorded
            - [ ] Check whether matomo should be upgraded
            ## Grafana
            - [ ] Alert list is [normal](https://kiwixorg.grafana.net/alerting/list)
            - [ ] Zimfarm dashboard is [normal](https://kiwixorg.grafana.net/d/d2803d94-7c40-4338-bf80-f3cd7cd796bf/zimfarms?from=now-7d&to=now)
            - [ ] Jobs durations dashboard is [normal](https://kiwixorg.grafana.net/d/bb0f0990-04c5-4314-8afc-6185ac49c668/jobs?orgId=1)
            - [ ] There is no abnormal behavior in [cluster resources consumption](https://kiwixorg.grafana.net/a/grafana-k8s-app/navigation/cluster/kiwix-prod)
            - [ ] All [mdadm](https://kiwixorg.grafana.net/d/edu6v6ekri77kd/mdadm) RAID arrays are OK (check all instances in the top left combobox)
            ## Projects
            - [ ] UptimeRobot [has no alert](https://dashboard.uptimerobot.com/monitors)
            - [ ] [zimit backlog](https://farm.zimit.kiwix.org/pipeline/filter-todo) is reasonable
            - [ ] [Cloud Code Signing Certificate](https://secure.ssl.com/certificate_orders/co-291j8smt34b) usage is *OK* (look for *Unused Signings* under *END ENTITY CERTIFICATES*): we have 1,200 per year (August to August)
            - [ ] Analyze [zimit failed tasks](https://farm.zimit.kiwix.org/pipeline/filter-failed) and document bad domains in our [WIP blacklist](https://docs.google.com/spreadsheets/d/1mBjWT0hLmeg6EqT4nNEfCzLU8hGSzYs4IgbWDInhPqA/edit?gid=0#gid=0)
            - [ ] [PRs awaiting your review](https://github.com/notifications?query=reason%3Areview-requested)
            ## Security
            - [ ] Analyze/merge [dependabot PRs](https://github.com/notifications?query=author%3Adependabot[bot])
            **Note**: this is an *automatic reminder* intended for the assignee(s).