rework ansible-like loadtest helpers
1 parent a9979ca, commit bb2d107
Showing 14 changed files with 211 additions and 103 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
vars.env | ||
state |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,16 +1,57 @@ | ||
# Ansible-like OpenSSH sessions load test | ||
# ansible-like openssh loadtest | ||
|
||
This setup is designed to be ran from the home directory of a VM (the default working directory for a systemd user service); the proxy public address and cluster name should be changed in `gen_inventory.sh`, `proxy_templates.yaml` and `tbot.yaml` from `PROXYHOST` and `CLUSTERNAME` respectively. It requires openssh, jq, xargs, and dumb-init, as well as tbot and fdpass-teleport. | ||
This setup is intended to generate fake ansible-like load by spawning very large numbers | ||
of sessions against a large number of teleport nodes. It uses tbot/machineid with ssh multiplexing | ||
to support the needed volume of sessions (as one would do with an ansible master that manages | ||
a massive number of servers via teleport). | ||
|
||
This setup assumes that nodes are being ran by the `node-agent` Helm chart, and proxy templates are applied to do predicate-based dialing on the NODENAME label, as the chart sets up. Commenting or blanking the `proxy_templates.yaml` file (and restarting tbot) will change it to hostname-based dialing. Changing the `proxy_templates.yaml` file (and restarting tbot) can also be used to test a simpler predicate, or to test search-based dialing rather than predicate-based dialing. | ||
This setup is designed to be run on a fresh VM instance, and will perform various | ||
installs and system configuration actions. | ||
|
||
Bot and token can be created with `tctl -f loadtest-bot.yaml`, after editing the IAM account and role in it. Token-based joining with tbot is incredibly annoying, so IAM joining or some other ambient-based joining method is recommended. Running the `node-agent` chart is left as an exercise for the reader. | ||
It expects the following to already be installed on the system: | ||
|
||
The machine running the client should be scaled depending on how many nodes are targeted in the inventory; for 60000 nodes (i.e. 60k shell scripts and 120k ssh processes running at peak) the memory usage with Teleport 15 seems to be ~20GiB for tbot and ~200 for the scripts and SSH, so something like an AWS 32xlarge or 48xlarge might be necessary (maybe the compute-optimized variants, as memory isn't really a problem). Depending on the scale of the test and the runner machine, tuning GOMAXPROCS and GOMEMLIMIT in tbot.service might be useful. | ||
- `openssh` | ||
- `jq` | ||
- `xargs` | ||
|
||
It will perform installation of the following: | ||
|
||
- all default teleport binaries (namely, `tbot`) | ||
- `fdpass-teleport` | ||
- `dumb-init` | ||
|
||
By default, this test setup assumes that the `node-agent` loadtest helm chart is being | ||
used. The proxy templates generated rely on labels set by that helm chart. After setup | ||
is run, it is possible to customize the proxy template used by editing `/etc/tbot/proxy-templates.yaml`. | ||
|
||
Given the extreme scale of tests run with this setup, it is typically necessary to use a | ||
very large VM. For example, 60k agent tests are typically run from a 32xlarge or 48xlarge | ||
instance, either general purpose or compute optimized. | ||
|
||
## Usage | ||
|
||
- Run `tbot_install.sh` to set up tbot (it will install a specific Teleport version as listed in the script, tweak it as required), or `systemctl --user restart tbot.service` if tbot is already set up. | ||
- Run the `gen_inventory.sh` script to produce a list of hosts in random order in the `inventory` file, check that it matches the expected list of hosts. | ||
- Choose a random host in the inventory and confirm that the setup is working with `ssh -F tbot_destdir_mux/ssh_config root@host`. | ||
- Run `run.sh >/dev/null` (in tmux, probably). In a different terminal or tab, check how many sockets are being opened in the ssh controlmaster directory with `ls -1 /run/user/1000/ssh-control | wc -l` to confirm that connections are being established and muxed by ssh. Logs for tbot can be viewed with `journalctl --user-unit tbot --follow`. | ||
- Copy `example.vars.env` to `vars.env` and edit the copy. The `PROXY_HOST` variable | ||
and `BOT_TOKEN` variable *must* be changed. | ||
|
||
- Run `install.sh` to install `tbot`, `fdpass-teleport`, and `dumb-init`. This only needs | ||
to be run once. | ||
|
||
- Run `init.sh` to set up tbot directories/configuration and start the `tbot.service`. If this needs | ||
to be re-run (e.g. if proxy host or token need to be changed), it may be necessary to first manually | ||
halt the tbot service. | ||
|
||
- Run `journalctl -u tbot.service` to verify that `tbot` has successfully authenticated with the cluster. | ||
|
||
- Run `gen-inventory.sh` to generate a list of all target teleport nodes. This only needs to be re-run | ||
if/when the set of agents changes. | ||
|
||
- Verify that the setup is functional by selecting a random host from `state/inventory` and attempting to | ||
access it via `ssh -F /opt/machine-id/ssh_config root@host`. | ||
|
||
- Run `run.sh` to run the actual test scenario. This will invoke `run-node.sh` for each member of | ||
the generated inventory and report success/failure of individual attempted sessions. Note that for | ||
large scale tests the output of this script is enormous and may need to be piped to `/dev/null`. Long | ||
running invocations should be performed within a `tmux` session or similar. | ||
|
||
- Verify that ssh connections are being established and multiplexed by monitoring the control master | ||
directory with `ls -1 /run/user/1000/ssh-control | wc -l`. |
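
The last verification step can be repeated on an interval to watch multiplexed connections ramp up. A minimal sketch, assuming the control directory used by `run.sh`; `count_control_sockets` is a hypothetical helper name and is not part of this commit:

```shell
# Hypothetical helper: count entries in an ssh ControlMaster socket directory.
# The default path mirrors the one used by run.sh; uid 1000 is an assumption
# inherited from that script.
count_control_sockets() {
    dir="${1:-/run/user/1000/ssh-control}"
    # A missing directory means no multiplexed connections exist yet.
    [ -d "$dir" ] || { echo 0; return 0; }
    ls -1 "$dir" | wc -l | tr -d ' '
}
```

For example, `while sleep 10; do echo "$(date +%T) sockets=$(count_control_sockets)"; done` prints a timestamped count every ten seconds.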
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# proxy host is the hostname of the target teleport cluster. | ||
export PROXY_HOST="<proxy-hostname>" | ||
|
||
# proxy port is the port of the teleport web api (typically 443 or 3080). | ||
export PROXY_PORT="443" | ||
|
||
# bot token is the join token that tbot will use to authenticate to the teleport | ||
# cluster. the provided token *must* have the requisite roles to allow for ssh | ||
# server access. | ||
export BOT_TOKEN="<tbot-join-token>" | ||
|
||
# bot user is the local user that the bot service should run as, and determines | ||
# the ownership of the credentials and sockets created in /opt/machine-id. this | ||
# must match the user that will be running the ssh load generation. | ||
export BOT_USER="$USER" | ||
|
||
# teleport artifact is the target artifact from which teleport binaries will be | ||
# installed. | ||
export TELEPORT_ARTIFACT="teleport-v17.0.0-alpha.4-linux-amd64-bin.tar.gz" | ||
|
||
# teleport CDN should likely be one of cdn.cloud.gravitational.io or cdn.teleport.dev, | ||
# the staging and prod cdns respectively. | ||
export TELEPORT_CDN="cdn.cloud.gravitational.io" | ||
|
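
The scripts in this commit reference these variables as `${PROXY_HOST:?}`, which aborts when a variable is unset or empty but not when it still holds a `<...>` placeholder from the example file. A minimal sketch of a pre-flight check, assuming a hypothetical helper named `check_var` that is not part of the commit:

```shell
# Hypothetical pre-flight check; not part of the commit.
# Returns non-zero if the named variable is empty, unset, or still holds
# a <placeholder> value copied from the example file.
check_var() {
    name="$1"
    # Indirectly read the variable whose name was passed in.
    eval "value=\${$name:-}"
    case "$value" in
        "")
            echo "error: $name is not set" >&2
            return 1
            ;;
        "<"*">")
            echo "error: $name still holds the placeholder $value" >&2
            return 1
            ;;
    esac
    return 0
}
```

After `source vars.env`, something like `check_var PROXY_HOST && check_var BOT_TOKEN` could gate the rest of a script.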
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
#!/bin/bash | ||
|
||
set -euo pipefail | ||
|
||
cd "$(dirname "$0")" | ||
|
||
source vars.env | ||
|
||
mkdir -p state | ||
|
||
echo "attempting to build inventory..." >&2 | ||
|
||
tsh -i /opt/machine-id/identity --proxy "${PROXY_HOST:?}:${PROXY_PORT:?}" ls --format=json > state/inventory.json | ||
|
||
jq -r --arg suffix ".${PROXY_HOST:?}" '.[] | select(.metadata.expires > (now | strftime("%Y-%m-%dT%H:%M:%SZ"))) | .spec.hostname + $suffix' < state/inventory.json | sort -R > state/inventory | ||
|
||
echo "successfully generated inventory node_count=$(cat state/inventory | wc -l)" >&2 |
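
Before starting a long run it may be worth confirming that the generated inventory has the expected size and no repeated hostnames. A minimal sketch, assuming one hostname per line as produced by the script above; `inventory_summary` is a hypothetical helper name:

```shell
# Hypothetical helper: summarize an inventory file (one hostname per line).
# Prints the number of non-empty lines and the number of distinct hostnames.
inventory_summary() {
    file="${1:-state/inventory}"
    # grep -c exits non-zero when nothing matches, so tolerate that case.
    total="$(grep -c . "$file" || true)"
    unique="$(grep . "$file" | sort -u | wc -l | tr -d ' ')"
    echo "total=$total unique=$unique"
}
```

If `total` and `unique` differ, at least two nodes share a hostname.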
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
#!/bin/bash | ||
|
||
set -euo pipefail | ||
|
||
cd "$(dirname "$0")" | ||
|
||
source vars.env | ||
|
||
if systemctl is-active -q tbot.service; then | ||
echo "stopping extant tbot.service..." >&2 | ||
sudo systemctl stop tbot.service | ||
fi | ||
|
||
sudo mkdir -p /etc/tbot | ||
|
||
sudo mkdir -p /var/lib/teleport/bot | ||
|
||
sudo chown -R "${BOT_USER:?}:${BOT_USER:?}" /var/lib/teleport/bot | ||
|
||
sudo mkdir -p /opt/machine-id | ||
|
||
sudo chown -R "${BOT_USER:?}:${BOT_USER:?}" /opt/machine-id | ||
|
||
|
||
echo "generating tbot config..." >&2 | ||
|
||
sudo tee /etc/tbot.yaml > /dev/null <<EOF | ||
version: v2 | ||
proxy_server: ${PROXY_HOST:?}:${PROXY_PORT:?} | ||
diag_addr: "0.0.0.0:3000" | ||
onboarding: | ||
join_method: token | ||
token: ${BOT_TOKEN:?} | ||
outputs: | ||
- type: identity | ||
destination: | ||
type: directory | ||
path: /opt/machine-id | ||
storage: | ||
type: directory | ||
path: /var/lib/teleport/bot | ||
services: | ||
- type: ssh-multiplexer | ||
destination: | ||
type: directory | ||
path: /opt/machine-id | ||
enable_resumption: true | ||
proxy_command: | ||
- fdpass-teleport | ||
proxy_templates_path: /etc/tbot/proxy-templates.yaml | ||
EOF | ||
|
||
|
||
echo "generating proxy templates..." >&2 | ||
|
||
sudo tee /etc/tbot/proxy-templates.yaml > /dev/null <<EOF | ||
proxy_templates: | ||
- template: "^(.*).${PROXY_HOST:?}:[0-9]+$" # <nodename>.<clustername>:<port> | ||
query: 'contains(split(labels.NODENAME, ","), "\$1")' | ||
EOF | ||
|
||
|
||
echo "installing tbot systemd unit..." >&2 | ||
|
||
sudo tbot install systemd --write --force --config /etc/tbot.yaml --user "${BOT_USER:?}" --group "${BOT_USER:?}" | ||
|
||
|
||
echo "starting tbot.service..." >&2 | ||
|
||
sudo systemctl daemon-reload | ||
|
||
sudo systemctl start tbot.service |
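
The generated proxy template captures everything before `.<proxy host>:<port>` and passes it to the label query as `$1`. The sketch below reproduces only the regular expression with `sed -E` to show what gets captured; tbot evaluates the template itself, and the `PROXY_HOST` value and the `extract_nodename` helper are assumptions made for illustration:

```shell
# Hypothetical illustration; this only mimics the template's regular
# expression and does not call tbot.
PROXY_HOST="teleport.example.com"   # assumed example value

extract_nodename() {
    # Same pattern as the generated template: ^(.*).<proxy host>:[0-9]+$
    # Prints the first capture group, or nothing if the input does not match.
    printf '%s\n' "$1" | sed -n -E "s/^(.*).${PROXY_HOST}:[0-9]+\$/\1/p"
}
```

With these assumptions, `extract_nodename 'node-1.teleport.example.com:3022'` prints `node-1`. The dots in the pattern are unescaped, so each one matches any single character rather than only a literal dot.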
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
#!/bin/bash | ||
|
||
set -euo pipefail | ||
|
||
cd "$(dirname "$0")" | ||
|
||
source vars.env | ||
|
||
mkdir -p state | ||
|
||
cd state | ||
|
||
echo "installing teleport..." >&2 | ||
|
||
wget -q "https://${TELEPORT_CDN:?}/${TELEPORT_ARTIFACT:?}" | ||
|
||
tar -xf "${TELEPORT_ARTIFACT:?}" | ||
|
||
rm "${TELEPORT_ARTIFACT:?}" | ||
|
||
sudo ./teleport/install | ||
|
||
echo "installing fdpass-teleport..." >&2 | ||
|
||
sudo cp ./teleport/fdpass-teleport "$(dirname "$(which teleport)")" | ||
|
||
rm -rf ./teleport | ||
|
||
|
||
echo "installing dumb-init..." >&2 | ||
|
||
sudo wget -q -O /usr/local/bin/dumb-init https://github.com/Yelp/dumb-init/releases/download/v1.2.5/dumb-init_1.2.5_x86_64 | ||
|
||
sudo chmod +x /usr/local/bin/dumb-init |
This file was deleted.
This file was deleted.
assets/loadtest/ansible-like/run_node.sh → assets/loadtest/ansible-like/run-node.sh (9 changes: 6 additions & 3 deletions)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,12 @@ | ||
#!/bin/sh | ||
cd "$( dirname -- "${0}" )" || exit 1 | ||
#!/bin/bash | ||
|
||
set -euo pipefail | ||
|
||
cd "$(dirname "$0")" | ||
|
||
sleep "$( echo "90 * $(od -An -N4 -tu4 /dev/urandom) / 4294967295" | bc -l )" | ||
|
||
ssh_opts="-qn -F tbot_destdir_mux/ssh_config -S /run/user/1000/ssh-control/%C -o ControlMaster=auto -o ControlPersist=60s -o Ciphers=^[email protected] -l root" | ||
ssh_opts="-qn -F /opt/machine-id/ssh_config -S /run/user/1000/ssh-control/%C -o ControlMaster=auto -o ControlPersist=60s -o Ciphers=^[email protected] -l root" | ||
|
||
i=0 | ||
while [ $i -lt 10000 ] ; do | ||
|
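
The `sleep` line near the top of this script staggers start-up: it reads four bytes from `/dev/urandom` as an unsigned 32-bit integer and scales the value into the range 0 to 90 seconds, so the per-node workers do not all begin at once. A minimal sketch of the same scaling, using `awk` in place of `bc` for the arithmetic; `jitter_seconds` is a hypothetical helper name:

```shell
# Hypothetical helper: print a random delay in seconds within [0, max].
# Uses awk instead of bc; the scaling is the same as in the script.
jitter_seconds() {
    max="${1:-90}"
    # Read 4 bytes from /dev/urandom as one unsigned 32-bit integer.
    raw="$(od -An -N4 -tu4 /dev/urandom | tr -d ' \n')"
    # Scale the integer from [0, 2^32 - 1] into [0, max].
    awk -v max="$max" -v raw="$raw" 'BEGIN { printf "%.3f\n", max * raw / 4294967295 }'
}
```

Calling `sleep "$(jitter_seconds 90)"` would then behave like the original line.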
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,9 @@ | ||
#!/bin/sh | ||
cd "$( dirname -- "${0}" )" || exit 1 | ||
#!/bin/bash | ||
|
||
set -euo pipefail | ||
|
||
cd "$(dirname "$0")" | ||
|
||
mkdir -p /run/user/1000/ssh-control | ||
|
||
exec dumb-init xargs -P 0 -I % ./run_node.sh % < inventory | ||
exec dumb-init xargs -P 0 -I % ./run-node.sh % < state/inventory |
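
With GNU xargs, `-P 0` runs as many processes at a time as possible, so every inventory entry gets its own `run-node.sh` worker. To check the fan-out without opening any sessions, a dry run with bounded parallelism can help. A minimal sketch, assuming a hypothetical `dry_run` helper that substitutes `echo` for the real worker:

```shell
# Hypothetical dry run: same fan-out shape as run.sh, but with a bounded
# number of workers and echo in place of ./run-node.sh.
dry_run() {
    inventory="$1"
    workers="${2:-4}"
    # -I % substitutes each inventory line for % in the command.
    xargs -P "$workers" -I % echo "would run: ./run-node.sh %" < "$inventory"
}
```

For example, `dry_run state/inventory 8 | head` shows the first few commands that would be launched. Output order is not guaranteed because the workers run concurrently.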
This file was deleted.
This file was deleted.
This file was deleted.