Machine ID: Update documentation to reflect tbot CLI changes #48337

Merged
merged 10 commits into from
Nov 8, 2024
35 changes: 22 additions & 13 deletions docs/pages/enroll-resources/machine-id/troubleshooting.mdx
@@ -31,6 +31,13 @@ some additional context:

### Explanation

<Notice type="note">
Review comment (Contributor): 👍 nice addition.

This applies only to bots using the `token` join method, which makes use of
one-time use shared secrets. Provider-specific join methods, such as GitHub,
AWS IAM, etc., will not be locked in this fashion unless another instance of the
bot uses `token` joining.
</Notice>

Machine ID (with token-based joining) uses a certificate generation counter to
detect potentially stolen renewable certificates. Each time a bot fetches a new
renewable certificate, the Auth Service increments the counter, stores it on the
@@ -69,9 +76,11 @@ directories (usually `/opt/machine-id`) rather than the internal data directory
Once you have addressed the underlying cause, follow these steps to reset a
locked bot:
1. Remove the lock on the bot's user
1. Reset the bot's generation counter by deleting and re-creating the bot
1. Reset the bot's generation counter by creating a new bot instance

To remove the lock, first find and remove the lock targeting the bot user:
To remove the lock, first find and remove the lock targeting the bot user. For
this example, we'll assume the bot is named `example`, which will have an
associated Teleport user named `bot-example`:

```code
$ tctl get locks
@@ -89,15 +98,16 @@ version: v2
$ tctl rm lock/5cee949f-5203-4f3b-9805-dac35d798a16
```

Next, reset the generation counter by deleting and recreating the bot:
Next, use `tctl bots instances add` to generate a new join token for the
preexisting bot `example`:
```code
$ tctl bots rm example

$ tctl bots add example --roles=foo,bar
$ tctl bots instances add example
```

Finally, reconfigure the bot with the new token and restart it. It will detect
the new token and automatically reset its internal data directory.
Finally, reconfigure the local `tbot` instance with the new token and restart
it. It will detect the new token and automatically reset its internal data
directory. The bot will be issued a new bot instance UUID once connected, and
the generation counter will be reset.
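
As an illustration of the generation counter behavior this section describes, here is a minimal Python sketch. It is a toy model, not Teleport's implementation: the class `ToyAuthService`, its method names, and the starting counter value of 1 are all assumptions made for the example. It shows why a stale (possibly stolen) certificate trips the lock, and why recovery takes both removing the lock and joining again with a fresh token.

```python
from dataclasses import dataclass


@dataclass
class ToyAuthService:
    """Hypothetical stand-in for the per-bot state kept by the Auth Service."""

    generation: int = 0
    locked: bool = False

    def join_with_token(self) -> int:
        # Joining with a fresh token starts over; the starting value 1 is
        # an assumption made for this sketch.
        if self.locked:
            raise PermissionError("bot user is locked")
        self.generation = 1
        return self.generation

    def renew(self, presented: int) -> int:
        # 'presented' models the counter embedded in the certificate the
        # bot uses to authenticate its renewal request.
        if self.locked:
            raise PermissionError("bot user is locked")
        if presented != self.generation:
            # A stale counter means some other copy of the certificate
            # has already renewed, so the bot user is locked.
            self.locked = True
            raise PermissionError("generation mismatch, bot user locked")
        self.generation += 1
        return self.generation

    def remove_lock(self) -> None:
        # Models removing the lock (as `tctl rm lock/...` does above).
        self.locked = False


if __name__ == "__main__":
    auth = ToyAuthService()
    first = auth.join_with_token()
    auth.renew(first)          # legitimate renewal advances the counter
    try:
        auth.renew(first)      # a second holder of the old certificate
    except PermissionError as err:
        print(err)
    auth.remove_lock()
    auth.join_with_token()     # new token, counter starts over
```

In this model, removing the lock alone is not enough: any surviving certificate still carries a stale counter, so the next renewal would lock the bot again. That is why the steps above pair lock removal with a new join token.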

## `tbot` shows a "bad certificate error" at startup

@@ -123,8 +133,8 @@ fail."

Token-joined bots are unable to reauthenticate to the Teleport Auth Service once
their certificates have expired. Tokens in token-based joining (as opposed to
AWS IAM joining) can only be used once, so when the bot's internal certificates
expire, it will not be able to connect.
AWS IAM and other join methods) can only be used once, so when the bot's
internal certificates expire, it will not be able to connect.
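
The interaction between one-time tokens and certificate expiry can be sketched with a small Python model. This is a simplified illustration under stated assumptions, not Teleport's actual logic: `ToyTokenAuth`, its methods, and the numeric timestamps are hypothetical names and values chosen for the example.

```python
class ToyTokenAuth:
    """Hypothetical model of token-based joining with expiring certificates."""

    def __init__(self, cert_ttl: float) -> None:
        self.cert_ttl = cert_ttl
        self.unused_tokens = set()

    def add_token(self, token: str) -> None:
        # Models issuing a new one-time-use join token for the bot.
        self.unused_tokens.add(token)

    def join(self, token: str, now: float) -> float:
        # A token is consumed on first use; returns the certificate expiry.
        if token not in self.unused_tokens:
            raise PermissionError("token unknown or already used")
        self.unused_tokens.remove(token)
        return now + self.cert_ttl

    def renew(self, cert_expiry: float, now: float) -> float:
        # Renewal authenticates with the current certificate, so it only
        # works while that certificate is still valid.
        if now >= cert_expiry:
            raise PermissionError("certificate expired")
        return now + self.cert_ttl


if __name__ == "__main__":
    auth = ToyTokenAuth(cert_ttl=60)
    auth.add_token("first-token")
    expiry = auth.join("first-token", now=0)
    expiry = auth.renew(expiry, now=30)    # fine: certificate still valid
    try:
        auth.renew(expiry, now=500)        # bot was offline too long
    except PermissionError as err:
        print(err)
    try:
        auth.join("first-token", now=500)  # original token is spent
    except PermissionError as err:
        print(err)
```

Once both the certificate and the original token are unusable, the only way back in this model is a newly issued token, which mirrors the resolution described below.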

When a bot's identity expires, certain parameters associated with the bot on the
Auth Service must be reset and a new joining token must be issued. The simplest
@@ -133,11 +143,10 @@ server-side data and issues a new joining token.

### Resolution

Remove and recreate the bot, replacing the name and role list as desired:
Use `tctl bots instances add` to create a new one-time use token for the bot:

```code
$ tctl bots rm example
$ tctl bots add example --roles=access
$ tctl bots instances add example
```

Copy the resulting join token into the existing bot config—either the