Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert TeleportProcess to use slog for logging #38551

Merged
merged 1 commit into from
Mar 12, 2024
Merged

Conversation

rosstimothy
Copy link
Contributor

Begins the conversion away from logrus to slog for all messages emitted from TeleportProcess. The logrus logger should now only be in use for dependency injection to child components that haven't been converted yet. There are still instances of the logrus logger in the process configuration to emit messages, they will be converted in a follow up PR.

@rosstimothy rosstimothy added the no-changelog Indicates that a PR does not require a changelog entry label Feb 22, 2024
@rosstimothy rosstimothy marked this pull request as ready for review February 22, 2024 19:11
@github-actions github-actions bot requested review from kimlisa and tcsc February 22, 2024 19:11
@rosstimothy
Copy link
Contributor Author

PTAL @kimlisa @tcsc

@@ -139,7 +139,7 @@ func (process *TeleportProcess) authServerTooOld(resp *proto.PingResponse) error

if serverVersion.Major < teleportVersion.Major {
if process.Config.SkipVersionCheck {
process.log.Warnf("Teleport instance is too new. This instance is running v%d. The auth server is running v%d and only supports instances on v%d or v%d.", teleportVersion.Major, serverVersion.Major, serverVersion.Major, serverVersion.Major-1)
process.logger.WarnContext(process.ExitContext(), "This instance is too new. Using a newer major version than the Auth server is unsupported and may impair functionality.", "version", teleportVersion.Major, "auth_version", serverVersion.Major, "supported_versions", []int64{serverVersion.Major, serverVersion.Major - 1})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wanted to understand how this looks when printed. I understand how the previous will look like in my head, but i'm having trouble with this one 😓

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should look like the following

2024-03-05T00:19:15-05:00 WARN [PROC:1]   This instance is too new. Using a newer major version than the Auth server is unsupported and may impair functionality. pid:46154.1 version:15.1.0 auth_version:14.2.0 supported_versions: [14, 13] service/connect.go:142

lib/service/connect.go Show resolved Hide resolved
@rosstimothy
Copy link
Contributor Author

Friendly ping @tcsc

@rosstimothy
Copy link
Contributor Author

@tcsc could you please review?

Begins the conversion away from logrus to slog for all messages
emitted from TeleportProcess. The logrus logger should now only
be in use for dependency injection to child components that haven't
been converted yet. There are still instances of the logrus logger
in the process configuration to emit messages, they will be converted
in a follow up PR.
@rosstimothy rosstimothy added this pull request to the merge queue Mar 12, 2024
Merged via the queue into master with commit 34aa080 Mar 12, 2024
34 checks passed
@rosstimothy rosstimothy deleted the tross/process_slog branch March 12, 2024 22:23
rosstimothy added a commit that referenced this pull request Mar 13, 2024
The change to migrate the TeleportProcess to use slog in
#38551 was preventing
the regular expressions used by the operator tests to determine
when auth and proxy were ready. This lead to the tests hanging indefinitely
until the go test framework timed out. To remediate, the regular
expressions have been updated to match the output when using slog.

For example, the test was expecting the output in the format of:

```
Auth service 15.0.0:v15.0.0-0-ge126e8cd is starting on 192.168.50.108:3025. pid:61933.1 service/service.go:2072
```

#38551 moved the listen address to a field instead of being formatted
in the message:

```
Auth service is starting. pid:54159.1 version:16.0.0-dev git_ref:api/v13.4.16-31-g2b9ae35e31 listen_address:127.0.0.1:63152 service/service.go:2107
```
github-merge-queue bot pushed a commit that referenced this pull request Mar 14, 2024
The change to migrate the TeleportProcess to use slog in
#38551 was preventing
the regular expressions used by the operator tests to determine
when auth and proxy were ready. This lead to the tests hanging indefinitely
until the go test framework timed out. To remediate, the regular
expressions have been updated to match the output when using slog.

For example, the test was expecting the output in the format of:

```
Auth service 15.0.0:v15.0.0-0-ge126e8cd is starting on 192.168.50.108:3025. pid:61933.1 service/service.go:2072
```

#38551 moved the listen address to a field instead of being formatted
in the message:

```
Auth service is starting. pid:54159.1 version:16.0.0-dev git_ref:api/v13.4.16-31-g2b9ae35e31 listen_address:127.0.0.1:63152 service/service.go:2107
```
@GavinFrazar
Copy link
Contributor

GavinFrazar commented Mar 15, 2024

@rosstimothy I noticed that with this change my e2e test logs now print the teleport process identity private key and certs bytes. Look for this: Client successfully connected to cluster ... *key bytes and all other fields follow*...

Looks like the offending line is in lib/service/connect.go:158 where it does not use the client identity String method for some reason.

@rosstimothy
Copy link
Contributor Author

@rosstimothy I noticed that with this change my e2e test logs now print the teleport process identity private key and certs bytes. Look for this: Client successfully connected to cluster ... *key bytes and all other fields follow*...

Looks like the offending line is in lib/service/connect.go:158 where it does not use the client identity String method for some reason.

lib/service/connect.go:158 does not directly call the String method intentionally for performance reasons. Under the hood slog will call String for any attribute that is a fmt.Stringer if and only if the log level of the message is one that will be included in the output, see https://pkg.go.dev/log/slog#hdr-Performance_considerations for more details.

@rosstimothy
Copy link
Contributor Author

The text handler does indeed do the right thing, but the json handler isn't calling String.

rosstimothy added a commit that referenced this pull request Mar 18, 2024
Begins the conversion away from logrus to slog for all messages
emitted from TeleportProcess. The logrus logger should now only
be in use for dependency injection to child components that haven't
been converted yet. There are still instances of the logrus logger
in the process configuration to emit messages, they will be converted
in a follow up PR.
rosstimothy added a commit that referenced this pull request Mar 28, 2024
Begins the conversion away from logrus to slog for all messages
emitted from TeleportProcess. The logrus logger should now only
be in use for dependency injection to child components that haven't
been converted yet. There are still instances of the logrus logger
in the process configuration to emit messages, they will be converted
in a follow up PR.
github-merge-queue bot pushed a commit that referenced this pull request Mar 28, 2024
* Convert TeleportProcess to use slog for logging (#38551)

Begins the conversion away from logrus to slog for all messages
emitted from TeleportProcess. The logrus logger should now only
be in use for dependency injection to child components that haven't
been converted yet. There are still instances of the logrus logger
in the process configuration to emit messages, they will be converted
in a follow up PR.

* Make slog attributes output the same content regardless of format (#39428)

* Make slog attributes output the same content regardless of format

There were a few instances of fmt.Stringers being passed to slog
as attributes to delay their evaluation as per the performance
considerations of slog: https://pkg.go.dev/log/slog#hdr-Performance_considerations.
However, this resulted in the json handler emitting different
content since they check to see if attributes are json.Marshalers
and not fmt.Stringers. To rectify the situation, a new slog.LogValuer
implementation was added that wraps a fmt.Stringer so that attributes
can be lazily constructed while producing the same output no matter
the log output format.

* fix: wrap identity and correct typo

* Fix integrations ssh service tests (#39543)

Similar to #39315,
this updates the integrations test suite to use a regular
expression that will match output to know when the ssh_service is
ready now that Teleport is using slog instead of logrus.
marcoandredinis added a commit to gravitational/teleport-plugins that referenced this pull request Apr 2, 2024
This new version of teleport has the following relevant changes:

* Teleport process logging uses `log/slog`
For terraform tests we start a teleport binary and parse its output to understand when the Auth/Proxy service started and what ports are they listening on.
We used a regex for that, but teleport migrated to `log/slog` and the regex no longer works.
Migration PR: gravitational/teleport#38551
We had to fix the regex for integration tests in teleport: gravitational/teleport#39315
Terraform Tests also use that library, so after the regex changed, we must upgrade Teleport CI version to get the new log format.

* Teleport API: `GetClusterNetworkingConfig` and `GetSessionRecordingConfig` never return a nil

When developing the `ClusterMaitenanceConfig` we had to include a nil check, because if it was never configured, `GetClusterMaintenanceConfig` would return a nil object.
This nil check was added to all SingleResource resources.

For `ClusterNetworkingConfig` and `SessionRecordingConfig`, the `Get` operation never returns a nil resource and `staticcheck` linter was yelling about it.
So, we had to create a new flag to ensure we only nil-checked the resources that can actually return a nil value.

* Teleport Resource Metadata
It is no longer recommended to use the `<Resource>.Metadata.ID` to check for cached responses.
We are now using the revision field.

During this change we also detected a miss-usage of an `error` variable and fixed that (could lead to a panic).

LoginRules didn't have the `Revision` field, so we added it here: gravitational/teleport.e#3821

Unfortunately, that PR didn't merge in time for 15.2.0.
However, that's ok because LoginRules are not cached.
So, instead of waiting for a new release (15.2.1), we just removed the cache check.
marcoandredinis added a commit to gravitational/teleport-plugins that referenced this pull request Apr 2, 2024
This new version of teleport has the following relevant changes:

* Teleport process logging uses `log/slog`
For terraform tests we start a teleport binary and parse its output to understand when the Auth/Proxy service started and what ports are they listening on.
We used a regex for that, but teleport migrated to `log/slog` and the regex no longer works.
Migration PR: gravitational/teleport#38551
We had to fix the regex for integration tests in teleport: gravitational/teleport#39315
Terraform Tests also use that library, so after the regex changed, we must upgrade Teleport CI version to get the new log format.

* Teleport API: `GetClusterNetworkingConfig` and `GetSessionRecordingConfig` never return a nil

When developing the `ClusterMaitenanceConfig` we had to include a nil check, because if it was never configured, `GetClusterMaintenanceConfig` would return a nil object.
This nil check was added to all SingleResource resources.

For `ClusterNetworkingConfig` and `SessionRecordingConfig`, the `Get` operation never returns a nil resource and `staticcheck` linter was yelling about it.
So, we had to create a new flag to ensure we only nil-checked the resources that can actually return a nil value.

* Teleport Resource Metadata
It is no longer recommended to use the `<Resource>.Metadata.ID` to check for cached responses.
We are now using the revision field.

During this change we also detected a miss-usage of an `error` variable and fixed that (could lead to a panic).

LoginRules didn't have the `Revision` field, so we added it here: gravitational/teleport.e#3821

Unfortunately, that PR didn't merge in time for 15.2.0.
However, that's ok because LoginRules are not cached.
So, instead of waiting for a new release (15.2.1), we just removed the cache check.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-changelog Indicates that a PR does not require a changelog entry size/sm
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants