Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VC-35738] Log with klog to stdout and stderr in Kubernetes text format #596

Merged
merged 12 commits into from
Oct 31, 2024

Conversation

wallrj
Copy link
Member

@wallrj wallrj commented Oct 16, 2024

I've settled on using klog and kubernetes component-base module for the reasons given in the comments in the logs.go file.

I think this logging UX is most likely to be familiar to Kubernetes platform admins who are likely to be deploying and debugging the venafi-kubernetes-agent.

After discussion with @maelvls we settled on the flags: -v, --log-level, --logging-format, and --vmodule
for maximum consistency with the other TLSPK components.

I gave up on Slog because it seemed to have unfamiliar severity / levels and I couldn't get the klog client-go logs to have the right levels with slog.

Using klog allows us to debug problems in client-go.
For example, you can view the API requests and responses using --vmodule=round_tripper=6.

In followup PRs I will convert the existing log lines to use context derived logr loggers.

$ go test ./pkg/logs/... -v -run TestLogs -count 1

...
=== RUN   TestLogs/vmodule-level-3
    logs_test.go:291: FLAGS
        --vmodule=logs_test=3
    logs_test.go:295: STDOUT
        I1025 17:39:47.529791   62093 logs.go:161] "log Print" source="vcert"
        I1025 17:39:47.529998   62093 logs_test.go:43] "slog Info"
        I1025 17:39:47.530019   62093 logs_test.go:46] klog Info
        I1025 17:39:47.530031   62093 logs_test.go:49] "klog InfoS" key="value"
        I1025 17:39:47.530035   62093 logs_test.go:51] "Contextual Info Level 3" logger="foo" key="value"

    logs_test.go:296: STDERR
        W1025 17:39:47.530012   62093 logs_test.go:44] "slog Warn"
        E1025 17:39:47.530015   62093 logs_test.go:45] "slog Error"
        W1025 17:39:47.530023   62093 logs_test.go:47] klog Warning
        E1025 17:39:47.530027   62093 logs_test.go:48] "klog Error" err="fake-error"
        E1025 17:39:47.530039   62093 logs_test.go:52] "Contextual error" err="fake-error" logger="foo" key="value"

--- PASS: TestLogs (0.07s)
    --- PASS: TestLogs/help (0.01s)
    --- PASS: TestLogs/unrecognized-flag (0.01s)
    --- PASS: TestLogs/v-long-form-not-available (0.01s)
    --- PASS: TestLogs/logging-format-unrecognized (0.01s)
    --- PASS: TestLogs/original-defaults (0.01s)
    --- PASS: TestLogs/modified-defaults (0.01s)
    --- PASS: TestLogs/logging-format-json (0.01s)
    --- PASS: TestLogs/log-json-split-stream-false (0.01s)
    --- PASS: TestLogs/logging-format-text (0.01s)
    --- PASS: TestLogs/log-text-split-stream-false (0.01s)
    --- PASS: TestLogs/v-level-3 (0.01s)
    --- PASS: TestLogs/log-level-3 (0.01s)
    --- PASS: TestLogs/vmodule-level-3 (0.01s)
$ ./hack/e2e/test.sh
...
{
  "ts": 1729108594511.0774,
  "caller": "log/log.go:245",
  "msg": "successfully gathered 30 items from \"k8s/replicasets\" datagatherer",
  "v": 0
}
{
  "ts": 1729108594511.1047,
  "caller": "log/log.go:245",
  "msg": "Posting data to: ",
  "v": 0
}
{"ts":1729108595121.0984,"caller":"log/log.go:245","msg":"Data sent successfully.","v":0}

image

| grep -q "Data sent successfully"
--follow \
--namespace venafi \
| timeout 60 jq 'if .msg | test("Data sent successfully") then . | halt_error(0) end'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hack but it halts the pipeline when the expected msg value is observed and prints the matching record to stderr:

The timeout stops the pipeline after 60 seconds if the expected msg is not observed.

"k8s.io/component-base/featuregate"
"k8s.io/component-base/logs"
logsapi "k8s.io/component-base/logs/api/v1"
_ "k8s.io/component-base/logs/json/register"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Importing this module, is what makes the json log format option available.

@wallrj wallrj force-pushed the VC-35738/klog-3 branch 2 times, most recently from 411461d to 161382f Compare October 17, 2024 07:10
@james-w
Copy link

james-w commented Oct 17, 2024

Hi, does cert-manager and the other components use klog? Do they use the conventions that this MR is adopting?

@maelvls
Copy link
Member

maelvls commented Oct 18, 2024

Hey Richard, great job with adding unit tests around the logs. I really appreciate the effort. Thanks to you, I can just run go test -v to check what the various logging outputs are is great. Well done!

Does cert-manager and the other components use klog? Do they use the conventions that this MR is adopting?

To the question of consistency:

  • The library convention is klog for older projects such as cert-manager.
  • For newer projects, the convention is to use klogr as we prefer using structured logging
  • The default format convention seems to be klog's text format.

For that reason, the choice of klogr would make more sense in the Venafi Kubernetes Agent as it would bring structured logging. My first suggestion would be for us to start with klogr since klogr and klog don't have the same flags, and that would reduce the chance for people realizing they can't use --v anymore. We don't need to re-structure the logs right away, we can do that later (if ever).

Note that I found in kubernetes/klog@44eadc3 that slog is preferred over klogr since last year:

The klogr package doesn't get updated. It shouldn't be used anymore. To make that clearer, it gets marked as "deprecated".

My second suggestion, based on the fact that klogr is being phased out, would be to use std lib's log/slog and use klog as the backend. WDYT @wallrj?

Full picture:

project lib flag values default** supports json format
cert-manager klog -v, --v 0 to 9 info, 2 yes with --logging-format=json, default to klog's text format
cert-manager webhook klog -v, --v 0 to 9 info, 2 yes with --logging-format=json, default to klog's text format
openshift-routes klogr -v, --log-level 0 to 9 info, 1 no, uses klog's text format
csi-driver klogr -v, --log-level 0 to 9 info, 1 no, uses klog's text format
trust-manager klogr -v, --log-level 0 to 9 info, 1 yes with --log-format=json, defaults to klog's text format
approver-policy klogr -v, --log-level 0 to 9 info, 1 yes with --log-format=json, defaults to klog's text format
venafi-enhanced-issuer zap -zap-log-level debug, info, error, 0 to 9 info yes with --zap-encoder=json, defaults to zap's text format
Agent (before this PR) log + klog* none no, uses log's and klogs' text formats
Agent (with this PR) klog -v, --v 0 to 9 ? yes by default, can use klog's text format using --logging-format=text
venctl slog*** --log-level debug, info, warn, error info yes with --log-format=json, defaults to slog's text format
vcert log --verbose true, false false no, uses log's text format

* log is the standard library's log package. It is used for the agent's code, while klog is used by the client-go library that the agent uses.
** For example, "info, 4" means that the default severity is "info" and the default level for the "info" severity is 4. The level is always for the "info" level.
*** "slog" and "log" belongs to the standard library (log/slog).

@james-w
Copy link

james-w commented Oct 18, 2024

Hi, yep, consistency was exactly my concern, thanks for the detailed response. In my opinion it's favourable to have a smaller set to make updates easier down the line, even if that smaller set doesn't include the current best practice yet. If this project were to adopt the conventions of the majority of the other projects then we would have more consistency, and could figure out how/when to migrate them en-masse. Alternatively there may be an argument to update to the latest/greatest now, it depends on the details.

It sounds like you think that log/slog would be preferable to klog at this moment in this project?

@maelvls
Copy link
Member

maelvls commented Oct 18, 2024

@wallrj I've reviewed your PR and made some small adjustments:

  • In d9afd95, I moved everything to slog (it was pretty easy). We can later migrate every logs.Log.Printf with slog.Info.

  • In 33f5a12, I added assertions to the stderr and stdout outputs of each test case so that we can track the changes I have made when I moved to slog.

  • In d9afd95, I've added a "source" to indicate whether a log line originates from VCert or from the agent itself.

  • In 4428a24, I've made sure the logs.Log.Fatalf logs are showing with the ERROR severity in slog.

  • In 6752d49, I removed --v and replaced it with --log-level to match the long form of -v that most of our components uses (approver-policy, trust-manager, openshift-routes, and csi-driver). The help now looks like this:

    +  -v, --log-level Level         number for the log level verbosity
    -  -v, --v Level                 number for the log level verbosity

Regarding the flags:

It seems we (the cert-manager project) have initiated the convention of the flags -v and --log-level. It has nothing to do with klogr; klogr itself doesn't have a convention around the flag names. Example: https://github.com/cert-manager/openshift-routes/blob/0da68aed69ef66c8c9cefde45d855410813a6b3c/internal/cmd/app/options/options.go#L114-L116

I propose that we only expose the -v flag expose --log-level instead of --v since that's the long form flag that is the most common across our components (approver-policy, trust-manager, openshift-routes, and csi-driver).

Regarding the slog severity problem:

I gave up on Slog because it seemed to have unfamiliar severity / levels and I couldn't get the klog client-go logs to have the right levels with slog.

I've investigated and was able to configure an slog logger in d9afd95. To avoid lots of unrelated changes, the agent logs are still using the standard library's *log.Logger.

Regarding the output format:

I found that none of our projects currently output JSON by default. You mentioned the Kubernetes logging conventions, which say that the JSON format must be enabled with --logging-format, which means that if not passed, the output shouldn't be JSON:

Requires passing --logging-format=json to enable.

@wallrj IMO we should set the default output to "text"... WDYT? I haven't reverted to "text". I'll let you decide.

@wallrj Feel free to merge this PR once you are happy with it. I'll approve it now.

@maelvls maelvls force-pushed the VC-35738/klog-3 branch 3 times, most recently from a87bfd3 to 4428a24 Compare October 18, 2024 15:26
flags: "",
expectStdout: `
{"ts":0000000000000.000,"caller":"logs/logs.go:000","msg":"log Print","source":"vcert","v":0}
{"ts":0000000000000.000,"caller":"logs/logs_test.go:000","msg":"slog Info","v":0}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish we were able to show the real source of the log line rather than the generic logs/logs_test.go. I guess we can improve this once we have migrated the logs.Log.Printf calls to slog.Info.

@wallrj wallrj changed the title [VC-35738] Log with klog to stdout and stderr in JSON format [VC-35738] Log with klog to stdout and stderr in Kubernetes text format Oct 25, 2024
@wallrj
Copy link
Member Author

wallrj commented Oct 25, 2024

@wallrj IMO we should set the default output to "text"... WDYT? I haven't reverted to "text". I'll let you decide.

I agree that text format by default is more consistent with other TLSPK components. Done.

@wallrj Feel free to merge this PR once you are happy with it. I'll approve it now.

I'll merge this after the 1.2.0 release because I'd like to finish the other logging improvements before releasing to customers.

// the agent, which still uses log.Printf.
slog := slog.Default()

Log = &log.Logger{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use slog.SetDefault(...) here?
Using slog.New(logr.ToSlogHandler(logger)) we can convert the klog logger to an slog logger.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. I'll consider that in a follow up.

if f.Name == "logging-format" {
f.Usage = `Sets the log format. Permitted formats: "json", "text".`
}
if f.Name == "log-text-split-stream" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we splitting the log streams? Doesn't this make it hard to reconstruct the order or messages?

Copy link
Member Author

@wallrj wallrj Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my current understanding:

  1. I enabled log-json-split-stream by default to solve the problem that Google Logs Explorer otherwise can't distinguish between JSON formatted errors and info messages.

    Severities: By default, logs written to the standard output are on the INFO level and logs written to the standard error are on the ERROR level. Structured logs can include a severity field, which defines the log's severity.
    -- https://cloud.google.com/kubernetes-engine/docs/concepts/about-logs#best_practices

    (klog JSON format does not include a severity field).

  2. log-text-split-stream is not technically necessary to solve that problem because Google Logs Explorer seems to parse the E, W, and I prefix of the Kubernetes text format. So yes, I could remove this change.

  3. Kubernetes introduced the split-stream options to solve performance problems. Info and warn messages can be buffered and flushed periodically while errors are flushed immediately.

  4. By default logs are not buffered and are flushed to stdout and stderr immediately, so there shouldn't be any out-of-order log messages.

  5. If log buffering is enabled, then I think that the buffered info logs will be flushed whenever an error is logged, so there should be any out-of-order log messages in that case either.

  6. Kubernetes plan to make split-stream JSON logging the default for the Kubernetes components.

    Ability to route logs based on type/verbosity will be replaced with default splitting info and errors logs to stdout and stderr. We will make this change only in alternative logging formats (like JSON) as we don't want to introduce breaking change in default configuration. Splitting stream will allow treating info and errors with different priorities. It will unblock efforts like Make info logging non-blocking kubernetes/klog#209 to make info logs non-blocking.

    -- https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components#logging-defaults

wallrj and others added 10 commits October 29, 2024 09:33
For some reason, `assert.Equal` doesn't cause the test sub-process to
exit with 1. Instead, it returns 0 which means we can't notice the error
from the parent process.
That's useful to keep track what the logs look like, especially now that
I'm trying to switch from plain klog to slog.
Before, there was no way to know if a given log came from the VCert
library or from the agent itself.

I had to tweak logToSlogWriter's Writer to remove the extraneous "\n"
that log.Printf prints.
The exact logs output is now verified.

Signed-off-by: Richard Wall <[email protected]>
Signed-off-by: Richard Wall <[email protected]>
@wallrj wallrj changed the base branch from master to VC-35738/feature October 31, 2024 09:34
@wallrj wallrj merged commit 415d434 into VC-35738/feature Oct 31, 2024
2 checks passed
@wallrj wallrj deleted the VC-35738/klog-3 branch October 31, 2024 09:36
@wallrj wallrj mentioned this pull request Oct 31, 2024
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants