Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log install preflight errors/warnings in both the CLI and kotsadm pod #4046

Merged
merged 16 commits into from
Oct 9, 2023

Conversation

banjoh
Copy link
Member

@banjoh banjoh commented Sep 19, 2023

What this PR does / why we need it:

When deployments fail due to preflights having warnings/errors, users do not get to see which of the preflights failed. This change logs all the executed preflights whenever they prevent a deployment. We print all preflights so as to also have users see other preflights that got run so they can perform additional review of their preflights.

Screenshot of the CLI logs after preflights failing/warning
image

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Steps to reproduce

Does this PR introduce a user-facing change?

Log preflight checks to the CLI and kotsadm logs whenever there are checks that fail leading to a failed deployment

Does this PR require documentation?

@banjoh banjoh marked this pull request as draft September 19, 2023 13:29
@banjoh banjoh changed the title Em/log preflight errors Log install preflight errors/warnings in both the CLI and kotsadm pod Sep 19, 2023
@banjoh banjoh marked this pull request as ready for review September 19, 2023 18:29
@banjoh banjoh requested a review from cbodonnell September 19, 2023 18:30
@@ -47,6 +46,10 @@ import (
"k8s.io/client-go/kubernetes"
)

var client = &http.Client{
Copy link
Member Author

@banjoh banjoh Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, there was no timeout. Just being defensive here.

@@ -121,7 +126,7 @@ func execute(appID string, sequence int64, preflightSpec *troubleshootv1beta2.Pr
KubernetesRestConfig: restConfig,
}

logger.Debug("preflight collect phase")
logger.Info("preflight collect phase")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this to info cause they show progress when performing an installation, which is critical functionality IMO to always log

// Log the preflight results if there are any warnings or errors
// The app may not get installed so we need to see this info for debugging
if GetPreflightState(uploadPreflightResults) != "pass" {
// TODO: Are there conditions when the application gets installed?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so here since we should only deploy if the preflight state is "pass" https://github.com/replicatedhq/kots/blob/main/pkg/preflight/preflight.go#L236-L245

@@ -444,11 +448,22 @@ func InstallCmd() *cobra.Command {
switch status {
case storetypes.VersionPendingPreflight:
log.ActionWithSpinner("Waiting for preflight checks to complete")
log.FinishSpinner()
Copy link
Contributor

@cbodonnell cbodonnell Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we leave the spinner going while we wait for the preflights and finish with success or error afterwards? Otherwise we end up with a state like this while preflights are running and a user may think things are hanging or stuck:

  • Validating registry information ✓  
  • Deploying Admin Console
    • Creating namespace ✓  
    • Waiting for datastore to be ready ✓  
  • Waiting for Admin Console to be ready ✓  
  • Waiting for installation to complete ✓  
  • Waiting for preflight checks to complete ✓

if errors.As(err, &perr) {
cmd.SilenceErrors = true
var s strings.Builder
s.WriteString("\nPreflight check results (state - title - message)")
Copy link
Contributor

@cbodonnell cbodonnell Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a pkg/print where have logic for pretty-printing various things. I think pkg/print/config.go could be a good reference since those print config validation errors, which might be similar to something like preflight warnings/errors. This logic could live somewhere in that package.

return errors.Wrap(err, "failed to validate preflight results")
log.Errorf("Preflight checks contain warnings or errors. The application was not deployed")
perr := preflightError{}
if errors.As(err, &perr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any benefit to handling specific error types this way vs. how do it elsewhere in the codebase? For example: https://github.com/replicatedhq/kots/blob/main/pkg/handlers/snapshots.go#L202

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any benefit to handling specific error types this way vs. how do it elsewhere in the codebase? For example: https://github.com/replicatedhq/kots/blob/main/pkg/handlers/snapshots.go#L202

They both achieve the same result I think. errors.Cause() is from github.com/pkg/errors library while errors.As() is part of go standard library

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if they achieve the same thing, let's just stick to errors.Cause() for consistency?

@@ -194,7 +208,7 @@ func Run(appID string, appSlug string, sequence int64, isAirgap bool, archiveDir
// preflight reporting
if isDeployed {
if err := reporting.WaitAndReportPreflightChecks(appID, sequence, false, false); err != nil {
logger.Debugf("failed to send preflights data to replicated app: %v", err)
logger.Errorf("failed to send preflights data to replicated app: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure this message is always accurate anymore given the work to support airgap reporting. since we're exposing it as a error log with this change, how about we change the message to something more generic like:

failed to report preflight check results

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure this message is always accurate anymore given the work to support airgap reporting. since we're exposing it as a error log with this change, how about we change the message to something more generic like:

failed to report preflight check results

I've seen a number of these debug logs all over the code. I think I'll revert this change for now and leave it as is. Later on we can align the logs.

@@ -426,6 +429,7 @@ func InstallCmd() *cobra.Command {
case err := <-errChan:
if err != nil {
log.Error(err)
// TODO: Why is this a negative exit code?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question... I think this end up resulting in a 255 exit code:

$ echo $?
255

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's fix this and set it to 255

log.Errorf("Preflight checks contain warnings or errors. The application was not deployed")
perr := preflightError{}
if errors.As(err, &perr) {
cmd.SilenceErrors = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to silence errors if we're always going to call os.Exit(1) at the end of this code path?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to silence errors if we're always going to call os.Exit(1) at the end of this code path?

Perhaps not. Removing it

cmd/kots/cli/install.go Outdated Show resolved Hide resolved
}
log.FinishSpinner()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should leave this as we need to stop the spinner in the case where preflights pass too. Otherwise we are left with an indefinite spinner:

  • Deploying Admin Console
    • Creating namespace ✓  
    • Waiting for datastore to be ready ✓  
  • Waiting for Admin Console to be ready ✓  
  • Waiting for installation to complete ✓  
  • Waiting for preflight checks to complete ⠇
  • Press Ctrl+C to exit
  • Go to http://localhost:8800 to access the Admin Console

  • Waiting for preflight checks to complete ⠏

@@ -216,6 +229,21 @@ func Run(appID string, appSlug string, sequence int64, isAirgap bool, archiveDir
return nil
}

func preflightState(p troubleshootpreflight.UploadPreflightResult) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use the existing GetPreflightState function`?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The state from GetPreflightState is of the entire list of preflights i.e if there is an error in any, the state == error. preflightState gets the state of each preflight for logging purposes.

log.Info(s.String())
}

func preflightState(r *preflight.UploadPreflightResult) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Member Author

@banjoh banjoh Sep 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll DRY this out in favour of preflights.GetPreflightCheckState which I'll rename from the existing preflights.preflightState

@@ -26,3 +27,26 @@ func ConfigValidationErrors(log *logger.CLILogger, groupValidationErrors []confi
log.FinishSpinnerWithError()
log.Errorf(sb.String())
}

func PreflightErrors(log *logger.CLILogger, results []*preflight.UploadPreflightResult) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not following the existing pattern of printing?

func printRestoresTable(restores []velerov1.Restore) {
w := NewTabWriter()
defer w.Flush()
fmtColumns := "%s\t%s\t%s\t%s\t%s\t%s\t%s\n"
fmt.Fprintf(w, fmtColumns, "NAME", "BACKUP", "STATUS", "STARTED", "COMPLETED", "ERRORS", "WARNINGS")
for _, r := range restores {
var startedAt *time.Time
if r.Status.StartTimestamp != nil && !r.Status.StartTimestamp.Time.IsZero() {
startedAt = &r.Status.StartTimestamp.Time
}
var completedAt *time.Time
if r.Status.CompletionTimestamp != nil && !r.Status.CompletionTimestamp.Time.IsZero() {
completedAt = &r.Status.CompletionTimestamp.Time
}
phase := r.Status.Phase
if phase == "" {
phase = "New"
}
fmt.Fprintf(w, fmtColumns, r.ObjectMeta.Name, r.Spec.BackupName, phase, startedAt, completedAt, fmt.Sprintf("%d", r.Status.Errors), fmt.Sprintf("%d", r.Status.Warnings))
}
}

that would help align the column titles with the values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not following the existing pattern of printing?

No particular reason. I'll try to adopt the same approach

perr := preflightError{}
if errors.As(err, &perr) {
print.PreflightErrors(log, perr.Results)
cmd.SilenceErrors = true // Stop Cobra from printing the error, we print it ourselves
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of silencing the errors, can we just return/print a generic error message?

Copy link
Member Author

@banjoh banjoh Sep 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of silencing the errors, can we just return/print a generic error message?

That extra duplicated line after the formatted preflight check results at the end did not feel right. This is how it ends up looking like

evans@evans-vm3:~$ ./kots install thanos-reloaded -n default --shared-password password --license-file ~/thanos_license.yaml --kubeconfig /etc/rancher/k3s/k3s.yaml
  • Deploying Admin Console
    • Creating namespace ✓
    • Waiting for datastore to be ready ✓
  • Waiting for Admin Console to be ready ✓
  • Waiting for installation to complete ✓
  • Waiting for preflight checks to complete ✓
  • There are preflight check warnings for the application. The app was not deployed.

STATE    TITLE                          MESSAGE
PASS     Required Kubernetes Version    Your cluster meets the recommended and required versions of Kubernetes.
PASS     Container Runtime              containerd container runtime was found.
WARN     Kubernetes Distribution        Unable to determine the distribution of Kubernetes.
WARN     Total CPU Cores                The cluster should contain at least 4 cores.

Error: There are preflight check warnings for the application. The app was not deployed.

NOTE: Any other errors will get printed, we just silence the errors when we format the preflight checks table since we are printing our own error message.

Copy link
Member Author

@banjoh banjoh Sep 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it is as an image to show the colours

image

@banjoh banjoh merged commit d0ba98c into main Oct 9, 2023
@banjoh banjoh deleted the em/log-preflight-errors branch October 9, 2023 16:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants