KUBESAW-187: Adjust ksctl adm restart command to use rollout-restart #79

Merged
merged 52 commits · Nov 27, 2024
Changes from 13 commits
Commits
52 commits
2748b57
KUBESAW-187: Adjust ksctl adm restart command to use rollout-restart
fbm3307 Sep 10, 2024
da57803
some checking
fbm3307 Sep 13, 2024
aeef8de
Merge branch 'master' into kubesaw170_restart
fbm3307 Sep 13, 2024
f2c29ee
golint
fbm3307 Sep 13, 2024
c742197
Merge branch 'master' into kubesaw170_restart
fbm3307 Sep 13, 2024
ba8866e
few changes to the logic
fbm3307 Sep 17, 2024
ad5348e
Merge branch 'kubesaw170_restart' of https://github.com/fbm3307/ksctl…
fbm3307 Sep 17, 2024
cd4b1bf
test cases
fbm3307 Sep 17, 2024
4cd8e26
Merge branch 'master' into kubesaw170_restart
MatousJobanek Sep 18, 2024
90f7d40
Merge branch 'kubesaw170_restart' of https://github.com/fbm3307/ksctl…
fbm3307 Sep 19, 2024
4c15cf0
Review comments
fbm3307 Sep 19, 2024
8796901
Review comments
fbm3307 Sep 19, 2024
1d68d34
check the args
fbm3307 Sep 19, 2024
47bc27e
adding unit test cases
fbm3307 Sep 23, 2024
8f56cbf
Change in test cases
fbm3307 Sep 25, 2024
9e8fc49
Merge branch 'master' into kubesaw170_restart
fbm3307 Sep 25, 2024
92d0237
minor change in unit test
fbm3307 Sep 25, 2024
c0332b1
unregister-member test
fbm3307 Sep 25, 2024
83e99b5
unit test case for restart
fbm3307 Sep 25, 2024
d5e5280
test case for delete
fbm3307 Sep 25, 2024
b6f3df1
Rc1
fbm3307 Sep 26, 2024
51e1e4e
golint
fbm3307 Sep 27, 2024
f2c234e
Merge branch 'master' into kubesaw170_restart
mfrancisc Sep 30, 2024
f5c19de
Merge branch 'master' into kubesaw170_restart
fbm3307 Oct 1, 2024
f3cf690
changes to the logic of restart
fbm3307 Oct 10, 2024
1868b12
Merge branch 'master' into kubesaw170_restart
fbm3307 Oct 11, 2024
2d4d4b1
Merge branch 'master' into kubesaw170_restart
fbm3307 Oct 17, 2024
1f5db06
Merge branch 'master' into kubesaw170_restart
fbm3307 Nov 4, 2024
fcf67b6
review comments-2
fbm3307 Nov 4, 2024
fd143c7
restart-test changes
fbm3307 Nov 5, 2024
b823e10
CI
fbm3307 Nov 6, 2024
97997f6
golang ci
fbm3307 Nov 6, 2024
e34b110
adding tc
fbm3307 Nov 7, 2024
144dd2c
some addition to test cases
fbm3307 Nov 7, 2024
0d80548
some changes
fbm3307 Nov 7, 2024
096d49a
adding some comments
fbm3307 Nov 8, 2024
bf63303
autoscalling buffer test case
fbm3307 Nov 8, 2024
09411ad
Modification of test cases
fbm3307 Nov 12, 2024
857fdc9
Go lint
fbm3307 Nov 12, 2024
760cf0c
Test case of status
fbm3307 Nov 12, 2024
3517338
Linter
fbm3307 Nov 12, 2024
2704a91
test of unregister_member
fbm3307 Nov 12, 2024
17da571
phase-3 rc
fbm3307 Nov 14, 2024
a4b5198
code cov
fbm3307 Nov 14, 2024
9b889cc
some changes to status func
fbm3307 Nov 14, 2024
9a14e2b
leftovers
fbm3307 Nov 14, 2024
6f5b0cb
Merge branch 'master' into kubesaw170_restart
fbm3307 Nov 15, 2024
4f477ce
merge conflict
fbm3307 Nov 15, 2024
6318f4e
some changes as per rc
fbm3307 Nov 21, 2024
8762ebc
go version fix
fbm3307 Nov 21, 2024
9c4ae9e
extra left overs
fbm3307 Nov 21, 2024
70c53e7
linter
fbm3307 Nov 21, 2024
1 change: 1 addition & 0 deletions go.mod
@@ -83,6 +83,7 @@ require (
github.com/josharian/intern v1.0.0 // indirect
github.com/json-iterator/go v1.1.12 // indirect
github.com/liggitt/tabwriter v0.0.0-20181228230101-89fcab3d43de // indirect
github.com/lithammer/dedent v1.1.0 // indirect
github.com/lucasb-eyer/go-colorful v1.2.0 // indirect
github.com/mailru/easyjson v0.7.6 // indirect
github.com/mattn/go-isatty v0.0.18 // indirect
1 change: 1 addition & 0 deletions go.sum
@@ -439,6 +439,7 @@ github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+
github.com/leodido/go-urn v1.2.0/go.mod h1:+8+nEpDfqqsY+g338gtMEUOtuK+4dEMhiQEgxpxOKII=
github.com/liggitt/tabwriter v0.0.0-20181228230101-89fcab3d43de h1:9TO3cAIGXtEhnIaL+V+BEER86oLrvS+kWobKpbJuye0=
github.com/liggitt/tabwriter v0.0.0-20181228230101-89fcab3d43de/go.mod h1:zAbeS9B/r2mtpb6U+EI2rYA5OAXxsYw6wTamcNW+zcE=
github.com/lithammer/dedent v1.1.0 h1:VNzHMVCBNG1j0fh3OrsFRkVUwStdDArbgBWoPAffktY=
github.com/lithammer/dedent v1.1.0/go.mod h1:jrXYCQtgg0nJiN+StA2KgR7w6CiQNv9Fd/Z9BP0jIOc=
github.com/lucasb-eyer/go-colorful v1.2.0 h1:1nnpGOrhyZZuNyfu1QjKiUICQ74+3FNCN69Aj6K7nkY=
github.com/lucasb-eyer/go-colorful v1.2.0/go.mod h1:R4dSotOR9KMtayYi1e77YzuveK+i7ruzyGqttikkLy0=
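The only dependency change is github.com/lithammer/dedent, which strips the common leading whitespace from multi-line strings and is presumably pulled in (indirectly) by the reworked tests. A minimal sketch of what it does — the string content is illustrative, not taken from this PR:

package main

import (
	"fmt"

	"github.com/lithammer/dedent"
)

func main() {
	// Dedent removes the common leading whitespace from every line,
	// which keeps indented multi-line literals (e.g. expected test output) readable.
	out := dedent.Dedent(`
		checking the status of the deployment
		deployment restarted successfully`)
	fmt.Println(out)
}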
229 changes: 136 additions & 93 deletions pkg/cmd/adm/restart.go
@@ -3,155 +3,198 @@ package adm
import (
"context"
"fmt"
"time"
"os"

"github.com/kubesaw/ksctl/pkg/client"
"github.com/kubesaw/ksctl/pkg/cmd/flags"
"github.com/kubesaw/ksctl/pkg/configuration"
clicontext "github.com/kubesaw/ksctl/pkg/context"
"github.com/kubesaw/ksctl/pkg/ioutils"

"github.com/spf13/cobra"
appsv1 "k8s.io/api/apps/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/types"
"k8s.io/apimachinery/pkg/util/wait"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/cli-runtime/pkg/genericclioptions"
kubectlrollout "k8s.io/kubectl/pkg/cmd/rollout"
cmdutil "k8s.io/kubectl/pkg/cmd/util"
runtimeclient "sigs.k8s.io/controller-runtime/pkg/client"
)

// NewRestartCmd() restarts the whole operator; it relies on the target cluster and fetches the cluster config.
// 1. If the command is run for the host operator, it restarts the whole host operator: it deletes the OLM-based pods (host-operator pods),
// waits for the new deployment to come up, then uses the rollout-restart command for the non-OLM-based deployment (registration-service).
// 2. If the command is run for the member operator, it restarts the whole member operator: it deletes the OLM-based pods (member-operator pods),
// waits for the new deployment to come up, then uses the rollout-restart command for the non-OLM-based deployments (webhooks).
func NewRestartCmd() *cobra.Command {
var targetCluster string
command := &cobra.Command{
Use: "restart -t <cluster-name> <deployment-name>",
Short: "Restarts a deployment",
Long: `Restarts the deployment with the given name in the operator namespace.
If no deployment name is provided, then it lists all existing deployments in the namespace.`,
Use: "restart <cluster-name>",
Short: "Restarts an operator",
Long: `Restarts the whole operator in the given cluster name.
It restarts the operator and checks the status of the deployment`,
Args: cobra.RangeArgs(0, 1),
RunE: func(cmd *cobra.Command, args []string) error {
term := ioutils.NewTerminal(cmd.InOrStdin, cmd.OutOrStdout)
ctx := clicontext.NewCommandContext(term, client.DefaultNewClient)
return restart(ctx, targetCluster, args...)
return restart(ctx, args...)
},
}
command.Flags().StringVarP(&targetCluster, "target-cluster", "t", "", "The target cluster")
flags.MustMarkRequired(command, "target-cluster")
return command
}
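In short, the reworked command takes the cluster name as a positional argument instead of the removed -t/--target-cluster flag, so the invocation becomes e.g. ksctl adm restart host (the host name here is just the example used in the error message below).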

func restart(ctx *clicontext.CommandContext, clusterName string, deployments ...string) error {
func restart(ctx *clicontext.CommandContext, clusterNames ...string) error {
if clusterNames == nil {
return fmt.Errorf("please provide a cluster name to restart the operator e.g `ksctl adm restart host`")
}
clusterName := clusterNames[0]
kubeConfigFlags := genericclioptions.NewConfigFlags(true).WithDeprecatedPasswordFlag()
factory := cmdutil.NewFactory(cmdutil.NewMatchVersionFlags(kubeConfigFlags))
ioStreams := genericclioptions.IOStreams{
In: os.Stdin,
Out: os.Stdout,
ErrOut: os.Stderr,
}
kubeConfigFlags.ClusterName = nil // `cluster` flag is redefined for our own purpose
kubeConfigFlags.AuthInfoName = nil // unused here, so we can hide it
kubeConfigFlags.Context = nil // unused here, so we can hide it

cfg, err := configuration.LoadClusterConfig(ctx, clusterName)
if err != nil {
return err
}
cl, err := ctx.NewClient(cfg.Token, cfg.ServerAPI)
kubeConfigFlags.Namespace = &cfg.OperatorNamespace
kubeConfigFlags.APIServer = &cfg.ServerAPI
kubeConfigFlags.BearerToken = &cfg.Token
kubeconfig, err := client.EnsureKsctlConfigFile()
if err != nil {
return err
}
kubeConfigFlags.KubeConfig = &kubeconfig

if len(deployments) == 0 {
err := printExistingDeployments(ctx.Terminal, cl, cfg.OperatorNamespace)
if err != nil {
ctx.Terminal.Printlnf("\nERROR: Failed to list existing deployments\n :%s", err.Error())
}
return fmt.Errorf("at least one deployment name is required, include one or more of the above deployments to restart")
cl, err := ctx.NewClient(cfg.Token, cfg.ServerAPI)

if err != nil {
return err
}
deploymentName := deployments[0]

if !ctx.AskForConfirmation(
ioutils.WithMessagef("restart the deployment '%s' in namespace '%s'", deploymentName, cfg.OperatorNamespace)) {
ioutils.WithMessagef("restart the '%s' operator in namespace '%s'", clusterName, cfg.OperatorNamespace)) {
return nil
}
return restartDeployment(ctx, cl, cfg.OperatorNamespace, deploymentName)

return restartDeployment(ctx, cl, cfg.OperatorNamespace, factory, ioStreams)
}

func restartDeployment(ctx *clicontext.CommandContext, cl runtimeclient.Client, ns string, deploymentName string) error {
namespacedName := types.NamespacedName{
Namespace: ns,
Name: deploymentName,
}
func restartDeployment(ctx *clicontext.CommandContext, cl runtimeclient.Client, ns string, f cmdutil.Factory, ioStreams genericclioptions.IOStreams) error {
fmt.Printf("Fetching the current OLM and non-OLM deployments of the operator in %s", ns)

originalReplicas, err := scaleToZero(cl, namespacedName)
olmDeploymentList, nonOlmDeploymentlist, err := getExistingDeployments(cl, ns)
if err != nil {
if apierrors.IsNotFound(err) {
ctx.Printlnf("\nERROR: The given deployment '%s' wasn't found.", deploymentName)
return printExistingDeployments(ctx, cl, ns)
}
return err
}
ctx.Println("The deployment was scaled to 0")
if err := scaleBack(ctx, cl, namespacedName, originalReplicas); err != nil {
ctx.Printlnf("Scaling the deployment '%s' in namespace '%s' back to '%d' replicas wasn't successful", originalReplicas)
ctx.Println("Please, try to contact administrators to scale the deployment back manually")
return err

if olmDeploymentList == nil {
return fmt.Errorf("OLM based deployment not found in %s", ns)
}
for _, olmDeployment := range olmDeploymentList.Items {
fmt.Printf("Proceeding to delete the Pods of %v", olmDeployment)

ctx.Printlnf("The deployment was scaled back to '%d'", originalReplicas)
if err := deletePods(ctx, cl, olmDeployment, f, ioStreams); err != nil {
return err
}
}
if nonOlmDeploymentlist != nil {
for _, nonOlmDeployment := range nonOlmDeploymentlist.Items {

fmt.Printf("Proceeding to restart the non-OLM deployment %v", nonOlmDeployment)

if err := restartNonOlmDeployments(nonOlmDeployment, f, ioStreams); err != nil {
return err
}
//check the rollout status
fmt.Printf("Checking the status of the rolled out deployment %v", nonOlmDeployment)
if err := checkRolloutStatus(f, ioStreams, "provider=codeready-toolchain"); err != nil {
return err
}
}
} else {
fmt.Printf("non-OLM based deployment not found in %s", ns)
}
return nil
}

func restartHostOperator(ctx *clicontext.CommandContext, hostClient runtimeclient.Client, hostNamespace string) error {
deployments := &appsv1.DeploymentList{}
if err := hostClient.List(context.TODO(), deployments,
runtimeclient.InNamespace(hostNamespace),
runtimeclient.MatchingLabels{"olm.owner.namespace": "toolchain-host-operator"}); err != nil {
func deletePods(ctx *clicontext.CommandContext, cl runtimeclient.Client, deployment appsv1.Deployment, f cmdutil.Factory, ioStreams genericclioptions.IOStreams) error {
fmt.Printf("Listing the pods to be deleted")
//get pods by label selector from the deployment
pods := corev1.PodList{}
selector, _ := metav1.LabelSelectorAsSelector(deployment.Spec.Selector)
if err := cl.List(ctx, &pods,
runtimeclient.MatchingLabelsSelector{Selector: selector},
runtimeclient.InNamespace(deployment.Namespace)); err != nil {
return err
}
if len(deployments.Items) != 1 {
return fmt.Errorf("there should be a single deployment matching the label olm.owner.namespace=toolchain-host-operator in %s ns, but %d was found. "+
"It's not possible to restart the Host Operator deployment", hostNamespace, len(deployments.Items))
fmt.Printf("Starting to delete the pods")
//delete pods
for _, pod := range pods.Items {
pod := pod // TODO We won't need it after upgrading to go 1.22: https://go.dev/blog/loopvar-preview
if err := cl.Delete(ctx, &pod); err != nil {
return err
}
}

return restartDeployment(ctx, hostClient, hostNamespace, deployments.Items[0].Name)
}

func printExistingDeployments(term ioutils.Terminal, cl runtimeclient.Client, ns string) error {
deployments := &appsv1.DeploymentList{}
if err := cl.List(context.TODO(), deployments, runtimeclient.InNamespace(ns)); err != nil {
fmt.Printf("Checking the status of the rolled out deployment %v", deployment)
//check the rollout status
if err := checkRolloutStatus(f, ioStreams, "kubesaw-control-plane=kubesaw-controller-manager"); err != nil {
return err
}
deploymentList := "\n"
for _, deployment := range deployments.Items {
deploymentList += fmt.Sprintf("%s\n", deployment.Name)
}
term.PrintContextSeparatorWithBodyf(deploymentList, "Existing deployments in %s namespace", ns)
return nil

}
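A side note on the pod := pod line above: before Go 1.22 a range loop reuses a single loop variable, so linters flag taking its address (&pod) as implicit memory aliasing; the per-iteration copy avoids that and the classic capture pitfall. A standalone illustration, not part of the PR:

package main

import "fmt"

func main() {
	items := []string{"a", "b", "c"}

	// Pre-Go 1.22: one shared loop variable, so every stored pointer
	// ends up referring to the last element.
	var bad []*string
	for _, it := range items {
		bad = append(bad, &it)
	}
	fmt.Println(*bad[0], *bad[1], *bad[2]) // with Go < 1.22: "c c c"

	// The `it := it` copy gives each iteration its own variable.
	var good []*string
	for _, it := range items {
		it := it
		good = append(good, &it)
	}
	fmt.Println(*good[0], *good[1], *good[2]) // always "a b c"
}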

func scaleToZero(cl runtimeclient.Client, namespacedName types.NamespacedName) (int32, error) {
// get the deployment
deployment := &appsv1.Deployment{}
if err := cl.Get(context.TODO(), namespacedName, deployment); err != nil {
return 0, err
func restartNonOlmDeployments(deployment appsv1.Deployment, f cmdutil.Factory, ioStreams genericclioptions.IOStreams) error {

o := kubectlrollout.NewRolloutRestartOptions(ioStreams)

if err := o.Complete(f, nil, []string{"deployment"}); err != nil {
panic(err)
}
// keep original number of replicas so we can bring it back
originalReplicas := *deployment.Spec.Replicas
zero := int32(0)
deployment.Spec.Replicas = &zero

// update the deployment so it scales to zero
return originalReplicas, cl.Update(context.TODO(), deployment)
o.Resources = []string{"deployment/" + deployment.Name}

if err := o.Validate(); err != nil {
panic(err)
}
fmt.Printf("Running the rollout restart command for non-olm deployment %v", deployment)
return o.RunRestart()
}

func scaleBack(term ioutils.Terminal, cl runtimeclient.Client, namespacedName types.NamespacedName, originalReplicas int32) error {
return wait.Poll(500*time.Millisecond, 10*time.Second, func() (done bool, err error) {
term.Println("")
term.Printlnf("Trying to scale the deployment back to '%d'", originalReplicas)
// get the updated
deployment := &appsv1.Deployment{}
if err := cl.Get(context.TODO(), namespacedName, deployment); err != nil {
return false, err
}
// check if the replicas number wasn't already reset by a controller
if *deployment.Spec.Replicas == originalReplicas {
return true, nil
}
// set the original
deployment.Spec.Replicas = &originalReplicas
// and update to scale back
if err := cl.Update(context.TODO(), deployment); err != nil {
term.Printlnf("error updating Deployment '%s': %s. Will retry again...", namespacedName.Name, err.Error())
return false, nil
}
return true, nil
})
func checkRolloutStatus(f cmdutil.Factory, ioStreams genericclioptions.IOStreams, labelSelector string) error {
cmd := kubectlrollout.NewRolloutStatusOptions(ioStreams)

if err := cmd.Complete(f, []string{"deployment"}); err != nil {
panic(err)
}
cmd.LabelSelector = labelSelector
if err := cmd.Validate(); err != nil {
panic(err)
}
fmt.Printf("Running the Rollout status to check the status of the deployment")
return cmd.Run()
}

func getExistingDeployments(cl runtimeclient.Client, ns string) (*appsv1.DeploymentList, *appsv1.DeploymentList, error) {

olmDeployments := &appsv1.DeploymentList{}
if err := cl.List(context.TODO(), olmDeployments,
runtimeclient.InNamespace(ns),
runtimeclient.MatchingLabels{"kubesaw-control-plane": "kubesaw-controller-manager"}); err != nil {
return nil, nil, err
}

nonOlmDeployments := &appsv1.DeploymentList{}
if err := cl.List(context.TODO(), nonOlmDeployments,
runtimeclient.InNamespace(ns),
runtimeclient.MatchingLabels{"provider": "codeready-toolchain"}); err != nil {
return nil, nil, err
}

return olmDeployments, nonOlmDeployments, nil
}
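The later commits in this PR add unit tests for these helpers (they fall outside this 13-commit slice). As a rough, non-authoritative sketch of how getExistingDeployments could be exercised with the controller-runtime fake client — the deployment names and namespace are illustrative, only the labels mirror the ones used above:

package adm

import (
	"testing"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes/scheme"
	"sigs.k8s.io/controller-runtime/pkg/client/fake"
)

func TestGetExistingDeploymentsSketch(t *testing.T) {
	// One OLM-managed deployment and one non-OLM deployment, carrying the
	// same labels that getExistingDeployments filters on.
	olm := &appsv1.Deployment{ObjectMeta: metav1.ObjectMeta{
		Name:      "host-operator-controller-manager", // illustrative name
		Namespace: "toolchain-host-operator",
		Labels:    map[string]string{"kubesaw-control-plane": "kubesaw-controller-manager"},
	}}
	nonOlm := &appsv1.Deployment{ObjectMeta: metav1.ObjectMeta{
		Name:      "registration-service", // illustrative name
		Namespace: "toolchain-host-operator",
		Labels:    map[string]string{"provider": "codeready-toolchain"},
	}}
	cl := fake.NewClientBuilder().WithScheme(scheme.Scheme).WithObjects(olm, nonOlm).Build()

	olmList, nonOlmList, err := getExistingDeployments(cl, "toolchain-host-operator")
	if err != nil {
		t.Fatal(err)
	}
	if len(olmList.Items) != 1 || len(nonOlmList.Items) != 1 {
		t.Fatalf("expected one OLM and one non-OLM deployment, got %d and %d",
			len(olmList.Items), len(nonOlmList.Items))
	}
}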