Clean up auto-generated resources in leader and member clusters
1. Split the existing stale controller into leader and member folders.
2. The new stale controller in the leader cluster does the following:
  * Checks MemberClusterAnnounces periodically in the leader cluster and
    deletes a stale CR if its last-update timestamp annotation `touch-ts` is
    more than 24 hours old.
  * Cleans up all corresponding ResourceExports when a MemberClusterAnnounce is deleted.
  * Cleans up stale ResourceExports if there is no existing MemberClusterAnnounce when
    the controller is started.
  * Cleans up all MemberClusterAnnounces and corresponding ResourceExports when the controller
    is started with no ClusterSet CR.
3. The ClusterSet controller in the leader cluster removes all remaining ResourceExports and
   MemberClusterAnnounces when the ClusterSet CR is deleted in the leader cluster.
4. The new stale controller in the member cluster does the following:
  * Cleans up any stale resources when the ClusterSet is ready.
  * Cleans up any stale resources when the controller is restarted.
5. The ClusterSet controller is responsible for removing all imported and exported
   resources for the member cluster when the ClusterSet CR is deleted.

Signed-off-by: Lan Luo <[email protected]>
luolanzone committed Oct 26, 2023
1 parent 62a440a commit bc26130
Showing 28 changed files with 1,319 additions and 472 deletions.
73 changes: 71 additions & 2 deletions docs/multicluster/user-guide.md
@@ -7,16 +7,25 @@
- [Installation](#installation)
- [Preparation](#preparation)
- [Deploy Antrea Multi-cluster Controller](#deploy-antrea-multi-cluster-controller)
- [Deploy in a Dedicated Leader Cluster](#deploy-in-a-dedicated-leader-cluster)
- [Deploy in a Member Cluster](#deploy-in-a-member-cluster)
- [Deploy Leader and Member in One Cluster](#deploy-leader-and-member-in-one-cluster)
- [Create ClusterSet](#create-clusterset)
- [Set up Access to Leader Cluster](#set-up-access-to-leader-cluster)
- [Initialize ClusterSet](#initialize-clusterset)
- [Initialize ClusterSet for a Dual-role Cluster](#initialize-clusterset-for-a-dual-role-cluster)
- [Multi-cluster Gateway Configuration](#multi-cluster-gateway-configuration)
- [Multi-cluster WireGuard Encryption](#multi-cluster-wireguard-encryption)
- [Multi-cluster Service](#multi-cluster-service)
- [Multi-cluster Pod to Pod Connectivity](#multi-cluster-pod-to-pod-connectivity)
- [Multi-cluster Pod-to-Pod Connectivity](#multi-cluster-pod-to-pod-connectivity)
- [Multi-cluster NetworkPolicy](#multi-cluster-networkpolicy)
- [Egress Rule to Multi-cluster Services](#egress-rule-to-multi-cluster-service)
- [Egress Rule to Multi-cluster Service](#egress-rule-to-multi-cluster-service)
- [Ingress Rule](#ingress-rule)
- [ClusterNetworkPolicy Replication](#clusternetworkpolicy-replication)
- [Build Antrea Multi-cluster Controller Image](#build-antrea-multi-cluster-controller-image)
- [Uninstallation](#uninstallation)
- [Remove a Member Cluster](#remove-a-member-cluster)
- [Remove a Leader Cluster](#remove-a-leader-cluster)
- [Known Issue](#known-issue)
<!-- /toc -->

@@ -812,6 +821,66 @@ the image.
3. Copy the image file `antrea-mcs.tar` to the Nodes of your local cluster.
4. Run `docker load < antrea-mcs.tar` in each Node of your local cluster.

## Uninstallation

### Remove a Member Cluster

If you want to remove a member cluster from a ClusterSet and uninstall Antrea
Multi-cluster, follow the steps below.

Note: please replace `kube-system` with the right Namespace in the example
commands and manifest if Antrea Multi-cluster is not deployed in
the default Namespace.

1. Delete all ServiceExports and the Multi-cluster Gateway annotation on the
Gateway Nodes.

2. Delete the ClusterSet CR. Antrea Multi-cluster Controller will
automatically clean up all the resources it created.

3. Delete the Antrea Multi-cluster Deployment:

```bash
kubectl delete -f https://github.com/antrea-io/antrea/releases/download/$TAG/antrea-multicluster-member.yml
```

### Remove a Leader Cluster

If you want to delete a ClusterSet and uninstall Antrea Multi-cluster in
a leader cluster, follow the steps below. You should
[remove all member clusters](#remove-a-member-cluster) before removing
a leader cluster from a ClusterSet.

Note: please replace `antrea-multicluster` with the right Namespace in the
following example commands and manifest if Antrea Multi-cluster is not
deployed in the default Namespace.

1. Delete AntreaClusterNetworkPolicy ResourceExports in the leader cluster.

2. Verify that there are no remaining MemberClusterAnnounces.

```bash
kubectl get memberclusterannounce -n antrea-multicluster
```

3. Delete the ClusterSet CR. Antrea Multi-cluster Controller will
automatically clean up all the resources it created.

4. Check that there are no remaining ResourceExports or ResourceImports:

```bash
kubectl get resourceexports -n antrea-multicluster
kubectl get resourceimports -n antrea-multicluster
```

Note: you can follow the [Known Issue section](#known-issue) to delete any leftover ResourceExports.

5. Delete the Antrea Multi-cluster Deployment:

```bash
kubectl delete -f https://github.com/antrea-io/antrea/releases/download/$TAG/antrea-multicluster-leader.yml
```

## Known Issue

We recommend that users redeploy or update Antrea Multi-cluster Controller through
1 change: 0 additions & 1 deletion hack/.notableofcontents
@@ -32,7 +32,6 @@ docs/multicluster/architecture.md
docs/multicluster/policy-only-mode.md
docs/multicluster/quick-start.md
docs/multicluster/upgrade.md
docs/multicluster/user-guide.md
docs/network-requirements.md
docs/noencap-hybrid-modes.md
docs/octant-plugin-installation.md

@@ -33,10 +33,6 @@ import (

//+kubebuilder:webhook:path=/validate-multicluster-crd-antrea-io-v1alpha2-clusterset,mutating=false,failurePolicy=fail,sideEffects=None,groups=multicluster.crd.antrea.io,resources=clustersets,verbs=create;update;delete,versions=v1alpha2,name=vclusterset.kb.io,admissionReviewVersions={v1,v1beta1}

const (
mcControllerSAName = "antrea-mc-controller"
)

// ClusterSet validator
type clusterSetValidator struct {
Client client.Client
3 changes: 2 additions & 1 deletion multicluster/cmd/multicluster-controller/gateway_webhook.go
@@ -31,7 +31,8 @@ import (
)

const (
antreaAgentSAName = "antrea-agent"
antreaAgentSAName = "antrea-agent"
mcControllerSAName = "antrea-mc-controller"
)

//+kubebuilder:webhook:path=/validate-multicluster-crd-antrea-io-v1alpha1-gateway,mutating=false,failurePolicy=fail,sideEffects=None,groups=multicluster.crd.antrea.io,resources=gateways,verbs=create;update,versions=v1alpha1,name=vgateway.kb.io,admissionReviewVersions={v1,v1beta1}
12 changes: 5 additions & 7 deletions multicluster/cmd/multicluster-controller/leader.go
@@ -23,7 +23,6 @@ import (
"sigs.k8s.io/controller-runtime/pkg/webhook"

multiclusterv1alpha1 "antrea.io/antrea/multicluster/apis/multicluster/v1alpha1"
multiclustercontrollers "antrea.io/antrea/multicluster/controllers/multicluster"
"antrea.io/antrea/multicluster/controllers/multicluster/leader"
"antrea.io/antrea/pkg/log"
"antrea.io/antrea/pkg/signals"
@@ -114,15 +113,14 @@ func runLeader(o *Options) error {
return fmt.Errorf("error creating ResourceExport webhook: %v", err)
}

staleController := multiclustercontrollers.NewStaleResCleanupController(
staleController := leader.NewStaleResCleanupController(
mgr.GetClient(),
mgr.GetScheme(),
env.GetPodNamespace(),
nil,
multiclustercontrollers.LeaderCluster,
)

go staleController.Run(stopCh)
if err = staleController.SetupWithManager(mgr); err != nil {
return fmt.Errorf("error creating StaleResCleanupController: %v", err)
}
go staleController.RunPeriodically(stopCh)

klog.InfoS("Leader MC Controller Starting Manager")
if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
2 changes: 2 additions & 0 deletions multicluster/cmd/multicluster-controller/leader_test.go
@@ -30,6 +30,7 @@ import (
"k8s.io/client-go/rest"
"k8s.io/klog/v2"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/cache/informertest"
"sigs.k8s.io/controller-runtime/pkg/client/fake"
"sigs.k8s.io/controller-runtime/pkg/config/v1alpha1"
"sigs.k8s.io/controller-runtime/pkg/webhook"
@@ -59,6 +60,7 @@ func initMockManager(mockManager *mocks.MockManager) {
mockManager.EXPECT().Start(gomock.Any()).Return(nil).AnyTimes()
mockManager.EXPECT().GetConfig().Return(&rest.Config{}).AnyTimes()
mockManager.EXPECT().GetRESTMapper().Return(&meta.DefaultRESTMapper{}).AnyTimes()
mockManager.EXPECT().GetFieldIndexer().Return(&informertest.FakeInformers{}).AnyTimes()
}

func TestRunLeader(t *testing.T) {
8 changes: 5 additions & 3 deletions multicluster/cmd/multicluster-controller/member.go
@@ -23,7 +23,6 @@ import (
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/webhook"

multiclustercontrollers "antrea.io/antrea/multicluster/controllers/multicluster"
"antrea.io/antrea/multicluster/controllers/multicluster/member"
"antrea.io/antrea/pkg/log"
"antrea.io/antrea/pkg/signals"
@@ -70,11 +69,13 @@ func runMember(o *Options) error {
role: memberRole},
})

commonAreaCreationCh := make(chan struct{})
clusterSetReconciler := member.NewMemberClusterSetReconciler(mgr.GetClient(),
mgr.GetScheme(),
env.GetPodNamespace(),
o.EnableStretchedNetworkPolicy,
o.ClusterCalimCRDAvailable,
commonAreaCreationCh,
)
if err = clusterSetReconciler.SetupWithManager(mgr); err != nil {
return fmt.Errorf("error creating ClusterSet controller: %v", err)
@@ -122,15 +123,16 @@ func runMember(o *Options) error {
return fmt.Errorf("error creating Node controller: %v", err)
}

staleController := multiclustercontrollers.NewStaleResCleanupController(
staleController := member.NewStaleResCleanupController(
mgr.GetClient(),
mgr.GetScheme(),
commonAreaCreationCh,
env.GetPodNamespace(),
commonAreaGetter,
multiclustercontrollers.MemberCluster,
)

go staleController.Run(stopCh)

// Member runs ResourceImportReconciler from RemoteCommonArea only

klog.InfoS("Member MC Controller Starting Manager")
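
The member.go hunks above create `commonAreaCreationCh` and hand it to both the ClusterSet reconciler and the member stale controller. One plausible reading, consistent with the commit message ("Clean up any stale resources when the ClusterSet is ready"), is that the reconciler signals on the channel and the stale controller runs a cleanup pass on each signal. The sketch below shows that pattern with the standard library only; `runCleanupOnSignal` and the callback are hypothetical names, not Antrea APIs, and the real controller may be wired differently.

```go
package main

import (
	"fmt"
	"sync"
)

// runCleanupOnSignal waits for signals on readyCh and calls cleanup once per
// signal. It returns when stopCh is closed. Hypothetical helper for
// illustration only.
func runCleanupOnSignal(readyCh <-chan struct{}, stopCh <-chan struct{}, cleanup func()) {
	for {
		select {
		case <-readyCh:
			cleanup()
		case <-stopCh:
			return
		}
	}
}

func main() {
	readyCh := make(chan struct{}) // unbuffered, like commonAreaCreationCh in the diff
	stopCh := make(chan struct{})

	runs := 0
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		runCleanupOnSignal(readyCh, stopCh, func() { runs++ })
	}()

	// In this sketch, the sender plays the role of the ClusterSet reconciler.
	readyCh <- struct{}{}
	close(stopCh)
	wg.Wait()

	fmt.Println("cleanup runs:", runs)
}
```

Because the channel is unbuffered, the sender blocks until the receiving goroutine is running, so a signal cannot be silently dropped in this sketch.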
4 changes: 2 additions & 2 deletions multicluster/cmd/multicluster-controller/options.go
@@ -96,9 +96,9 @@ func (o *Options) complete(args []string) error {
o.EndpointIPType = ctrlConfig.EndpointIPType
}
o.EnableStretchedNetworkPolicy = ctrlConfig.EnableStretchedNetworkPolicy
klog.InfoS("Using config from file", "config", o.options)
klog.InfoS("Using config from file", "config", o.configFile)
} else {
klog.InfoS("Using default config", "config", o.options)
klog.InfoS("Using default config")
}
return nil
}
11 changes: 11 additions & 0 deletions multicluster/controllers/multicluster/common/helper.go
@@ -16,12 +16,23 @@ package common
import (
"crypto/sha1" // #nosec G505: not used for security purposes
"encoding/hex"
"time"

corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/util/wait"
)

const labelIdentityHashLength = 16

// CleanUpRetry is the backoff used to retry when the cleanup method
// fails to clean up all stale resources.
var CleanUpRetry = wait.Backoff{
Steps: 15,
Duration: 500 * time.Millisecond,
Factor: 2.0,
Jitter: 1,
}

// TODO: Use NamespacedName stringer method instead of this. e.g. nsName.String()
func NamespacedName(namespace, name string) string {
return namespace + "/" + name

@@ -72,10 +72,12 @@ func (r *LeaderClusterSetReconciler) Reconcile(ctx context.Context, req ctrl.Req
return ctrl.Result{}, err
}
klog.InfoS("Received ClusterSet delete", "clusterset", req.NamespacedName)
if r.clusterSetConfig != nil && r.clusterSetConfig.Name != req.Name {
return ctrl.Result{}, nil
}
r.clusterSetConfig = nil
r.clusterID = common.InvalidClusterID
r.clusterSetID = common.InvalidClusterSetID

return ctrl.Result{}, nil
}

@@ -120,7 +122,7 @@ func (r *LeaderClusterSetReconciler) SetupWithManager(mgr ctrl.Manager) error {
For(&mcv1alpha2.ClusterSet{}).
WithEventFilter(instance).
WithOptions(controller.Options{
MaxConcurrentReconciles: common.DefaultWorkerCount,
MaxConcurrentReconciles: 1,
}).
Complete(r)
}

@@ -30,6 +30,7 @@ import (
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/client/fake"

mcv1alpha1 "antrea.io/antrea/multicluster/apis/multicluster/v1alpha1"
mcv1alpha2 "antrea.io/antrea/multicluster/apis/multicluster/v1alpha2"
"antrea.io/antrea/multicluster/controllers/multicluster/common"
)
@@ -82,6 +83,7 @@ var (

func createMockClients(t *testing.T, objects ...client.Object) (*runtime.Scheme, client.Client, *MockMemberClusterStatusManager) {
scheme := runtime.NewScheme()
mcv1alpha1.AddToScheme(scheme)
mcv1alpha2.AddToScheme(scheme)
fakeRemoteClient := fake.NewClientBuilder().WithScheme(scheme).
WithObjects(objects...).Build()

@@ -20,7 +20,6 @@ package leader
import (
"context"
"fmt"
"strings"
"sync"
"time"

@@ -79,23 +78,17 @@ func NewMemberClusterAnnounceReconciler(client client.Client, scheme *runtime.Sc
// Reconcile implements cluster status management on the leader cluster
func (r *MemberClusterAnnounceReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
memberAnnounce := &mcv1alpha1.MemberClusterAnnounce{}
memberID := getIDFromName(req.Name)
err := r.Get(ctx, req.NamespacedName, memberAnnounce)
if err != nil {
// If MemberClusterAnnounce is deleted, no further processing is needed, as cleanup
// must have been done when the Finalizer was removed.
return ctrl.Result{}, client.IgnoreNotFound(err)
}

memberID := common.ClusterID(memberAnnounce.ClusterID)
finalizer := fmt.Sprintf("%s/%s", MemberClusterAnnounceFinalizer, memberAnnounce.ClusterID)
if !memberAnnounce.DeletionTimestamp.IsZero() {
r.removeMemberStatus(memberID)
memberAnnounce.Finalizers = common.RemoveStringFromSlice(memberAnnounce.Finalizers, finalizer)
if err := r.Update(context.TODO(), memberAnnounce); err != nil {
klog.ErrorS(err, "Failed to update MemberClusterAnnounce", "MemberClusterAnnounce", klog.KObj(memberAnnounce))
return ctrl.Result{}, err
}

return ctrl.Result{}, nil
}

@@ -217,10 +210,6 @@ func (r *MemberClusterAnnounceReconciler) removeMemberStatus(memberID common.Clu
klog.InfoS("Removed member cluster", "cluster", memberID)
}

func getIDFromName(name string) common.ClusterID {
return common.ClusterID(strings.TrimPrefix(name, "member-announce-from-"))
}

/******************************* MemberClusterStatusManager methods *******************************/

func (r *MemberClusterAnnounceReconciler) GetMemberClusterStatuses() []mcv1alpha2.ClusterStatus {