add aro-hpc scability test
Signed-off-by: Wei Liu <[email protected]>
skeeey committed Aug 7, 2024
1 parent c2b1c73 commit 16f2e45
Showing 70 changed files with 16,116 additions and 3 deletions.
11 changes: 11 additions & 0 deletions .gitleaks.toml
@@ -0,0 +1,11 @@
[allowlist]
description = "Global Allowlist"

# Ignore the test manifests
paths = [
# Ignore the aro-hpc test manifest
'''test\/performance\/pkg\/hub\/workloads\/manifests\/aro-hpc\/manifestwork.hypershift.yaml$''',

# Ignore the aro-hpc test result
'''test\/performance\/result\/aro-hpc\/workload\/hypershift.work$''',
]
6 changes: 3 additions & 3 deletions go.mod
@@ -28,6 +28,7 @@ require (
github.com/onsi/gomega v1.32.0
github.com/openshift-online/ocm-common v0.0.0-20240620110211-2ecfa6ec5707
github.com/openshift-online/ocm-sdk-go v0.1.421
+ github.com/openshift/library-go v0.0.0-20240621150525-4bb4238aef81
github.com/prometheus/client_golang v1.18.0
github.com/segmentio/ksuid v1.0.2
github.com/spf13/cobra v1.8.0
@@ -41,12 +42,14 @@ require (
gorm.io/gorm v1.24.7-0.20230306060331-85eaf9eeda11
k8s.io/api v0.30.2
k8s.io/apimachinery v0.30.2
+ k8s.io/apiserver v0.30.1
k8s.io/client-go v0.30.2
k8s.io/component-base v0.30.2
k8s.io/klog/v2 v2.120.1
open-cluster-management.io/api v0.14.1-0.20240627145512-bd6f2229b53c
open-cluster-management.io/ocm v0.13.1-0.20240618054845-e2a7b9e78b33
open-cluster-management.io/sdk-go v0.14.1-0.20240717021054-955108a181ee
+ sigs.k8s.io/yaml v1.4.0
)

require (
@@ -107,7 +110,6 @@ require (
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/openshift/api v0.0.0-20240527133614-ba11c1587003 // indirect
github.com/openshift/client-go v0.0.0-20240528061634-b054aa794d87 // indirect
- github.com/openshift/library-go v0.0.0-20240621150525-4bb4238aef81 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/pkg/profile v1.3.0 // indirect
github.com/prometheus/client_model v0.5.0 // indirect
@@ -153,7 +155,6 @@ require (
gopkg.in/yaml.v3 v3.0.1 // indirect
gorm.io/driver/mysql v1.4.7 // indirect
k8s.io/apiextensions-apiserver v0.30.1 // indirect
- k8s.io/apiserver v0.30.1 // indirect
k8s.io/kms v0.30.1 // indirect
k8s.io/kube-aggregator v0.30.1 // indirect
k8s.io/kube-openapi v0.0.0-20240228011516-70dd3763d340 // indirect
@@ -163,5 +164,4 @@ require (
sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd // indirect
sigs.k8s.io/kube-storage-version-migrator v0.0.6-0.20230721195810-5c8923c5ff96 // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.4.1 // indirect
- sigs.k8s.io/yaml v1.4.0 // indirect
)
115 changes: 115 additions & 0 deletions test/performance/README.md
@@ -0,0 +1,115 @@
# Performance Test

## ARO HPC

### Workloads

There are 10 consumers in the maestro server, and each consumer has 600 resource bundles, including:

- 300 managed cluster resource bundles; each managed cluster resource bundle contains one [ManagedCluster](./pkg/hub/workloads/manifests/aro-hpc/managedcluster.yaml) CR
- 300 manifestworks resource bundles; each manifestworks resource bundle contains two ManifestWork CRs: [namespace](./pkg/hub/workloads/manifests/aro-hpc/manifestwork.namespace.yaml) and [hypershift](./pkg/hub/workloads/manifests/aro-hpc/manifestwork.hypershift.yaml)

After the resources are applied on the consumer agent side, a status simulator adds mock status to them, so each resource ends up with both a spec and a status. Complete samples of the managed cluster and manifestworks resources can be found [here](./result/aro-hpc/workload/).

#### Workload Size

```
total_records=10x300x2=6000
one_managed_cluster_resource_bundles_record_size=3K (3127, spec=742, status=2067)
one_manifestworks_resource_bundles_record_size=49K (49899, spec=30771, status=18802)
total_size_records_per_consumer=300x3+300x49=15M
total_size_records=15x10=150M
```
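
The arithmetic above can be reproduced with a short shell sketch. It uses the exact byte counts quoted in the block (3127 and 49899) rather than the rounded 3K and 49K figures, so the byte totals come out slightly above the rounded 15M and 150M. The function names are illustrative only and are not part of the test tooling.

```sh
#!/usr/bin/env bash
# Sketch only: function names are illustrative, not part of the test tooling.

# total_records CONSUMERS BUNDLES_PER_KIND
# Each consumer holds BUNDLES_PER_KIND bundles of each of the two kinds.
total_records() {
  echo $(( $1 * $2 * 2 ))
}

# bytes_per_consumer BUNDLES_PER_KIND CLUSTER_BYTES WORKS_BYTES
# Sum of one managed cluster record and one manifestworks record, per bundle pair.
bytes_per_consumer() {
  echo $(( $1 * ($2 + $3) ))
}

records=$(total_records 10 300)
per_consumer=$(bytes_per_consumer 300 3127 49899)
echo "total_records=${records}"
echo "bytes_per_consumer=${per_consumer}"
echo "bytes_total=$(( per_consumer * 10 ))"
```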

### Test Steps

1. Follow [ARO-HCP doc](https://github.com/Azure/ARO-HCP/blob/38b459d9e88898d79780e6aa0eacb841828aab07/dev-infrastructure/docs/development-setup.md#maestro-infrastructure) to deploy maestro in the ARO

2. Add 10 consumers in the maestro

```sh
counts=10 test/performance/hack/aro-hpc/prepare.consumer.sh
```

3. Prepare a KinD cluster to run consumer agents

```sh
test/performance/hack/aro-hpc/prepare.kind.sh
```

4. Start 10 consumer agents

```sh
# tail -f _output/performance/aro/logs/agents.log
counts=10 test/performance/hack/aro-hpc/start-consumer-agents.sh
```

5. Start a watcher to simulate a controller to update the resource status

```sh
# tail -f _output/performance/aro/logs/watcher.log
counts=10 test/performance/hack/aro-hpc/start-spoke-watcher.sh
```

6. Create resource bundles for two consumers (the example below uses consumers 9 and 10)

```sh
index=9 test/performance/hack/aro-hpc/create-works.sh
index=10 test/performance/hack/aro-hpc/create-works.sh
```

7. Wait until the resources are updated on the spoke, then repeat step 6
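
Steps 6 and 7 could be scripted as a loop over pairs of consumers. The following is a minimal sketch, not part of the test tooling: it assumes the consumers are processed in pairs (1 and 2, 3 and 4, and so on) and relies on the Job names `works-<index>` in the `maestro` namespace that `create-works.sh` creates. Waiting for a Job to complete only confirms that the bundles were created, so `wait_for_spoke` is a hypothetical placeholder for whatever check confirms that the status updates reached the spoke. `DRY_RUN` and `SPOKE_WAIT_SECONDS` are illustrative variables.

```sh
#!/usr/bin/env bash
# Sketch only: consumer_pairs, run_round, run_all and wait_for_spoke are
# hypothetical helpers, not part of the test tooling.

# consumer_pairs TOTAL
# Print consumer indexes two per line: "1 2", "3 4", ... (assumes TOTAL is even).
consumer_pairs() {
  i=1
  while [ "$i" -lt "$1" ]; do
    echo "$i $((i + 1))"
    i=$((i + 2))
  done
}

# Placeholder: replace with a real check that the spoke has caught up.
wait_for_spoke() {
  sleep "${SPOKE_WAIT_SECONDS:-0}"
}

# run_round INDEX...
# Create works for each consumer, then wait for the creation Jobs to complete.
# Set DRY_RUN=1 to print the commands instead of running them.
run_round() {
  run=""
  if [ -n "${DRY_RUN:-}" ]; then run="echo"; fi
  for index in "$@"; do
    $run env index="$index" test/performance/hack/aro-hpc/create-works.sh
  done
  for index in "$@"; do
    $run kubectl -n maestro wait --for=condition=complete "job/works-$index" --timeout=10m
  done
  wait_for_spoke
}

# run_all TOTAL
# Run one round per consumer pair; stdin is detached so the commands
# cannot consume the list of pairs being read by the loop.
run_all() {
  consumer_pairs "$1" | while read -r first second; do
    run_round "$first" "$second" </dev/null
  done
}
```

Calling `DRY_RUN=1 run_all 10` prints the commands for all five rounds without touching the cluster, which is a cheap way to check the sequence before a real run.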

### Maestro server cpu/memory consumption

![cpu-avg](./result/aro-hpc/resource-usage/cpu-mem/svc-cpu-avg.png)

![cpu-max](./result/aro-hpc/resource-usage/cpu-mem/svc-cpu-max.png)

![mem-ws-avg](./result/aro-hpc/resource-usage/cpu-mem/svc-mem-ws-avg.png)

![mem-ws-max](./result/aro-hpc/resource-usage/cpu-mem/svc-mem-ws-max.png)

![mem-rss-avg](./result/aro-hpc/resource-usage/cpu-mem/910/svc/mem-avg.png)

![mem-rss-max](./result/aro-hpc/resource-usage/cpu-mem/910/svc/mem-max.png)

### PostgreSQL cpu/memory/storage consumption

![cpu-avg](./result/aro-hpc/resource-usage/cpu-mem/db-cpu-avg.png)

![cpu-max](./result/aro-hpc/resource-usage/cpu-mem/db-cpu-max.png)

![mem-ws-avg](./result/aro-hpc/resource-usage/cpu-mem/db-mem-ws-avg.png)

![mem-ws-max](./result/aro-hpc/resource-usage/cpu-mem/db-mem-ws-max.png)

![mem-rss-avg](./result/aro-hpc/resource-usage/cpu-mem/910/db/mem-avg.png)

![mem-rss-max](./result/aro-hpc/resource-usage/cpu-mem/910/db/mem-max.png)

```
# PostgreSQL Table Size
total | records
-------+---------
15 MB | 1200
27 MB | 2400
40 MB | 3600
52 MB | 4800
65 MB | 6000
```
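
The table can also be read as growth per batch of 1200 records. The sketch below computes the size added between consecutive rows and an integer average of bytes per record, treating 1 MB as 1024*1024 bytes. The `total` column appears to correspond to the `pg_total_relation_size` query in `check-result.sh`; the helper names are illustrative only.

```sh
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of the test tooling.

# size_deltas SIZE_MB...
# Print the growth in MB between consecutive measurements, starting from 0.
size_deltas() {
  prev=0
  for size in "$@"; do
    echo "$((size - prev))"
    prev=$size
  done
}

# avg_bytes_per_record SIZE_MB RECORDS
# Integer average, treating 1 MB as 1024*1024 bytes.
avg_bytes_per_record() {
  echo $(( $1 * 1024 * 1024 / $2 ))
}

# Values taken from the table above.
size_deltas 15 27 40 52 65
avg_bytes_per_record 65 6000
```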

### Event Grid consumption

![mqtt-connections](./result/aro-hpc/resource-usage/mqtt/conns.png)

![mqtt-throughput](./result/aro-hpc/resource-usage/mqtt/throughput.png)

![mqtt-request-counts](./result/aro-hpc/resource-usage/mqtt/req-counts.png)

### Responsiveness

1. The maestro server resource creation velocity: avg=54r/s, max=93r/s (the source client sends 6000 requests at a given QPS: avg=56, max=93)
2. The maestro server resource status update velocity: avg=2r/s, max=15r/s (10 agents, each agent syncs the resource status every 10s)
3. List time consumption (see [here](./result/aro-hpc/time/list_time.txt))
134 changes: 134 additions & 0 deletions test/performance/cmd/main.go
@@ -0,0 +1,134 @@
package main

import (
	"context"
	goflag "flag"
	"fmt"
	"os"

	"github.com/spf13/cobra"
	"github.com/spf13/pflag"

	"k8s.io/apiserver/pkg/server"
	utilflag "k8s.io/component-base/cli/flag"
	"k8s.io/component-base/logs"
	"k8s.io/klog/v2"

	"github.com/openshift-online/maestro/test/performance/pkg/hub"
	"github.com/openshift-online/maestro/test/performance/pkg/spoke"
	"github.com/openshift-online/maestro/test/performance/pkg/watcher"
)

func main() {
	pflag.CommandLine.SetNormalizeFunc(utilflag.WordSepNormalizeFunc)
	pflag.CommandLine.AddGoFlagSet(goflag.CommandLine)

	logs.AddFlags(pflag.CommandLine)
	logs.InitLogs()
	defer logs.FlushLogs()

	cmd := &cobra.Command{
		Use:   "maestroperf",
		Short: "Maestro Performance Test Tool",
		Run: func(cmd *cobra.Command, args []string) {
			_ = cmd.Help()
			os.Exit(1)
		},
	}

	cmd.AddCommand(
		newAROHPCPreparationCommand(),
		newAROHPCSpokeCommand(),
		newAROHPCWatchCommand(),
	)

	if err := cmd.Execute(); err != nil {
		fmt.Fprintf(os.Stderr, "%v\n", err)
		os.Exit(1)
	}
}

func newAROHPCPreparationCommand() *cobra.Command {
	o := hub.NewAROHPCPreparerOptions()
	cmd := &cobra.Command{
		Use:   "aro-hpc-prepare",
		Short: "Prepare clusters or works in Maestro for ARO HPC",
		Long:  "Prepare clusters or works in Maestro for ARO HPC",
		Run: func(cmd *cobra.Command, args []string) {
			// handle SIGTERM and SIGINT by cancelling the context.
			ctx, cancel := context.WithCancel(context.Background())
			shutdownHandler := server.SetupSignalHandler()
			go func() {
				defer cancel()
				<-shutdownHandler
				klog.Infof("\nShutting down aro-hpc-prepare.")
			}()

			if err := o.Run(ctx); err != nil {
				klog.Errorf("failed to run aro-hpc-prepare, %v", err)
			}
		},
	}

	flags := cmd.Flags()
	o.AddFlags(flags)
	return cmd
}

func newAROHPCSpokeCommand() *cobra.Command {
	o := spoke.NewAROHPCSpokeOptions()
	cmd := &cobra.Command{
		Use:   "aro-hpc-spoke",
		Short: "Start agents for ARO HPC",
		Long:  "Start agents for ARO HPC",
		Run: func(cmd *cobra.Command, args []string) {
			// handle SIGTERM and SIGINT by cancelling the context.
			ctx, cancel := context.WithCancel(context.Background())
			shutdownHandler := server.SetupSignalHandler()
			go func() {
				defer cancel()
				<-shutdownHandler
				klog.Infof("\nShutting down aro-hpc-spoke.")
			}()

			if err := o.Run(ctx); err != nil {
				klog.Errorf("failed to run aro-hpc-spoke, %v", err)
			}

			// block until the signal handler cancels the context.
			<-ctx.Done()
		},
	}

	flags := cmd.Flags()
	o.AddFlags(flags)
	return cmd
}

func newAROHPCWatchCommand() *cobra.Command {
	o := watcher.NewAROHPCWatcherOptions()
	cmd := &cobra.Command{
		Use:   "aro-hpc-watch",
		Short: "Start watcher for ARO HPC",
		Long:  "Start watcher for ARO HPC",
		Run: func(cmd *cobra.Command, args []string) {
			// handle SIGTERM and SIGINT by cancelling the context.
			ctx, cancel := context.WithCancel(context.Background())
			shutdownHandler := server.SetupSignalHandler()
			go func() {
				defer cancel()
				<-shutdownHandler
				klog.Infof("\nShutting down aro-hpc-watch.")
			}()

			if err := o.Run(ctx); err != nil {
				klog.Errorf("failed to run aro-hpc-watch, %v", err)
			}

			// block until the signal handler cancels the context.
			<-ctx.Done()
		},
	}

	flags := cmd.Flags()
	o.AddFlags(flags)
	return cmd
}
8 changes: 8 additions & 0 deletions test/performance/hack/aro-hpc/check-result.sh
@@ -0,0 +1,8 @@
#!/usr/bin/env bash

# Find the maestro database pod.
db_pod_name=$(kubectl -n maestro get pods -l name=maestro-db -ojsonpath='{.items[0].metadata.name}')

# Total number of resources.
kubectl -n maestro exec "${db_pod_name}" -- psql -d maestro -U maestro -c 'select count(*) from resources'
# Seconds between creation and last update for each resource of consumers 9 and 10.
kubectl -n maestro exec "${db_pod_name}" -- psql -d maestro -U maestro -c "select created_at,updated_at,extract(epoch from age(updated_at,created_at)) from resources where consumer_name='maestro-cluster-9' order by created_at"
kubectl -n maestro exec "${db_pod_name}" -- psql -d maestro -U maestro -c "select created_at,updated_at,extract(epoch from age(updated_at,created_at)) from resources where consumer_name='maestro-cluster-10' order by created_at"
# Size of the resources table: total (including indexes and TOAST data) and main table data only.
kubectl -n maestro exec "${db_pod_name}" -- psql -d maestro -U maestro -c "select pg_size_pretty(pg_total_relation_size('resources')) as total, pg_size_pretty(pg_relation_size('resources')) as data"
10 changes: 10 additions & 0 deletions test/performance/hack/aro-hpc/cleanup.sh
@@ -0,0 +1,10 @@
#!/usr/bin/env bash

ARO_HCP_REPO_PATH="$HOME/go/src/github.com/Azure/ARO-HCP"

ls _output/performance/aro/pids | xargs kill
kind delete clusters --all

pushd "$ARO_HCP_REPO_PATH/dev-infrastructure"
AKSCONFIG=svc-cluster make clean
popd
29 changes: 29 additions & 0 deletions test/performance/hack/aro-hpc/create-clusters.sh
@@ -0,0 +1,29 @@
#!/usr/bin/env bash
total=${total:-10}
begin_index=${begin_index:-1}

lastIndex=$(($begin_index + $total - 1))
echo "create clusters from maestro-cluster-$begin_index to maestro-cluster-$lastIndex"

kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: clusters-$begin_index-$lastIndex
  namespace: maestro
spec:
  template:
    spec:
      containers:
      - name: aro-hpc-clusters
        image: quay.io/skeeey/maestro-perf-tool:aro-hpc
        imagePullPolicy: IfNotPresent
        args:
        - "/maestroperf"
        - "aro-hpc-prepare"
        - "--cluster-begin-index=$begin_index"
        - "--cluster-counts=$total"
        - "--only-clusters=true"
      restartPolicy: Never
  backoffLimit: 4
EOF
29 changes: 29 additions & 0 deletions test/performance/hack/aro-hpc/create-works.sh
@@ -0,0 +1,29 @@
#!/usr/bin/env bash
works=${works:-300}

index=${index:-1}

echo "create works for maestro-cluster-$index"

kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: works-$index
  namespace: maestro
spec:
  template:
    spec:
      containers:
      - name: aro-hpc-clusters
        image: quay.io/skeeey/maestro-perf-tool:aro-hpc
        imagePullPolicy: IfNotPresent
        args:
        - "/maestroperf"
        - "aro-hpc-prepare"
        - "--cluster-begin-index=$index"
        - "--cluster-counts=1"
        - "--work-counts=$works"
      restartPolicy: Never
  backoffLimit: 4
EOF