added docs for testing (#1373)
* added docs for testing

* added load testing docs

* Update docs/pages/load-testing/results.mdx

Co-authored-by: david-loft <[email protected]>

* Update docs/pages/load-testing/results.mdx

Co-authored-by: david-loft <[email protected]>

* Update docs/pages/load-testing/results.mdx

Co-authored-by: david-loft <[email protected]>

* Update docs/pages/load-testing/results.mdx

Co-authored-by: david-loft <[email protected]>

* Update docs/pages/load-testing/results.mdx

Co-authored-by: david-loft <[email protected]>

* Update docs/pages/load-testing/setup.mdx

Co-authored-by: david-loft <[email protected]>

* Update docs/pages/load-testing/results.mdx

Co-authored-by: david-loft <[email protected]>

* Update docs/pages/load-testing/results.mdx

Co-authored-by: david-loft <[email protected]>

* Update docs/pages/load-testing/results.mdx

Co-authored-by: david-loft <[email protected]>

* Update docs/pages/load-testing/results.mdx

Co-authored-by: david-loft <[email protected]>

* Update docs/pages/load-testing/results.mdx

Co-authored-by: david-loft <[email protected]>

* Update docs/pages/load-testing/results.mdx

Co-authored-by: david-loft <[email protected]>

* Update docs/pages/load-testing/results.mdx

Co-authored-by: david-loft <[email protected]>

* Update docs/pages/load-testing/setup.mdx

Co-authored-by: david-loft <[email protected]>

* Update docs/pages/load-testing/setup.mdx

Co-authored-by: david-loft <[email protected]>

---------

Co-authored-by: facchettos <[email protected]>
Co-authored-by: david-loft <[email protected]>
3 people authored Nov 29, 2023
1 parent 342a0c1 commit 5ea3f01
Showing 40 changed files with 757 additions and 0 deletions.
88 changes: 88 additions & 0 deletions docs/pages/load-testing/results.mdx
@@ -0,0 +1,88 @@
---
title: Load Tests
sidebar_label: Load Tests
---

## Summary
This document presents performance test results for the Kubernetes API across various vCluster and Kubernetes distributions and configurations.
This section is a TL;DR of the results; the detailed results follow below. In our tests, K3s with SQLite lagged behind the other distributions under high-intensity loads. For less intensive usage, however, it was only marginally slower than the others while staying well within the usable range, and it offers a simpler deployment.
If you expect heavy API usage in your vClusters, we recommend an etcd-backed distribution, as you will most likely experience timeouts or throttling with the SQLite-backed distribution. For less intensive usage, K3s with SQLite is as adequate as the others.


## API Response Times

<figure>
<img src="/docs/media/diagrams/apiserver-latency-baseline.svg" alt="apiserver-avg-baseline" />
<figcaption>API server average response time (baseline)</figcaption>
</figure>

During our baseline testing (300 secrets, 30 QPS), K3s with SQLite was significantly slower than the other distributions, with an average response time of 0.17s compared to around 0.05s for all the others. In practice this should not matter, however, since 0.17s is still a relatively good average.
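The page does not say how these averages were computed. A common approach, assumed here, is to divide the increase of the API server's `apiserver_request_duration_seconds_sum` counter by the increase of its `_count` counter over the test window. The sketch below shows only that arithmetic; `HistogramSample` and `average_latency` are hypothetical helpers, and counter resets are ignored.

```python
from dataclasses import dataclass


@dataclass
class HistogramSample:
    """One scrape of a latency histogram's cumulative _sum and _count series."""

    sum_seconds: float  # e.g. apiserver_request_duration_seconds_sum
    count: float  # e.g. apiserver_request_duration_seconds_count


def average_latency(start: HistogramSample, end: HistogramSample) -> float:
    """Mean request duration between two scrapes of the same histogram.

    Both series only grow (this sketch ignores counter resets), so the mean
    over the window is increase(sum) / increase(count). In PromQL this
    corresponds to rate(..._sum[w]) / rate(..._count[w]).
    """
    requests = end.count - start.count
    if requests <= 0:
        raise ValueError("no requests were observed between the two scrapes")
    return (end.sum_seconds - start.sum_seconds) / requests


# Hypothetical numbers: 50 requests that together took 5 s average 0.1 s each.
print(average_latency(HistogramSample(10.0, 100), HistogramSample(15.0, 150)))
```

Working from the two cumulative counters rather than individual requests means a single pair of scrapes is enough to get the mean for any window.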


<figure>
<img src="/docs/media/diagrams/apiserver-latency-intensive.svg" alt="apiserver-avg-intensive" />
<figcaption>API server average response time (intensive)</figcaption>
</figure>

In our more intensive testing (5000 secrets, 200 QPS), the differences between the distributions are more pronounced. K3s with SQLite trailed behind with a 1.4s average response time, while etcd-backed K3s (the vCluster.Pro distro) averaged around 0.35s in both single-node and HA setups. k0s and K8s were the fastest in these tests, averaging around 0.15s. The cumulative distribution of request times during this test is shown below.


<figure>
<img src="/docs/media/diagrams/cumu-distribution-apiserver.svg" alt="apiserver-cumu-dist-intensive" />
<figcaption>Cumulative distribution of request time during the intensive testing</figcaption>
</figure>
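A cumulative distribution like the one above shows, for each latency x, the fraction of requests that completed within x seconds. The minimal sketch below builds an empirical CDF and a nearest-rank quantile from raw per-request durations; it assumes such raw samples are available, which is not necessarily how the chart was produced, and the helper names are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Callable


def empirical_cdf(latencies: list[float]) -> Callable[[float], float]:
    """Return F such that F(x) is the fraction of requests that took <= x seconds."""
    ordered = sorted(latencies)
    if not ordered:
        raise ValueError("need at least one latency sample")

    def cdf(x: float) -> float:
        # bisect_right gives the number of sorted samples that are <= x.
        return bisect_right(ordered, x) / len(ordered)

    return cdf


def nearest_rank_quantile(latencies: list[float], q: float) -> float:
    """Smallest observed latency x with F(x) >= q (nearest-rank method)."""
    if not latencies or not 0 < q <= 1:
        raise ValueError("need samples and 0 < q <= 1")
    ordered = sorted(latencies)
    rank = math.ceil(q * len(ordered))  # 1-based rank into the sorted samples
    return ordered[rank - 1]


# Hypothetical request durations in seconds, not measured data.
samples = [0.3, 0.1, 0.4, 0.2]
cdf = empirical_cdf(samples)
print(cdf(0.2), nearest_rank_quantile(samples, 0.5))
```

Reading a percentile such as p99 off a CDF plot is the same operation as `nearest_rank_quantile`: find the first latency where the curve reaches 0.99.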

## CPU usage

During our testing, most distributions had similar CPU usage. The exception was K3s with SQLite, which used more CPU, most likely because it has to translate etcd requests into SQLite queries.

<figure>
<img src="/docs/media/diagrams/cpu-sn-baseline.svg" alt="cpu usage (baseline)" />
<figcaption>CPU usage during the baseline test</figcaption>
</figure>

<figure>
<img src="/docs/media/diagrams/cpu-sn-intensive.svg" alt="cpu usage (intensive)" />
<figcaption>CPU usage during the intensive test</figcaption>
</figure>

<figure>
<img src="/docs/media/diagrams/cpu-intensive-ha.svg" alt="cpu usage (intensive) for ha setups" />
<figcaption>CPU usage during the intensive test (HA setups)</figcaption>
</figure>

## Memory usage

Memory usage was relatively similar across all setups.

<figure>
<img src="/docs/media/diagrams/mem-usage-baseline.svg" alt="memory usage over time sn setup" />
<figcaption>Memory usage during the baseline test</figcaption>
</figure>

<figure>
<img src="/docs/media/diagrams/mem-usage-intensive.svg" alt="memory usage over time (intensive)" />
<figcaption>Memory usage during the intensive test</figcaption>
</figure>

<figure>
<img src="/docs/media/diagrams/mem-usage-ha.svg" alt="memory usage over time (HA setups)" />
<figcaption>Memory usage during the intensive test with HA setups</figcaption>
</figure>

## Filesystem use

In the intensive test, filesystem usage was higher for K3s with SQLite than for all etcd-backed versions. In the baseline test there was little to no filesystem usage.

<figure>
<img src="/docs/media/diagrams/fs-write-intensive.svg" alt="filesystem writes over time" />
<figcaption>Filesystem writes over time</figcaption>
</figure>
<figure>
<img src="/docs/media/diagrams/fs-read-intensive.svg" alt="filesystem reads over time" />
<figcaption>Filesystem reads over time</figcaption>
</figure>

## Pod latency

kube-burner calculates some pod latency statistics, but it derives them from the pod status, whose timestamps only have a precision of one second. At this precision, all distributions had similar p50, p99, average, and max values for `ContainersReady`, `Initialized`, `PodScheduled`, and `Ready`.
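The effect of second-level precision can be sketched with a small simulation. It assumes, as described above, that the start and end of each latency come from timestamps with whole-second resolution; `measured_latency` is a hypothetical helper, not part of kube-burner. Sub-second differences between distributions vanish, and a single measurement can be off by almost a full second in either direction.

```python
import math


def measured_latency(start: float, end: float) -> int:
    """Latency as observed through timestamps truncated to whole seconds.

    `start` and `end` are true instants in seconds; truncating each one first
    mimics reading them from a status field without sub-second precision.
    """
    return math.floor(end) - math.floor(start)


# Hypothetical true latencies of 0.3 s and 0.8 s: both are measured as 0 s,
# so the 0.5 s difference between them disappears.
print(measured_latency(100.1, 100.4), measured_latency(100.1, 100.9))

# A true latency of only 0.1 s that crosses a second boundary is measured as 1 s.
print(measured_latency(100.95, 101.05))
```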
12 changes: 12 additions & 0 deletions docs/pages/load-testing/setup.mdx
@@ -0,0 +1,12 @@
---
title: Test setup
sidebar_label: Setup
---

We ran our tests with kube-burner, using an EKS cluster in the eu-west-3 region as the host cluster. All the configuration files are located [here](https://github.com/loft-sh/vcluster/load-test). You will need to change the default storage class from gp2 to gp3.

To monitor the metrics, install the [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus) stack and give Prometheus permission to list pods, services, endpoints, and ServiceMonitors by modifying the `prometheus-k8s` ClusterRole, scoped to the namespace you will deploy your vClusters in (or to all namespaces, for a faster edit).

The vCluster APIs must be exposed (using the `--expose` vCluster option). You can either create the ServiceMonitor manually or use the Helm values to have vCluster create it for you. Make sure that Prometheus has scraped your vCluster API at least once before running kube-burner; otherwise, some metrics will be missing data.

To run the tests, first run `kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090` to forward the host cluster's Prometheus to your local machine, then run `vcluster create --expose -f yourConfig yourCluster` to start your vCluster. Once everything is ready and Prometheus has detected your API servers, run `kube-burner init --metrics metrics.yaml -c config.yaml -u http://localhost:9090`.
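Before starting kube-burner, one way to check that Prometheus has detected your API servers is to query the port-forwarded Prometheus HTTP API (`/api/v1/query`) for the `up` metric, which is 1 for targets whose last scrape succeeded. The sketch below assumes the port-forward on `localhost:9090` described above; the label selector you pass depends on your ServiceMonitor, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def query_up(prometheus_url: str, selector: str) -> dict:
    """Run an instant query for up{<selector>} against the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": f"up{{{selector}}}"})
    url = f"{prometheus_url}/api/v1/query?{params}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def scraped_targets(response: dict) -> list[dict]:
    """Label sets of the targets whose most recent scrape succeeded (up == 1)."""
    if response.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {response.get('error')}")
    # Instant-query results look like {"metric": {...}, "value": [ts, "1"]};
    # Prometheus encodes sample values as strings.
    return [
        series["metric"]
        for series in response["data"]["result"]
        if series["value"][1] == "1"
    ]
```

For example, if `scraped_targets(query_up("http://localhost:9090", 'namespace="my-vcluster-ns"'))` returns an empty list, wait for another scrape interval before running kube-burner; `my-vcluster-ns` is a placeholder for the namespace of your vCluster.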
1 change: 1 addition & 0 deletions docs/static/media/apiserver-latency-baseline.svg
1 change: 1 addition & 0 deletions docs/static/media/apiserver-latency-intensive.svg
1 change: 1 addition & 0 deletions docs/static/media/baseline.svg
1 change: 1 addition & 0 deletions docs/static/media/cpu-intensive-ha.svg
1 change: 1 addition & 0 deletions docs/static/media/cpu-sn-baseline.svg
1 change: 1 addition & 0 deletions docs/static/media/cpu-sn-intensive.svg
1 change: 1 addition & 0 deletions docs/static/media/cumu-distribution-apiserver.svg
1 change: 1 addition & 0 deletions docs/static/media/fs-read-intensive.svg
1 change: 1 addition & 0 deletions docs/static/media/fs-write-intensive.svg
1 change: 1 addition & 0 deletions docs/static/media/intensive.svg
1 change: 1 addition & 0 deletions docs/static/media/mem-usage-baseline.svg
1 change: 1 addition & 0 deletions docs/static/media/mem-usage-intensive.svg
1 change: 1 addition & 0 deletions docs/static/media/mum-usage-ha.svg
1 change: 1 addition & 0 deletions docs/static/media/network-in-intensive.svg
1 change: 1 addition & 0 deletions docs/static/media/network-out-intensive.svg
23 changes: 23 additions & 0 deletions load-test/clusterconfig.yaml
@@ -0,0 +1,23 @@
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: simple-cluster
  region: eu-west-3

nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 6
    iam:
      withAddonPolicies:
        ebs: true
iam:
  withOIDC: true

addons:
  - name: vpc-cni
    attachPolicyARNs:
      - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
  - name: aws-ebs-csi-driver
    wellKnownPolicies: # add IAM and service account
      ebsCSIController: true
13 changes: 13 additions & 0 deletions load-test/gp3.yaml
@@ -0,0 +1,13 @@
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2023-11-15T09:35:19Z"
  name: gp3
parameters:
  fsType: ext4
  type: gp3
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
23 changes: 23 additions & 0 deletions load-test/ha-k8s.yaml
@@ -0,0 +1,23 @@

# Enable HA mode
enableHA: true

# Scale up syncer replicas
syncer:
  replicas: 3

# Scale up etcd
etcd:
  replicas: 3

# Scale up controller manager
controller:
  replicas: 3

# Scale up api server
api:
  replicas: 3

# Scale up DNS server
coredns:
  replicas: 3