sample: multidimensional autoscaling for local-drone-control-java
pvlugter committed Oct 10, 2023
1 parent 6f13d3c commit c9a426c
Showing 39 changed files with 1,226 additions and 4 deletions.
135 changes: 135 additions & 0 deletions samples/grpc/local-drone-control-java/autoscaling/README.md
@@ -0,0 +1,135 @@
# Autoscaling example

This example demonstrates multidimensional autoscaling, scaling the Local Drone Control service to
and from "near zero": down to a state of minimal resource usage when idle, and up and out when load
increases.

The example uses GraalVM Native Image builds for low resource usage, combines the Kubernetes
vertical and horizontal pod autoscalers, and runs in a k3s cluster (lightweight Kubernetes).


## Requirements

The following tools are required to run this example locally:

- [docker](https://www.docker.com) - Docker engine for building and running containers
- [kubectl](https://kubernetes.io/docs/reference/kubectl) - Kubernetes command line tool
- [k3d](https://k3d.io) - k3s (lightweight Kubernetes) in Docker
- [helm](https://helm.sh) - package manager for Kubernetes


## Build local-drone-control Docker image

First build a Docker image for the Local Drone Control service, as a native image and configured to
run as a multi-node Akka Cluster with PostgreSQL. From the `local-drone-control-java` directory:

```
docker build -f native-image/Dockerfile --build-arg profile=clustered -t local-drone-control .
```

See the native-image build for more information.


## Run the Central Drone Control service

Run the Central Drone Control service. By default, the example assumes this is running locally, but
it can also be deployed.

To run locally, from the `restaurant-drone-deliveries-service-java` directory:

```
docker compose up --wait
docker exec -i postgres_db psql -U postgres -t < ddl-scripts/create_tables.sql
mvn compile exec:exec -DAPP_CONFIG=local1.conf
```

Or see the documentation for deploying to Kubernetes in a cloud environment.


## Start the Local Drone Control service in k3s

A convenience script starts a k3d cluster (k3s cluster in Docker), installs the infrastructure
dependencies for persistence, monitoring, and autoscaling, and then installs the Local Drone
Control service configured for multidimensional autoscaling.

To start the Local Drone Control service in a local k3s cluster, run the `up.sh` script:

```
autoscaling/local/up.sh
```

If the Central Drone Control service has been deployed somewhere other than locally on
`localhost:8101`, the connection details can be specified using arguments to the script:

```
autoscaling/local/up.sh --central-host deployed.app --central-port 443 --central-tls true
```


## Autoscaling infrastructure

This example uses multidimensional autoscaling, combining the Kubernetes vertical and horizontal
pod autoscalers, so that when the service is idle it is both _scaled down_ with minimal resource
requests, and _scaled in_ to a minimal number of pods. The same metrics should not be used for both
the vertical and horizontal autoscalers, so the horizontal pod autoscaler is configured to use a
custom metric — the number of active drones. When activity for the service increases, the vertical
pod autoscaler (VPA) will increase the resource requests, and when the number of active drones
increases, the horizontal pod autoscaler (HPA) will increase the number of pods in the deployment.
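
As a rough illustration of how the horizontal pod autoscaler reacts to the active-drone metric, the
sketch below applies the replica formula from the Kubernetes documentation,
`ceil(currentReplicas * currentValue / targetValue)`, which for a per-pod average target reduces to
`ceil(total / target)`. The target of 100 and the range of 2 to 5 replicas come from this example's
HPA manifest. The function name `desired_replicas` and the drone counts are hypothetical, and the
real controller also applies a tolerance, stabilization windows, and pod readiness checks that are
ignored here.

```shell
#!/bin/sh
# Hypothetical helper: approximates the HPA replica count for a Pods metric
# with an averageValue target. Not the controller's actual implementation.
# Arguments: total active drones, target average per pod, min replicas, max replicas.
desired_replicas() {
  total=$1
  target=$2
  min=$3
  max=$4
  # Integer ceiling of total / target.
  desired=$(( (total + target - 1) / target ))
  # Clamp to the configured replica range.
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

desired_replicas 50 100 2 5    # prints 2 (held at minReplicas)
desired_replicas 320 100 2 5   # prints 4
desired_replicas 900 100 2 5   # prints 5 (capped at maxReplicas)
```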

The default vertical pod autoscaler recommends new resource requests and limits over long time
frames. In this example, a custom VPA recommender has been configured for short cycles and metric
history, to scale up quickly. The horizontal scaling has been configured for a minimum of 2 replicas, to
ensure availability of the service (when pods are recreated on vertical scaling), and a pod
disruption budget has been configured to ensure that no more than one pod is unavailable at a time.
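
To give a feel for why a short metric history makes the recommender react quickly, the sketch below
computes how much weight a CPU usage sample retains as it ages, assuming a plain exponential decay
where the weight halves every half-life. The 30 second half-life is the
`cpu-histogram-decay-half-life` value set for the custom recommender in this example, and 24 hours
is the VPA default. The function name `sample_weight` is hypothetical, and the recommender's
histogram bucketing is not modeled.

```shell
#!/bin/sh
# Hypothetical helper: relative weight of a usage sample after it has aged,
# assuming simple exponential decay. Not the recommender's actual code.
# Arguments: sample age in seconds, decay half-life in seconds.
sample_weight() {
  awk -v age="$1" -v half="$2" 'BEGIN { printf "%.4f\n", 0.5 ^ (age / half) }'
}

sample_weight 60 30      # prints 0.2500 (30s half-life, as in this example)
sample_weight 60 86400   # prints 0.9995 (24h half-life, the VPA default)
```

With the short half-life, a sample from one minute ago counts for only a quarter of a fresh one, so
recent load dominates the recommendation.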

You can see the current state and recommendations for the autoscalers by running:

```
kubectl get hpa,vpa
```


## Simulate drone activity

A simple load simulator is available, to demonstrate autoscaling behavior given increasing load.

This simulator moves drones on random delivery paths, frequently reporting updated locations.

In the `autoscaling/simulator` directory, run the Gatling load test:

```
mvn gatling:test
```

You can see the current resource usage for pods by running:

```
kubectl top pods
```

And the current state of the autoscalers and deployed pods with:

```
kubectl get hpa,vpa,deployments,pods
```

The vertical pod autoscaler will increase the resource requests for pods as needed. The current CPU
requests for pods can be seen by running:

```
kubectl get pods -o custom-columns='NAME:metadata.name,CPU:spec.containers[].resources.requests.cpu'
```
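
The CPU column printed by the command above uses Kubernetes quantity notation, where `100m` means
100 millicores and `1` means one full core. A minimal sketch for normalizing these values, so that
they can be compared against the VPA bounds of `100m` and `1000m` used in this example, could look
like the following. The function name `to_millicores` is hypothetical, and only the millicore
suffix and plain decimal core counts are handled.

```shell
#!/bin/sh
# Hypothetical helper: convert a Kubernetes CPU quantity to millicores.
# Handles "<n>m" (millicores) and plain core counts such as "1" or "0.5".
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v cores="$1" 'BEGIN { printf "%.0f\n", cores * 1000 }' ;;
  esac
}

to_millicores 100m   # prints 100
to_millicores 1      # prints 1000
to_millicores 0.5    # prints 500
```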

When the simulated load has finished, and idle entities have been passivated, the autoscalers will
eventually scale the service back down.


## Stop the Local Drone Control service

To stop and delete the Local Drone Control service and k3s cluster, run the `down.sh` script:

```
autoscaling/local/down.sh
```
@@ -0,0 +1,107 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: local-drone-control
  labels:
    app: local-drone-control
spec:
  replicas: 2
  selector:
    matchLabels:
      app: local-drone-control
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: local-drone-control
    spec:
      serviceAccountName: local-drone-control
      containers:
        - name: local-drone-control
          image: local-drone-control:latest
          imagePullPolicy: Never
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
          livenessProbe:
            httpGet:
              path: /alive
              port: management
          readinessProbe:
            httpGet:
              path: /ready
              port: management
          args:
            - "-Dconfig.resource=application-cluster.conf"
          env:
            - name: LOCATION_ID
              # one of the location ids supported by the restaurant-drone-deliveries service
              value: "sweden/stockholm/kungsholmen"
            - name: GRPC_PORT
              value: "8080"
            - name: REMOTE_PORT
              value: "2552"
            - name: HTTP_MGMT_PORT
              value: "8558"
            - name: PROMETHEUS_PORT
              value: "9090"
            - name: REQUIRED_CONTACT_POINT_NR
              value: "1"
            - name: CENTRAL_DRONE_CONTROL_HOST
              valueFrom:
                secretKeyRef:
                  name: central-drone-control
                  key: host
            - name: CENTRAL_DRONE_CONTROL_PORT
              valueFrom:
                secretKeyRef:
                  name: central-drone-control
                  key: port
            - name: CENTRAL_DRONE_CONTROL_TLS
              valueFrom:
                secretKeyRef:
                  name: central-drone-control
                  key: tls
            - name: DB_HOST
              valueFrom:
                secretKeyRef:
                  name: database-credentials
                  key: host
            - name: DB_PORT
              valueFrom:
                secretKeyRef:
                  name: database-credentials
                  key: port
            - name: DB_DATABASE
              valueFrom:
                secretKeyRef:
                  name: database-credentials
                  key: database
            - name: DB_USER
              valueFrom:
                secretKeyRef:
                  name: database-credentials
                  key: user
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: database-credentials
                  key: password
          ports:
            - name: grpc
              containerPort: 8080
              protocol: TCP
            - name: remote
              containerPort: 2552
              protocol: TCP
            - name: management
              containerPort: 8558
              protocol: TCP
            - name: metrics
              containerPort: 9090
              protocol: TCP
@@ -0,0 +1,19 @@
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: local-drone-control
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: local-drone-control
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Pods
      pods:
        metric:
          name: local_drone_control_active_entities
        target:
          type: Value
          averageValue: 100
@@ -0,0 +1,9 @@
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: local-drone-control
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: local-drone-control
@@ -0,0 +1,20 @@
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pod-reader
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources: ["pods"]
    verbs: ["get", "watch", "list"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: read-pods
subjects:
  - kind: ServiceAccount
    name: local-drone-control
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
@@ -0,0 +1,13 @@
apiVersion: v1
kind: Service
metadata:
  name: local-drone-control
  labels:
    app: local-drone-control
spec:
  type: ClusterIP
  ports:
    - port: 8080
      targetPort: 8080
  selector:
    app: local-drone-control
@@ -0,0 +1,4 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: local-drone-control
@@ -0,0 +1,16 @@
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: local-drone-control
  labels:
    release: local
spec:
  endpoints:
    - interval: 10s
      targetPort: metrics
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app: local-drone-control
@@ -0,0 +1,26 @@
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: local-drone-control
spec:
  recommenders:
    - name: custom
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: local-drone-control
  updatePolicy:
    updateMode: "Auto"
    minReplicas: 2
  resourcePolicy:
    containerPolicies:
      - containerName: local-drone-control
        mode: "Auto"
        minAllowed:
          cpu: 100m
          memory: 256Mi
        maxAllowed:
          cpu: 1000m
          memory: 1024Mi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
@@ -0,0 +1,2 @@
charts/*.tgz
Chart.lock
@@ -0,0 +1,8 @@
apiVersion: v2
name: autoscaler
description: Vertical pod autoscaler for drones in local k3s
version: 0.1.0
dependencies:
  - name: vertical-pod-autoscaler
    version: "~9.3.0"
    repository: "https://cowboysysop.github.io/charts"
@@ -0,0 +1,13 @@
vertical-pod-autoscaler:
  recommender:
    extraArgs:
      recommender-name: custom
      recommender-interval: 10s
      cpu-histogram-decay-half-life: 30s
      storage: prometheus
      prometheus-address: "http://local-monitoring-prometheus.monitoring:9090"
      v: 4
  updater:
    extraArgs:
      updater-interval: 10s
      v: 4