Improve CPU stressor tool #8

Open · wants to merge 1 commit into master
6 changes: 5 additions & 1 deletion Dockerfile
@@ -1,4 +1,4 @@
-FROM golang:1.19-alpine AS build
+FROM golang:1.19.2-alpine AS build

WORKDIR /app

@@ -10,4 +10,8 @@ FROM alpine:latest

COPY --from=build /app/cpu-stress /usr/local/bin/cpu-stress

LABEL maintainer="narmidm"
LABEL version="1.0.0"
LABEL description="A tool to simulate CPU stress on Kubernetes pods."

ENTRYPOINT ["cpu-stress"]
151 changes: 151 additions & 0 deletions README.md
@@ -166,6 +166,157 @@ spec:

This manifest runs the `k8s-pod-cpu-stressor` as a Kubernetes Job, which will execute the stress test once for 5 minutes and then stop. The `backoffLimit` specifies the number of retries if the job fails.

## Detailed Usage Examples

Here are some detailed usage examples to help you better understand how to use the `k8s-pod-cpu-stressor`:

### Example 1: Run CPU stress for 30 seconds with 50% CPU usage

```shell
docker run --rm k8s-pod-cpu-stressor -cpu=0.5 -duration=30s
```

### Example 2: Run CPU stress indefinitely with 80% CPU usage

```shell
docker run --rm k8s-pod-cpu-stressor -cpu=0.8 -forever
```

### Example 3: Run CPU stress for 1 minute with 10% CPU usage

```shell
docker run --rm k8s-pod-cpu-stressor -cpu=0.1 -duration=1m
```

## Step-by-Step Guide for Building and Running the Docker Image

Follow these steps to build and run the Docker image for `k8s-pod-cpu-stressor`:

1. Clone the repository:

```shell
git clone https://github.com/narmidm/k8s-pod-cpu-stressor.git
cd k8s-pod-cpu-stressor
```

2. Build the Docker image:

```shell
docker build -t k8s-pod-cpu-stressor .
```

3. Run the Docker container with desired parameters:

```shell
docker run --rm k8s-pod-cpu-stressor -cpu=0.2 -duration=10s
```

## Using the Tool in a Kubernetes Environment

To use the `k8s-pod-cpu-stressor` in a Kubernetes environment, you can create a deployment or a job using the provided sample manifests.

### Sample Deployment Manifest

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-stressor-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cpu-stressor
  template:
    metadata:
      labels:
        app: cpu-stressor
    spec:
      containers:
        - name: cpu-stressor
          image: narmidm/k8s-pod-cpu-stressor:latest
          args:
            - "-cpu=0.2"
            - "-duration=10s"
            - "-forever"
          resources:
            limits:
              cpu: "200m"
            requests:
              cpu: "100m"
```
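
To try the deployment, save the manifest to a file (the file name below is only an example), apply it, and confirm the pod is running:

```shell
kubectl apply -f cpu-stressor-deployment.yaml
kubectl get pods -l app=cpu-stressor
```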

### Sample Job Manifest

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: cpu-stressor-job
spec:
  template:
    metadata:
      labels:
        app: cpu-stressor
    spec:
      containers:
        - name: cpu-stressor
          image: narmidm/k8s-pod-cpu-stressor:latest
          args:
            - "-cpu=0.5"
            - "-duration=5m"
          resources:
            limits:
              cpu: "500m"
            requests:
              cpu: "250m"
      restartPolicy: Never
  backoffLimit: 3
```
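
Likewise, the Job manifest can be saved to a file (the name below is only an example), applied, and followed to completion:

```shell
kubectl apply -f cpu-stressor-job.yaml
kubectl wait --for=condition=complete job/cpu-stressor-job --timeout=10m
kubectl logs job/cpu-stressor-job
```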

## Troubleshooting and Common Issues

### Issue 1: High CPU Usage

If you experience unexpectedly high CPU usage, ensure that the `-cpu` parameter is set correctly. For example, `-cpu=0.2` represents 20% CPU usage.
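
To see what a locally running container is actually consuming, a quick snapshot of Docker's own statistics can help:

```shell
# One-shot CPU and memory usage for all running containers
docker stats --no-stream
```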

### Issue 2: Container Fails to Start

If the container fails to start, check the Docker logs for error messages. Ensure that the `-duration` parameter is a valid Go duration string, such as `30s`, `1m`, or `2h`.
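
For example, you can start the container with an explicit name (the name here is illustrative, and `--rm` is omitted so the logs survive the exit) and then read its output:

```shell
docker run --name cpu-stressor-debug k8s-pod-cpu-stressor -cpu=0.2 -duration=10s
docker logs cpu-stressor-debug
docker rm cpu-stressor-debug
```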

### Issue 3: Kubernetes Pod Restarting

If the Kubernetes pod keeps restarting, ensure that the resource requests and limits are set appropriately in the manifest. Adjust the values based on your cluster's capacity.
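
One way to investigate, assuming the pod carries the `app: cpu-stressor` label used in the sample manifests:

```shell
# Show restart reasons, recent events, and the effective resource settings
kubectl describe pod -l app=cpu-stressor
kubectl get pod -l app=cpu-stressor -o jsonpath='{.items[*].spec.containers[*].resources}'
```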

## Advanced Usage Scenarios

### Scenario 1: Using Horizontal Pod Autoscaler (HPA)

To automatically scale the number of pod replicas based on CPU usage, you can use a Horizontal Pod Autoscaler (HPA). Here is an example HPA manifest:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: cpu-stressor-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cpu-stressor-deployment
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
```
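
Note that the HPA relies on the cluster's metrics pipeline (for example the metrics-server addon). Assuming the manifest is saved as `hpa.yaml`, as in this repository, you can apply it and watch the replica count react:

```shell
kubectl apply -f hpa.yaml
kubectl get hpa cpu-stressor-hpa --watch
```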

### Scenario 2: Integrating with CI/CD Pipelines

You can integrate the `k8s-pod-cpu-stressor` with your CI/CD pipelines for automated testing and monitoring. For example, you can use GitHub Actions to build and push the Docker image, and then deploy it to your Kubernetes cluster for stress testing.
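
As a minimal sketch (the file path, job names, and trigger are illustrative and not part of this repository), a GitHub Actions workflow could build the image and run a short stress test as a smoke check:

```yaml
# .github/workflows/stress-test.yaml (illustrative)
name: cpu-stress-smoke-test
on:
  push:
    branches: [master]
jobs:
  build-and-stress:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build the image
        run: docker build -t k8s-pod-cpu-stressor:ci .
      - name: Run a short stress test
        run: docker run --rm k8s-pod-cpu-stressor:ci -cpu=0.2 -duration=10s
```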

### Scenario 3: Monitoring with Prometheus and Grafana

To monitor the resource usage of the `k8s-pod-cpu-stressor`, you can use Prometheus and Grafana. Set up Prometheus to scrape metrics from your Kubernetes cluster, and use Grafana to visualize the metrics. This helps identify bottlenecks and optimize resource allocation.
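
As a quick check before dashboards are in place, and assuming the metrics-server addon is installed in the cluster, you can sample the stressor pods' CPU usage directly:

```shell
kubectl top pod -l app=cpu-stressor
```

See `monitoring-tools.md` in this repository for a fuller Prometheus and Grafana walkthrough.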

## Contributing

Contributions are welcome! If you find a bug or have a suggestion, please open an issue or submit a pull request. For major changes, please discuss them first in the issue tracker.
12 changes: 12 additions & 0 deletions hpa.yaml
@@ -0,0 +1,12 @@
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: cpu-stressor-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cpu-stressor-deployment
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
25 changes: 20 additions & 5 deletions main.go
@@ -9,14 +9,27 @@ import (
"runtime"
"sync/atomic"
"time"

"github.com/sirupsen/logrus"
)

var log = logrus.New()

func main() {
	cpuUsagePtr := flag.Float64("cpu", 0.2, "CPU usage as a fraction (e.g., 0.2 for 20% CPU usage)")
	durationPtr := flag.Duration("duration", 10*time.Second, "Duration for the CPU stress (e.g., 10s)")
	runForeverPtr := flag.Bool("forever", false, "Run CPU stress indefinitely")
	flag.Parse()

	// Validate input parameters
	if *cpuUsagePtr <= 0 || *cpuUsagePtr > 1 {
		log.Fatalf("Invalid CPU usage: %f. It must be between 0 and 1.", *cpuUsagePtr)
	}

	if *durationPtr <= 0 {
		log.Fatalf("Invalid duration: %s. It must be greater than 0.", *durationPtr)
	}

	numCPU := runtime.NumCPU()
	runtime.GOMAXPROCS(numCPU)

@@ -26,13 +39,15 @@ func main() {
		numGoroutines = 1
	}

-	fmt.Printf("Starting CPU stress with %d goroutines targeting %.2f CPU usage...\n", numGoroutines, *cpuUsagePtr)
+	log.Infof("Starting CPU stress with %d goroutines targeting %.2f CPU usage...", numGoroutines, *cpuUsagePtr)

	done := make(chan struct{})

	// Capture termination signals.
	// Note: signal.Notify does not return an error, so there is no error value to check.
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, os.Interrupt, os.Kill)

	var stopFlag int32

@@ -63,21 +78,21 @@ func main() {
	go func() {
		// Wait for termination signal
		<-quit
-		fmt.Println("\nTermination signal received. Stopping CPU stress...")
+		log.Println("\nTermination signal received. Stopping CPU stress...")
		atomic.StoreInt32(&stopFlag, 1)
		close(done)
	}()

	if !*runForeverPtr {
		time.Sleep(*durationPtr)
-		fmt.Println("\nCPU stress completed.")
+		log.Println("\nCPU stress completed.")
		atomic.StoreInt32(&stopFlag, 1)
		close(done)
		// Keep the process running to prevent the pod from restarting
		select {}
	}

	// Run stress indefinitely
-	fmt.Println("CPU stress will run indefinitely. Press Ctrl+C to stop.")
+	log.Println("CPU stress will run indefinitely. Press Ctrl+C to stop.")
	<-done
}
78 changes: 78 additions & 0 deletions monitoring-tools.md
@@ -0,0 +1,78 @@
# Monitoring Tools for Kubernetes

To effectively monitor and optimize the resource usage of your Kubernetes cluster, you can use monitoring tools like Prometheus and Grafana. These tools help collect and visualize resource usage metrics, allowing you to identify bottlenecks and make informed decisions about resource allocation.

## Prometheus

Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. It collects metrics from various sources and stores them in a time-series database. Prometheus can be used to monitor the resource usage of your Kubernetes cluster, including CPU and memory usage.

### Installing Prometheus

To install Prometheus in your Kubernetes cluster, you can use the Prometheus Operator, which simplifies the deployment and management of Prometheus instances. Follow these steps to install Prometheus using the Prometheus Operator:

1. Add the Prometheus Operator Helm repository:

```shell
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
```

2. Install the Prometheus Operator:

```shell
helm install prometheus-operator prometheus-community/kube-prometheus-stack
```

3. Verify the installation:

```shell
kubectl get pods -n default -l "release=prometheus-operator"
```

### Configuring Prometheus

Once Prometheus is installed, you need to configure it to scrape metrics from your Kubernetes cluster. The Prometheus Operator automatically configures Prometheus to scrape metrics from various Kubernetes components, including the kubelet, API server, and cAdvisor.

To customize the Prometheus configuration, you can edit the `values.yaml` file used during the Helm installation. For example, you can add custom scrape configurations to collect metrics from additional endpoints.
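
For example, many installations hook extra targets in through `prometheus.prometheusSpec.additionalScrapeConfigs` in the chart's `values.yaml` (check your chart version's default values for the exact key). The snippet below is only an illustrative template; the stressor itself does not expose a metrics endpoint, so the target shown is hypothetical:

```yaml
# values.yaml (excerpt) - illustrative extra scrape job
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: custom-endpoint
        static_configs:
          - targets: ["my-service.default.svc:8080"]  # hypothetical target
```

Apply the change with `helm upgrade prometheus-operator prometheus-community/kube-prometheus-stack -f values.yaml`.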

## Grafana

Grafana is an open-source analytics and monitoring platform that integrates with Prometheus to visualize metrics. It provides a rich set of features for creating and sharing dashboards, setting up alerts, and exploring metrics data.

### Installing Grafana

Grafana is included in the Prometheus Operator installation, so you don't need to install it separately. To access the Grafana dashboard, follow these steps:

1. Get the Grafana admin password:

```shell
kubectl get secret prometheus-operator-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
```

2. Forward the Grafana service port to your local machine:

```shell
kubectl port-forward svc/prometheus-operator-grafana 3000:80
```

3. Open your web browser and navigate to `http://localhost:3000`. Log in with the username `admin` and the password obtained in step 1.

### Creating Dashboards

Grafana provides a wide range of pre-built dashboards for Kubernetes monitoring. You can import these dashboards from the Grafana dashboard library or create custom dashboards to visualize the metrics collected by Prometheus.

To import a pre-built dashboard, follow these steps:

1. In the Grafana UI, click on the "+" icon in the left sidebar and select "Import".
2. Enter the dashboard ID or URL from the Grafana dashboard library and click "Load".
3. Select the Prometheus data source and click "Import".

## Analyzing Metrics

With Prometheus and Grafana set up, you can start analyzing the collected metrics to optimize resource allocation in your Kubernetes cluster. Here are some tips for analyzing metrics:

- **Identify Bottlenecks**: Look for high CPU or memory usage in your pods and nodes. Identify the components that are consuming the most resources and investigate the root cause.
- **Adjust Resource Requests and Limits**: Based on the observed resource usage, adjust the resource requests and limits in your Kubernetes manifests to ensure optimal resource allocation.
- **Set Up Alerts**: Use Prometheus alerting rules to set up alerts for critical resource usage thresholds. Configure Grafana to send notifications when alerts are triggered.
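
For the alerting point above, a minimal `PrometheusRule` sketch is shown below. The threshold, pod name pattern, and the `release` label (used so the operator discovers the rule) are assumptions to adapt to your cluster:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cpu-stressor-alerts
  labels:
    release: prometheus-operator  # assumed Helm release name
spec:
  groups:
    - name: cpu-stressor
      rules:
        - alert: HighPodCpuUsage
          expr: sum(rate(container_cpu_usage_seconds_total{pod=~"cpu-stressor.*"}[5m])) > 0.5
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "cpu-stressor pods have used more than 0.5 CPU cores for 10 minutes"
```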

By using Prometheus and Grafana, you can gain valuable insights into the resource usage of your Kubernetes cluster and make informed decisions to optimize performance and resource allocation.
8 changes: 8 additions & 0 deletions resource-quotas.yaml
@@ -0,0 +1,8 @@
apiVersion: v1
kind: ResourceQuota
metadata:
  name: cpu-stressor-quota
spec:
  hard:
    requests.cpu: "1"
    limits.cpu: "2"