Additional load testing recommendations
sanchariGr committed Dec 11, 2023
1 parent 7cfc7d0 commit 865902f
Showing 1 changed file with 23 additions and 0 deletions.
23 changes: 23 additions & 0 deletions docs/docs/monitoring/load-testing-guidelines.mdx
@@ -17,6 +17,29 @@ In our tests we used the Rasa [HTTP-API](https://rasa.com/docs/rasa/pages/http-a
| Up to 50,000 | 6vCPU | 16 GB |
| Up to 80,000 | 6vCPU, with almost 90% CPU usage | 16 GB |

:::info Best-performing AWS setup tested on EKS

- EC2 `c5.2xlarge`: 9.2 rps/node throughput
- EC2 `c5.4xlarge`: 19.5 rps/node throughput

To maximize throughput per node, you can choose a larger compute-optimized instance with more CPU per node, such as `c5.4xlarge`.

:::
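
As a rough sizing aid, the per-node throughput figures in the note above can be turned into a node count for a target load. The sketch below assumes throughput scales linearly with the number of nodes, which these tests do not confirm; `nodes_needed` and the 100 rps target are hypothetical names and values chosen for illustration.

```python
import math

# Per-node throughput measured in the tests above (requests per second).
RPS_PER_NODE = {"c5.2xlarge": 9.2, "c5.4xlarge": 19.5}


def nodes_needed(target_rps: float, instance_type: str) -> int:
    """Estimate the node count for a target load.

    Assumption: throughput scales linearly with node count. Treat the
    result as a starting point for a load test, not a guarantee.
    """
    return math.ceil(target_rps / RPS_PER_NODE[instance_type])


for instance_type in RPS_PER_NODE:
    # Hypothetical target of 100 requests per second.
    print(instance_type, nodes_needed(100, instance_type))
```

Rounding up leaves a small amount of headroom. Re-run your own load test at the estimated node count before relying on it.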

| AWS instance             | Rasa Pro                                     | Rasa Action Server                        |
|--------------------------|----------------------------------------------|-------------------------------------------|
| EC2 `c5.2xlarge`         | 3 vCPU, 10 GB memory, 3 Sanic threads        | 3 vCPU, 2 GB memory, 3 Sanic threads      |
| EC2 `c5.4xlarge`         | 7 vCPU, 16 GB memory, 7 Sanic threads        | 7 vCPU, 12 GB memory, 7 Sanic threads     |

### Some recommendations to improve latency
- Running the action server as a sidecar saved about 100 ms on average per round trip to the action server in our tests. Results may vary depending on the number of calls made to the action server.
- Map Sanic workers 1:1 to CPUs for both Rasa Pro and Rasa Action Server.
- Write `async` actions to avoid blocking I/O.
- Use KEDA for pre-emptive autoscaling of Rasa pods in production based on HTTP requests.

Check warning on line 37 in docs/docs/monitoring/load-testing-guidelines.mdx

View workflow job for this annotation

GitHub Actions / Typo CI

pre-emptive

"pre-emptive" is a typo. Did you mean "pre-emotive"?
- Set `enable_selective_domain: true` so that the domain is sent only for actions that need it. This massively trims the payload between the two pods.
- Consider using `c5n.nxlarge` machines, which are more compute optimized and support better parallelization of HTTP requests.

Check warning on line 39 in docs/docs/monitoring/load-testing-guidelines.mdx

View workflow job for this annotation

GitHub Actions / Typo CI

nxlarge

"nxlarge" is a typo. Did you mean "enlarge"?
  However, because these machines are low on memory, models need to be trained to be lightweight, and they are not suitable if you want to run transformers.
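
To illustrate the `async` recommendation above, here is a minimal sketch of a custom action whose `run` method awaits its I/O instead of blocking the worker. It is kept free of dependencies so it runs on its own: `ActionCheckBalance` and `fetch_balance` are hypothetical names, `asyncio.sleep` stands in for a non-blocking network call, and a plain list and dict stand in for the dispatcher and tracker. In a real project the class would subclass `rasa_sdk.Action`.

```python
import asyncio
import time


async def fetch_balance(account_id: str) -> dict:
    """Hypothetical backend call; asyncio.sleep stands in for non-blocking network I/O."""
    await asyncio.sleep(0.1)
    return {"account_id": account_id, "balance": 100}


class ActionCheckBalance:
    """Shaped like a custom action; a real one would subclass rasa_sdk.Action."""

    def name(self) -> str:
        return "action_check_balance"

    async def run(self, dispatcher, tracker, domain) -> list:
        # Awaiting hands control back to the event loop, so the same
        # worker can serve other requests while this call is pending.
        result = await fetch_balance(tracker["sender_id"])
        dispatcher.append(f"Your balance is {result['balance']}")
        return []


async def handle_requests(n: int) -> float:
    """Run n action calls concurrently on one event loop; return elapsed seconds."""
    action = ActionCheckBalance()
    start = time.perf_counter()
    await asyncio.gather(
        # Stand-ins: a list for the dispatcher, a dict for the tracker.
        *(action.run([], {"sender_id": f"user-{i}"}, {}) for i in range(n))
    )
    return time.perf_counter() - start


elapsed = asyncio.run(handle_requests(10))
print(f"10 concurrent calls took {elapsed:.2f}s")
```

Because every call awaits its I/O, the ten calls overlap and finish in roughly the time of one. A blocking call such as `time.sleep` inside `run` would instead serialize them on the worker.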


### Debugging bot related issues while scaling up
