
Load testing scorer BE #1646

Open
nutrina opened this issue Aug 28, 2023 · 3 comments

nutrina commented Aug 28, 2023

The goal of these load tests is to investigate measures to:

  • reduce the error rate we have seen in past load tests
  • reduce response times under high load
  • ensure smoother scaling of the infrastructure

Approach:

  • timeouts in the LB and the container shall be in sync - as it stands now, the LB timeout is 60s while the timeout of the gunicorn process in the container is 30s. Ideally we should have 60s in the LB and 60s (or 65s) in the container. If gunicorn times out first, the LB connection gets closed prematurely, which would explain some of the 5XX errors reported by the LB (see the config sketch after this list)
  • check out the following settings in uvicorn (https://www.uvicorn.org/settings/):
    • --backlog - should be fine to increase this, we have plenty of memory
    • --workers - not sure whether to keep this at 1 or increase it; since we use gunicorn as the process manager, increasing it probably only makes sense if we drop gunicorn
    • --limit-concurrency
  • can we remove gunicorn and expose uvicorn to the LB directly?
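To make the timeout-sync idea concrete, here is a minimal sketch of a gunicorn.conf.py, assuming gunicorn fronts uvicorn workers; the specific values (65s timeout, backlog size, worker count) are illustrative assumptions, not measured settings:

```python
# gunicorn.conf.py -- hypothetical config illustrating the timeout-sync idea.
# All values below are assumptions for illustration, not tested settings.

bind = "0.0.0.0:8000"

# Keep the worker timeout at or slightly above the LB idle timeout (60s),
# so gunicorn never kills a connection the LB still considers open.
timeout = 65

# One uvicorn worker per container; gunicorn acts as the process manager.
workers = 1
worker_class = "uvicorn.workers.UvicornWorker"

# Larger socket backlog to absorb request bursts (gunicorn's default is 2048).
backlog = 4096
```

If we drop gunicorn entirely, the rough equivalent would be running uvicorn directly, e.g. `uvicorn app:app --backlog 4096 --limit-concurrency 500 --timeout-keep-alive 65` (flag names from the uvicorn settings page linked above; the module path and concurrency limit are likewise assumptions).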

**Timebox to a max of 2 days**

@nutrina nutrina added this to Passport Aug 28, 2023
@nutrina nutrina converted this from a draft issue Aug 28, 2023
@nutrina nutrina moved this from Backlog to In Progress (WIP) in Passport Sep 5, 2023
@Jkd-eth Jkd-eth moved this from In Progress (WIP) to Backlog in Passport Sep 5, 2023
@nutrina nutrina moved this from Backlog to In Progress (WIP) in Passport Sep 12, 2023
@nutrina nutrina self-assigned this Sep 12, 2023
@nutrina nutrina moved this from In Progress (WIP) to Blocked in Passport Sep 15, 2023

nutrina commented Sep 15, 2023

Moving this to blocked.
Enabling faster scaling & higher throughput seems to be extremely difficult (we might need to change the deployment type from Fargate to EC2).
On the other hand, a better solution might be to look into AWS Lambda ...

This is a topic for the next engineering hangout.


nutrina commented Sep 19, 2023

The API I have focused on most is the submit-passport request, which seems to be our slowest. The best throughput I get is ~0.9 req/s/vCPU.
Not sure yet what the bottleneck is. Is it the DB, didkit, or something else?
However, given this slow request rate, it is very difficult to have autoscaling kick in fast enough to avoid a degradation of our user experience.
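For reference, the kind of throughput test described here can be sketched with Locust; the endpoint path, header name, and payload below are assumptions for illustration, not the exact test that was run:

```python
# locustfile.py -- hypothetical sketch of a submit-passport load test.
# The endpoint path, header name, and payload shape are assumptions.
from locust import HttpUser, between, task


class SubmitPassportUser(HttpUser):
    # Simulated users pause 1-3s between requests.
    wait_time = between(1, 3)

    @task
    def submit_passport(self):
        # POST the slow request under test; adjust path/payload to the real API.
        self.client.post(
            "/registry/submit-passport",
            json={
                "address": "0x0000000000000000000000000000000000000000",
                "scorer_id": "1",
            },
            headers={"X-API-Key": "test-api-key"},
        )
```

Run with e.g. `locust -f locustfile.py --host https://<scorer-host>` and watch req/s per vCPU while ramping up users.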


nutrina commented Sep 22, 2023

This is done; AWS Lambda load testing is covered in: passportxyz/passport-scorer#395

@nutrina nutrina moved this from Blocked to Done in Passport Sep 22, 2023