Add IP ratelimit middleware to JobMgr submit endpoint #31

Closed
wants to merge 8 commits into from

@bodom0015 bodom0015 commented Nov 17, 2023

Problem

We would like to rate-limit the JobMgr submit endpoint while leaving the other endpoints unrestricted.

Approach

  • split /api/v1/job/submit into its own ingress rule
  • apply ip-ratelimit middleware to job submit endpoint (prod + staging)

Screenshot_2023-12-01_at_11_59_39_AM

How to Test

  1. Navigate to https://clean.frontend.staging.mmli1.ncsa.illinois.edu/configuration
  2. Login using the link at the top-right (to avoid hcaptcha)
  3. Paste the following example input and submit a new CLEAN job:
>WP_063460136
MAIPPYPDFRSAAFLRQHLRATMAFYDPVATDASGGQFHFFLDDGTVYNTHTRHLVSATRFVVTHAMLYRTTGEARYQVGMRHALEFLRTAFLDPATGGYAWLIDWQDGRATVQDTTRHCYGMAFVMLAYARAYEAGVPEARVWLAEAFDTAEQHFWQPAAGLYADEASPDWQLTSYRGQNANMHACEAMISAFRATGERRYIERAEQLAQGICQRQAALSDRTHAPAAEGWVWEHFHADWSVDWDYNRHDRSNIFRPWGYQVGHQTEWAKLLLQLDALLPADWHLPCAQRLFDTAVERGWDAEHGGLYYGMAPDGSICDDGKYHWVQAESMAAAAVLAVRTGDARYWQWYDRIWAYCWAHFVDHEHGAWFRILHRDNRNTTREKSNAGKVDYHNMGACYDVLLWALDAPGFSKESRSAALGRP
  4. Click the browser's Back button and attempt to submit the same job again (within ~30s of the first submission)
    • You should see that Traefik rejects this request and returns HTTP 429: Too Many Requests
    • You should see that a response header is returned telling the user when they can retry the request again
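
For anyone scripting this check instead of clicking through the UI, here is a minimal sketch of how a client could interpret the rejected response. It assumes the retry hint arrives in the standard `Retry-After` header as a number of seconds; the PR only says that "a response header" is returned, so treat the header name as an assumption. `retry_delay_seconds` is a hypothetical helper, not part of JobMgr.

```python
from typing import Mapping, Optional


def retry_delay_seconds(status: int, headers: Mapping[str, str]) -> Optional[float]:
    """Return the suggested wait in seconds for a rate-limited response.

    Returns None when the response is not a 429, or when no usable
    Retry-After value is present (assumed header name, see lead-in).
    """
    if status != 429:
        return None
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("retry-after")
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        # Retry-After may also be an HTTP-date; this sketch does not parse that form.
        return None


if __name__ == "__main__":
    print(retry_delay_seconds(429, {"Retry-After": "30"}))
    print(retry_delay_seconds(200, {}))
```

The function only inspects a status code and headers, so it can be fed from whatever HTTP client is used to replay the second submission.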

Comment on lines +44 to +46
rateLimit:
  average: 4
  period: 120s

This is the part we would need to adjust if we want to raise or lower the allowed rate of requests 👍

rate = average / period = 4 / 120s = 1 / 30s
which means we currently allow a single request every 30s

See https://doc.traefik.io/traefik/middlewares/http/ratelimit/#configuration-options for further explanation
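
As a quick sanity check on the arithmetic above, the following sketch converts an `average` / `period` pair into the sustained interval between allowed requests. `seconds_between_requests` is a hypothetical helper for illustration only, and it takes the period in plain seconds rather than parsing a duration string such as `120s`.

```python
def seconds_between_requests(average: int, period_seconds: float) -> float:
    """Sustained spacing between allowed requests for a given average and period."""
    if average <= 0:
        raise ValueError("average must be positive for this calculation")
    return period_seconds / average


if __name__ == "__main__":
    # The values from this PR: average 4 over a 120s period.
    print(seconds_between_requests(4, 120))
```

Raising `average` or shortening `period` lowers the interval, which is the knob to turn if the limit proves too strict.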

@bodom0015 bodom0015 Dec 4, 2023

For reference I think that the last time we had an incident, we had ~1200 jobs over 24 hours - this was too many

Some ideas:
24 hours * 60 minutes * 60 seconds = 86400 seconds - maybe this should be our period?

86400 / 1200 = 72 seconds per request (about 0.014 requests per second) - maybe this should be our absolute maximum for rate? But this is the maximum across all users (not per-user, like our limits)

Would 1 request per 120s be an acceptable baseline for the initial release? We can experiment with a longer period and/or changing the number of requests over time as our users' needs become better defined.
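
To put the candidate limits next to the incident numbers, here is a rough sketch of the arithmetic. It ignores burst and assumes a single client submits continuously at the sustained rate for a full day; `max_jobs_per_day` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def max_jobs_per_day(average: int, period_seconds: float) -> float:
    """Upper bound on daily submissions from one client at the sustained rate."""
    return average * SECONDS_PER_DAY / period_seconds


if __name__ == "__main__":
    incident_jobs = 1200
    # Average spacing between jobs during the incident, in seconds.
    print(SECONDS_PER_DAY / incident_jobs)
    # Current setting in this PR: average 4 over 120s.
    print(max_jobs_per_day(4, 120))
    # Proposed baseline: 1 request per 120s.
    print(max_jobs_per_day(1, 120))
```

Under these assumptions, one client at 4 per 120s could still submit up to 2880 jobs in a day, more than the ~1200 seen in the incident, while 1 per 120s caps a single client at 720.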

We can also add burst to this, if we would like to allow a brief (but not sustained) push beyond our limits
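
To illustrate what burst would change, here is a simplified token-bucket model. It is a sketch of the general technique only, not Traefik's actual implementation; the `TokenBucket` class and `simulate` function are hypothetical names, and the bucket is assumed to start full.

```python
from typing import Iterable, List


class TokenBucket:
    """Simplified token bucket: refills at average/period, holds at most `burst` tokens."""

    def __init__(self, average: int, period_seconds: float, burst: int = 1) -> None:
        self.rate = average / period_seconds  # tokens added per second
        self.capacity = float(burst)
        self.tokens = float(burst)  # assumption: the bucket starts full
        self.last_seen = 0.0

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = now - self.last_seen
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_seen = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def simulate(times: Iterable[float], average: int, period_seconds: float, burst: int) -> List[bool]:
    """Return the allow/reject decision for requests arriving at the given times (seconds)."""
    bucket = TokenBucket(average, period_seconds, burst)
    return [bucket.allow(t) for t in times]


if __name__ == "__main__":
    # burst 1: a resubmission 10s later is rejected, one at 40s is allowed.
    print(simulate([0, 10, 40], average=4, period_seconds=120, burst=1))
    # burst 3: three quick submissions pass, the fourth is rejected.
    print(simulate([0, 1, 2, 3], average=4, period_seconds=120, burst=3))
```

In this model the sustained rate stays at 1 request per 30s either way; burst only controls how many requests can be spent back to back before that rate takes over.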

@bodom0015 bodom0015 closed this Feb 1, 2024
@bodom0015

No need to apply this to JobMgr, as this was only requested by ChemScraper

Will revisit if we see a need in the future 👍
