TT-10676 Upgrade Resurface Pump backend #731

Closed
wants to merge 1 commit into from

Conversation

@monrax monrax commented Oct 9, 2023

Upgrade the Resurface backend for the Tyk Pump. This upgrade makes the data writing process asynchronous through the use of bounded channels. In addition, the Shutdown method has been defined to support graceful shutdown.

Description

  • Upgrade logger-go dependency to version 3.3.1, which includes improvements in goroutine management, as well as a new Stop method for graceful shutdown.
  • Add support for async data writing, by adding a bounded channel to buffer data records and process them concurrently in the background.
  • Add Shutdown method for graceful shutdown of ResurfacePump backend.

Related Issue

Issues #729 and #730

Motivation and Context

  • Version 3.2.1 of the logger-go dependency is susceptible to memory runaway issues under heavy load. Version 3.3.1 provides a fix for this problem, as well as a new Stop method for graceful shutdown of the data capture process.
  • With synchronous record processing, WriteData might take a long time to return (e.g. longer than the configured Redis TTL). This bottleneck can affect the entire data capture process.

How This Has Been Tested

  • Clone Tyk Pro demo repo
  • Add both Resurface and HTTP Bin services to the docker-compose.yml file:
  tyk-resurface:
    image: resurfaceio/resurface:3.5.4
    container_name: tyk-resurface
    ports:
      - "7700:7700"
      - "7701:7701"
    tmpfs:
      - /db
    networks:
      - tyk

  tyk-httpbin:
    image: kennethreitz/httpbin
    container_name: tyk-httpbin
    ports:
      - "80:80"
    networks:
      - tyk
  • Modify the pump.conf file to configure the Resurface pump backend without any timeout:
    "resurfaceio": {
      "type": "resurfaceio",
      "meta": {
        "capture_url": "http://tyk-resurface:7701/message",
        "rules": "include debug"
      }
    }
  • Import the following HTTP Bin Tyk API definition:
{
    "name": "httpbin",
    "slug": "httpbin",
    "api_id": "httpbin-api",
    "org_id": "test-org",
    "use_keyless": true,
    "version_data": {
        "not_versioned": true,
        "versions": {
            "Default": {
                "name": "Default",
                "expires": ""
            }
        }
    },
    "proxy": {
        "listen_path": "/http-bin/",
        "target_url": "http://httpbin"
    },
    "active": true,
    "enable_detailed_recording": true
}
  • Use a load testing application like Locust or bombardier to send several concurrent requests to the /image/png endpoint, aiming for >4000 RPS (Configurations used: Locust: 2 workers, 50 spawn rate, and 6000 users; Bombardier: concurrency level of 100 with 500k requests)
  • Wait for a few minutes. With the current Tyk Pump release, the number of purged records builds up until the pump can't keep up with the new records being generated, and the resulting gaps are visible in Resurface. No errors or gaps are present when performing the same test with the changes introduced by this PR.

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • Make sure you are requesting to pull a topic/feature/bugfix branch (right side). If pulling from your own
    fork, don't request your master!
  • Make sure you are making a pull request against the master branch (left side). Also, you should start
    your branch off our latest master.
  • My change requires a change to the documentation.
    • If you've changed APIs, describe what needs to be updated in the documentation.
  • I have updated the documentation accordingly.
  • Modules and vendor dependencies have been updated; run go mod tidy && go mod vendor
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • Check your code additions will not fail linting checks:
    • gofmt -s
    • go vet

@caroltyk caroltyk changed the title from "Upgrade Resurface Pump backend" to "TT-10676 Upgrade Resurface Pump backend" Nov 29, 2023
@tbuchaillot

tbuchaillot commented Dec 13, 2023

Hey @monrax, thanks for the PR! We will have a look!

@mativm02 mativm02 self-requested a review December 27, 2023 12:12
@mativm02

Hey @monrax! Could you please provide us with a Resurface license so we can completely test it?

@monrax

monrax commented Jan 12, 2024

Hey @tbuchaillot! Apologies, I missed your earlier messages here as well; my notification tray was a mess. Nice to meet you @mativm02. I just sent the license to Tom, but if it makes things easier I can send it to you as well. Is [email protected] your current mail address?

@monrax

monrax commented Jan 17, 2024

Hi all, I just squashed and rebased all commits onto master, with all tests passing. Please, let me know if there's anything else we can provide from our side to move this ahead. Thank you!

@mativm02 mativm02 left a comment

lgtm! thank you @monrax

	rp.log.Info(rp.GetName() + " Initialized")
	return nil
}

func (rp *ResurfacePump) initWorker() {
	rp.data = make(chan []interface{}, 5)
Contributor

Why is the channel size hardcoded to 5?

Contributor Author

Ideally, it would be 1 instead of 5. After some load testing I found that having a couple extra slots to avoid blocking right away wouldn't hurt (plus some more just to be safe). I do agree it would be better to make it configurable by defining a new parameter for this pump backend. Please, let me know if this is something you need and I can make a new PR with the required changes. Thanks!
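One possible shape for the configurable buffer size discussed here is sketched below. The `buffer_size` key, the `resurfaceMeta` struct, and both helper functions are hypothetical; the sketch uses `encoding/json` tags for simplicity and only assumes the two `meta` keys shown in the pump.conf example (`capture_url` and `rules`). The fallback of 5 mirrors the hardcoded value under review.

```go
package main

import "encoding/json"

// resurfaceMeta is a hypothetical view of the pump's "meta" section.
type resurfaceMeta struct {
	CaptureURL string `json:"capture_url"`
	Rules      string `json:"rules"`
	BufferSize int    `json:"buffer_size"` // hypothetical new parameter
}

// defaultBufferSize mirrors the value currently hardcoded in initWorker.
const defaultBufferSize = 5

// parseMeta decodes a JSON "meta" fragment into resurfaceMeta.
func parseMeta(raw []byte) (resurfaceMeta, error) {
	var m resurfaceMeta
	err := json.Unmarshal(raw, &m)
	return m, err
}

// newRecordQueue builds the bounded channel, falling back to the
// default when the parameter is missing or not positive.
func newRecordQueue(m resurfaceMeta) chan []interface{} {
	size := m.BufferSize
	if size <= 0 {
		size = defaultBufferSize
	}
	return make(chan []interface{}, size)
}
```

Falling back to the current value when the key is absent would keep existing configurations behaving as they do with the hardcoded size.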

@mativm02

Hey @monrax! I created this PR so we can make some CIs work as expected. We can make any changes (if needed) in the new one. Again, thanks for your contribution, it's really appreciated!

@mativm02 mativm02 closed this Jan 22, 2024