Refactor Load and Performance Doc #2140
### Warning {.info}
We do not recommend load testing on the Live environment if the site has already launched, because you risk overwhelming the live site and causing downtime.
</div>
Note the start time for the test. As the test executes, keep a close eye on [log files](/docs/logs). Note any errors and warnings that appear during the test so you can fix them.
@bentekwork Which log files should I watch? Do they differ based on the type of test being run?
3. Determine how much load to apply for your test.

* **Performance Tests**: Smaller loads should suffice, as you should be able to see transactional bottlenecks with 10-20 concurrent users.
* **Load Tests**: Determine how many concurrent users the site is expected to serve based on historical analytics for the site. Identify the peak hourly sessions and the average session duration in minutes, then do some math: `hourly_sessions / (60 / average_duration) = Concurrent Users`
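
As a worked illustration of the formula above, here is a minimal Python sketch. The function name and the sample figures (1,000 peak hourly sessions, 3-minute average session) are hypothetical, and the sketch assumes the average duration is expressed in minutes, which is what the `60` in the formula implies.

```python
def concurrent_users(hourly_sessions, average_duration_minutes):
    """Estimate concurrent users from peak hourly sessions and average session length.

    Implements: hourly_sessions / (60 / average_duration) = Concurrent Users
    """
    if average_duration_minutes <= 0:
        raise ValueError("average_duration_minutes must be positive")
    # How many back-to-back sessions of this length fit into one hour.
    sessions_per_slot = 60 / average_duration_minutes
    return hourly_sessions / sessions_per_slot


# Hypothetical analytics figures, for illustration only.
peak_hourly_sessions = 1000
average_duration = 3  # minutes

print(concurrent_users(peak_hourly_sessions, average_duration))
```

With these sample numbers the estimate works out to 50 concurrent users; substitute the values from your own analytics.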
@bentekwork How do I determine load to apply in the test after calculating concurrent users?
Seems thorough on first and second reading.
Great update! This is coming along nicely. Added a few comments. Would be happy to review again.
### Performance Testing
Performance testing is the process in which you measure an application's response time to proactively expose bottlenecks. These tests should be regularly executed as part of routine maintenance. Additionally, you should run these tests before any load testing. If your application is not performing well, then you can be assured that the load test will not go well.

The scope of performance tests should be limited to the application itself on a development environment (Dev or [Multidev](/docs/multidev)) without caching. This will give you an honest look into your application and show exactly how uncached requests will perform. You can bypass cache by [setting the `no-cache` HTTP headers](/docs/cache-control) in responses.
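
Before trusting uncached timings, it can help to confirm that responses really carry cache-bypassing headers. The sketch below is a simplified check on a `Cache-Control` header value. The function name is hypothetical, and it assumes the cache in front of the site honors the standard `no-cache`, `no-store`, and `max-age=0` directives; it does not model every directive or any platform-specific behavior.

```python
def bypasses_cache(cache_control):
    """Return True if a Cache-Control value asks caches not to reuse the response.

    Simplified: only looks for no-cache, no-store, or max-age=0.
    """
    # Split the header into individual directives, normalizing case and spacing.
    directives = [part.strip().lower() for part in cache_control.split(",")]
    return any(d in ("no-cache", "no-store", "max-age=0") for d in directives)


# Example header values, for illustration only.
print(bypasses_cache("no-cache, must-revalidate"))
print(bypasses_cache("public, max-age=600"))
```

The first call reports that the response bypasses cache, and the second reports that it does not.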
Offer alternatives to bypass cache by setting a no-cache header? How about just disabling cache completely on Dev/Multidev during testing through Drupal/WordPress Admin UI?
My understanding is that the Dev environment has a default time-to-live of zero, which implies no caching, but that things like Pantheon Advanced Page Cache may override this with a non-zero value. While a no-cache header may help, this may depend on when it gets executed. Suggesting that users disable caching via the UI is an option, with an emphasis on remembering to re-enable it before pushing to prod.
### Load Testing
Load testing is the process in which you apply requests to your site that will represent the most load that your site will face once it is live. This test will ensure that the site can withstand the peak traffic spikes after launch. This test should be done on the Live environment before the site has launched, after performance testing.

If your site is already live, then you should run load tests on the Test environment. Keep in mind that the Test environment has one application container, while Live environments on sites with a service level of Business and above can have multiple application containers serving the site. So try to run a proportionate amount of traffic based on how many containers you currently have on your Live environment.
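
One possible reading of "a proportionate amount of traffic" is to divide the expected Live concurrency evenly by the number of Live application containers. The sketch below makes that assumption explicit. The function name and sample numbers are hypothetical, and it presumes you already know the Live container count, which may not be something you can determine on your own.

```python
import math


def users_for_single_container(live_concurrent_users, live_containers):
    """Scale expected Live concurrency down to one application container.

    Assumes traffic is spread evenly across containers (a simplification).
    """
    if live_containers < 1:
        raise ValueError("live_containers must be at least 1")
    # Round up so the test is never lighter than the proportional share.
    return math.ceil(live_concurrent_users / live_containers)


# Hypothetical figures: 50 concurrent users on Live, served by 4 containers.
print(users_for_single_container(50, 4))
```

With these sample figures, the Test environment would receive 13 concurrent users.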
Offer concrete example with math?
The EOM team is the best source for the algorithm we use.
3. Determine how much load to apply.

* **Performance Tests**: Smaller loads should suffice, as you should be able to see transactional bottlenecks with 10-20 concurrent users.
* **Load Tests**: Determine how many concurrent users the site is expected to serve based on historical analytics for the site. Identify the peak hourly sessions and the average session duration in minutes, then do some math: `hourly_sessions / (60 / average_duration) = Concurrent Users`
Let's reiterate difference between load test on Live vs non-live, and include app containers in calculation for scenario.
Load tests should not be run on Test; performance tests can and should be run there instead. Providing formulas is complicated by the fact that running a "proportionate amount of traffic" on Test involves knowing the number of appservers on Live, which clients can't determine on their own (other than by asking Support, or by looking at New Relic, which will include decommissioned appservers for some time).
Finally, review the **Error analytics** tab in New Relic. PHP errors often indicate huge performance bottlenecks. If you have errors, fix them.

### Calculating Load Capacity After Launch
How can we highlight this scenario? And flesh it out with concrete example explaining how to collect RPM and response time from New Relic?
## Load vs Performance Testing
Before you start, it's important to understand the difference between load and performance testing and know when to use each.
### Performance Testing
Performance testing is the process in which you measure an application's response time to proactively expose bottlenecks. These tests should be regularly executed as part of routine maintenance. Additionally, you should run these tests before any load testing. If your application is not performing well, then you can be assured that the load test will not go well.
Why should these tests be run regularly as part of routine maintenance? To ensure performance doesn't degrade with a code or configuration change?
In general, I'd favor suggesting that clients:
- "refer to New Relic reports regularly to identify improvements or degradation of performance",
- "run performance tests occasionally to proactively expose potential bottlenecks and to identify opportunities for optimization", and
- "perform load tests in advance of anticipated major-traffic events, or prior to launching sites after major overhauls, remembering to provide enough time to fix any issues identified".
Added some of these notions.
* [JMeter](http://jmeter.apache.org)
* [Locust](http://locust.io/)

The Pantheon onboarding team uses Locust, an open source load testing tool. Locust makes it easy to build out test scripts, and it allows you to crawl the site instead of using predefined URLs. Crawling the site has the added benefit of loading every page that is linked to anywhere on the site. This exposes edge case performance bottlenecks that would have gone undetected under tests with predefined URLs.
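
The crawling idea can be sketched independently of any particular tool: after fetching a page, collect the links that stay on the same site and request those next. The following is a minimal, standard-library-only illustration of that link-collection step. It is not the onboarding team's script and does not use Locust's API; the class and function names and the `dev-example.test` host are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url, html):
    """Return unique absolute links in `html` that share the host of `page_url`."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment.
        absolute = urljoin(page_url, href).split("#", 1)[0]
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links


# Hypothetical page content, for illustration only.
sample_html = '<a href="/about">About</a> <a href="https://other.example/x">Elsewhere</a>'
print(same_site_links("https://dev-example.test/", sample_html))
```

A load test built on this idea would feed each returned URL back into its request queue, so pages are discovered by following links instead of being listed up front.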
"makes it easy" -- link to example script?
The EOM team should be asked to update this section.
The Pantheon onboarding team uses Locust, an open source load testing tool. Locust makes it easy to build out test scripts, and it allows you to crawl the site instead of using predefined URLs. Crawling the site has the added benefit of loading every page that is linked to anywhere on the site. This exposes edge case performance bottlenecks that would have gone undetected under tests with predefined URLs.

Ultimately, it doesn't matter what tool you use as long as you test your site properly. Be sure to allow for any authenticated traffic as well as anonymous.
"Be sure to allow for any authenticated traffic as well as anonymous" - Not sure we should just assert this in passing. Load testing authenticated users can be difficult.
I agree that authenticated user testing is a complex task and thus the generic statement should be along the lines of "It is important for Load Testing to test against the anticipated traffic patterns of the site, both in terms of traffic volume and authenticated/anonymous proportion. Note that testing authenticated workflows is considerably more complex requiring more time, skills and iterations."
I edited
3. Determine how much load to apply.

* **Performance Tests**: Smaller loads should suffice, as you should be able to see transactional bottlenecks with 10-20 concurrent users.
Why 10-20? A single request can give you all you need, no?
We should explain how to use tools like Google Dev Tools for website performance optimization or at least link to resources like:
https://hpbn.co/
https://www.udacity.com/course/website-performance-optimization--ud884
IMO, you want to generate more than single request to tease out potential bottlenecks.
Also, I know that we have a Quicksilver example that will use free loader.io account to automatically run this level of test on each push to Test environment. Not only does this result in automated testing procedures, it provides a standard profile that you can see in New Relic. Here's a related link, but we need better: pantheon-systems/quicksilver-examples#110
IMO, this is good to go (i.e. no edits needed). A separate issue should be created, if/when we want to include reference to the loader.io Quicksilver example.
High performance is the ability to deliver a page in under a second; scalability is the ability to deliver that page in under a second for many requests. It's important to understand the difference between these two dimensions and that there are trade-offs between performance and scalability.

## Verify Varnish is Working
Seems like verifying Varnish is working is still important before doing a load test? Maybe this can be more concise?
Is this still the case now that Global CDN is in place?
We're going to deploy this as an iterative improvement and circle back to address suggestions not implemented here. See #2251 to track.
Closes #1851
Replaces #2109
Effect
PR includes the following changes:
Todo: