Response
We own the service in charge of delivering header, footer, and sometimes left nav for pages across the company. Since our content on many of these pages is important for search engine optimization (SEO), and generally for improved user experience, we server-side render (SSR) our content, so it gets directly injected into those pages server-side and is visible to users without needing to wait for JavaScript to execute. Each nav is an independent React app so that they can be pretty well isolated, and so that we don't depend on the host page using any particular framework. (We are used on pages that otherwise don't use React, for example.) When we hydrate on the client side, we use webpack 5 module federation so that the different navs can share dependencies (including React, for example). (The independent deployability aspect of module federation is not relevant to us, since we already get that through our SSR service. Since we don't need that feature, we're able to enable caching for our remote entrypoint and give it a versioned path.)
There's a new version of the company's design system, which is supposed to have improved performance, in addition to new features and ongoing support. We recently addressed our last upgrade blocker. (The design system uses Emotion & Stylis for CSS-in-JS support, and the Stylis plugin we use for style isolation had not been updated to work with the upgraded version.) We'd like to upgrade, but we're fairly risk-averse due to the importance and widespread use of our content, so we need to do it safely, and we'd like to evaluate its effect on performance. If we discover a problem, we want to be able to roll back to the old version very quickly, address those issues, and then roll forward again. We'd like to start with a small percentage of users and see how it affects performance and error rates. In short, we want to AB test it. My task is to drive the entire effort: proposing solutions, getting feedback from the team, doing the bulk of the implementation, handling the rollout, and finding & fixing any issues that crop up along the way.
I came up with a rough set of ideas & evaluation criteria and then met with a more junior teammate to bounce the ideas around. During that meeting, we briefly documented the alternatives, the criteria, and how the alternatives rated with respect to the criteria. After that initial meeting, I expanded on that doc significantly, noting many more pros & cons. I set up a meeting with the rest of the team to get feedback. Shortly before that feedback meeting, I thought of another approach that could be even better if it turned out to be feasible. I was pretty sure it was, and another team member had relevant experience, so I was able to validate it with him, and the rest of the team agreed that it would be the best way forward.
SSR & CSR need to use the same version so that hydration matches up. SSR is a single bundle for all navs, while CSR has separate bundles per nav. SSR & CSR bundles are built fairly differently, with CSR using webpack 5 module federation features in order to share dependencies when multiple bundles are used on the same page (header, footer, left nav). Ideally, we don't want to have two copies of the code, since we'd worry about them accidentally diverging. The best idea I had come up with earlier was to duplicate the code as part of the build process, making some programmatic changes along the way (mostly swapping the version of the design system). That would allow us to have both versions built into the server bundle and create separate client bundles for them without risking accidental divergence. My aha moment was that we could split that server uber-bundle into a bundle that just handles things like routing, which would then pass the baton to a separate bundle for each nav. The result is a much smaller routing bundle, and we build both a client bundle and a server bundle for each nav. This was a big enough change to how we deliver our content that it justified its own AB test before we even started testing the upgrade! We saw an increase in SSR times, which I was then able to address with some changes to how we loaded the bundles. We have a fairly thorough regression test suite, so I was able to use it to verify that everything (that we have automated tests for) functioned correctly and that everything looked identical down to the pixel.
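To make the routing-bundle idea concrete, here is a minimal sketch of a router that hands off to a separate per-nav server bundle. Everything in it is hypothetical: the nav names, the `createNavRouter` helper, and the loader signature are illustrative, and the load-once caching is only one plausible way to keep bundle loading off the hot path, not necessarily the change that fixed our SSR times.

```typescript
// Hypothetical names throughout; the real loader and nav identifiers are not shown in the text.
type NavName = "header" | "footer" | "leftNav";
type Renderer = (props: Record<string, unknown>) => string;
type BundleLoader = (nav: NavName) => Renderer;

// The routing bundle stays small: it only decides which nav bundle to call
// and remembers bundles it has already loaded (an assumed optimization).
function createNavRouter(
  loadBundle: BundleLoader
): (nav: NavName, props: Record<string, unknown>) => string {
  const loaded = new Map<NavName, Renderer>();
  return (nav, props) => {
    let renderer = loaded.get(nav);
    if (renderer === undefined) {
      renderer = loadBundle(nav); // first request for this nav pays the load cost
      loaded.set(nav, renderer);
    }
    return renderer(props);
  };
}

// Stand-in loader so the sketch runs without real bundles.
let loadCount = 0;
const fakeLoader: BundleLoader = (nav) => {
  loadCount += 1;
  return (props) => `<nav data-name="${nav}">${String(props.title ?? "")}</nav>`;
};

const renderNav = createNavRouter(fakeLoader);
const firstHeader = renderNav("header", { title: "Home" });
const secondHeader = renderNav("header", { title: "Jobs" });
```

In this sketch the second header render reuses the already-loaded bundle, so only the first request for each nav triggers a load.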
In order to AB test the upgrade itself, we needed to build our nav code against both versions of the design system. I used a combination of npm aliasing (depend on both major versions, giving them different aliases), webpack aliasing (aliasing the appropriate major version back to the default name), and module federation sharing configuration (aliasing the appropriate major version back to the default name, plus registering the correct name & version for dependency sharing) to produce (for each nav) client & server bundles for the old version of the design system as well as client & server bundles for the new version of the design system. The change in major versions also means that the design system is not backward compatible, so I had to change our code in order to make it behave identically when run with either version. This was mostly a matter of specifying explicit arguments where defaults had changed, but it also included some bits where we detect which version is present and tweak our behavior based on the result. I updated the build to run the unit tests once with each version of the design system, and I added regression tests to verify that it looked & behaved identically.
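As a rough illustration of how the three kinds of aliasing fit together, the sketch below builds the relevant config fragments for one variant. The package names, alias names, and version numbers are all hypothetical placeholders. The `import`, `shareKey`, `version`, and `requiredVersion` keys are options of webpack 5's module federation `shared` configuration, but exactly how our build wires them up is simplified here and should be read as an assumption.

```typescript
// Hypothetical placeholders: package names, aliases, and versions are illustrative only.
// package.json would depend on both majors under different npm aliases, e.g.
//   "design-system-old": "npm:@example/design-system@^1.0.0",
//   "design-system-new": "npm:@example/design-system@^2.0.0"
type Variant = "old" | "new";

interface VariantInfo {
  alias: string;           // npm alias declared in package.json
  version: string;         // version registered for dependency sharing
  requiredVersion: string; // range a consumer of this variant accepts
}

interface SharedEntry {
  import: string;          // module actually bundled as the fallback
  shareKey: string;        // name it is registered under in the share scope
  version: string;
  requiredVersion: string;
}

interface VariantConfig {
  resolve: { alias: Record<string, string> };
  shared: Record<string, SharedEntry>;
}

const DEFAULT_NAME = "@example/design-system";

const VARIANTS: Record<Variant, VariantInfo> = {
  old: { alias: "design-system-old", version: "1.0.0", requiredVersion: "^1.0.0" },
  new: { alias: "design-system-new", version: "2.0.0", requiredVersion: "^2.0.0" },
};

// Returns plain objects that could be merged into a webpack config (resolve)
// and passed as the module federation plugin's `shared` option.
function designSystemConfig(variant: Variant): VariantConfig {
  const info = VARIANTS[variant];
  return {
    // webpack aliasing: source code keeps importing the default name
    resolve: { alias: { [DEFAULT_NAME]: info.alias } },
    // sharing config: provide the aliased package under the default name & real version
    shared: {
      [DEFAULT_NAME]: {
        import: info.alias,
        shareKey: DEFAULT_NAME,
        version: info.version,
        requiredVersion: info.requiredVersion,
      },
    },
  };
}

const oldConfig = designSystemConfig("old");
const newConfig = designSystemConfig("new");
```

Running the build once per variant with the matching fragment would yield one set of bundles per design system version while the application source keeps a single import path.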
When I ramped it up to handle 1% of users, I saw that it was 20% slower! It's a good thing we used an AB test, so this wasn't affecting all of our users. Over the course of a couple of hours, the difference decreased to less than 10%. Our baseline render time was ~20ms, so an extra couple of ms wasn't going to break the bank, but it certainly wasn't what we had hoped for. I talked with a stakeholder to see if they were comfortable ramping the test up. (I picked the stakeholder that I thought was most likely to have reservations.) He was okay with ramping to 5%, so I went ahead with that, and surprisingly the difference decreased to less than 1ms. I'm still not certain why that would be, except perhaps that the test groups were somehow biased, though that would be surprising, since each test group was handling 150k requests per hour. After a while, we began to see performance problems across both versions, and we discovered that our service had started leaking memory around the time the AB test was first turned on. AB testing didn't save the old version from the impact of the memory leak, but it did allow us to control how fast we leaked memory. I worked with the design systems team to find a memory leak in the new version, and we were able to work around it. I tried to validate my fix locally, but I wasn't generating enough traffic to detect whether the leak was present. After merging my fix, I was able to see that the leak was fixed, and the new version of the design system was now performing ~10% better than the old version. That's more like it!
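For readers unfamiliar with percentage ramps, here is a generic sketch of deterministic bucketing. It is not our AB testing framework, whose internals aren't described here; the function names, the test name, and the FNV-1a-style hash over UTF-16 code units are all assumptions chosen for illustration. The property it demonstrates is that ramping from 1% to 5% keeps the original 1% of users in the test group.

```typescript
// Generic illustration only; names and hashing scheme are hypothetical.

// FNV-1a-style 32-bit hash computed over UTF-16 code units.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

// Maps a user to a stable bucket in [0, 10000) for a given test.
function bucketFor(testName: string, userId: string): number {
  return fnv1a32(`${testName}:${userId}`) % 10000;
}

// A user is in the test group when their bucket falls below the ramp threshold.
// `percent` is in the range 0..100, with 0.01% granularity.
function isInTest(testName: string, userId: string, percent: number): boolean {
  return bucketFor(testName, userId) < Math.round(percent * 100);
}

// Example: the same user gets the same answer on every request,
// and anyone included at 1% is still included at 5%.
const exampleTest = "design-system-upgrade"; // hypothetical test name
const includedAtOne = isInTest(exampleTest, "user-123", 1);
const includedAtFive = isInTest(exampleTest, "user-123", 5);
```

Because the bucket depends only on the test name and user ID, rolling back is just a matter of lowering the threshold to 0, and rolling forward again puts the same users back in the test group.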
When all was said & done, we now have better performance for our users across nearly all Indeed pages, we can use the features introduced in the new version of the design system, we're back on a supported version, and we were able to manage risks and avoid major incidents by using our AB testing framework to roll the change out gradually. In addition, we've already used the techniques I pioneered for this project to make other upgrades safer, and I plan to write a guide to these techniques to share across the company and help other teams perform safer upgrades.