-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use ETag headers for served javascript files #39251
Conversation
lib/httplib/httpheaders.go
Outdated
// SetEntityTagCacheHeaders tells proxies and browsers to cache the content | ||
// and sets an ETag based on teleport version which can be used to check for modifications | ||
func SetEntityTagCacheHeaders(h http.Header, version string) { | ||
h.Set("Cache-Control", "max-age=300") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
h.Set("Cache-Control", "max-age=300") | |
h.Set("Cache-Control", "no-cache") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The context of that overflow post seems to be around index.html which already has a no-cache, no-store, must-revalidate
header sent. If we do no-cache
here, it will just be more network requests to the server. Totally fine to do it, just thought i'd try to cut back on a few.
however, after writing this, its probably more safe to always revalidate and eat a couple more network requests here. Will update, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was thinking about no-cache
when we were talking about caching on Slack. It should be fine here.
I wanted to check how other websites do this, but pretty much everyone does hashes in filenames instead so they can just cache it as immutable.
lib/web/cachehandler.go
Outdated
func makeCacheHandler(handler http.Handler, etag string) http.Handler { | ||
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | ||
httplib.SetCacheHeaders(w.Header(), time.Hour*24*365 /* one year */) | ||
// for javascript assets, we want to set an entity tag based on the version | ||
// of teleport. This will allow us to cache our "app.js", and validate if the file | ||
// should be recached based on the teleport version. A webasset will not change within | ||
// the same version so we can use this etag to return a 304 (Not Modified). This results in | ||
// only a couple hundred bytes response compared to the entire bundle js file | ||
|
||
// The rest of our assets like fonts can be cached between versions without issue | ||
if filepath.Ext(r.URL.Path) == ".js" { | ||
httplib.SetEntityTagCacheHeaders(w.Header(), etag) | ||
} else { | ||
httplib.SetCacheHeaders(w.Header(), time.Hour*24*365 /* one year */) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If none of our resources are changing names between versions now (correct me if I'm wrong), then let's just use the same ETag mechanism for all of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely an option. I kept the old "cache forever" type of mechanism for other webassets because they are mostly fonts and I don't think we'd ever really need to revalidate those anyway so it just saves an extra network request. I won't die on that hill tho
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about css?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do js
, css
, and wasm
here and let the rest cache forever. i don't think svg and png will change much (at least, ive never changed an svg and kept the name. usually its a different svg from my anecdotal evidence). And fonts themselves won't ever change.
Or we can make it simple and reverse the logic and say "everything is etag except woff
, woff2
, and ttf
. I'm a fan of the first more but would love your input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point is that we have to update css
(and wasm
, good catch!) along with js
, for correctness. I prefer the latter option where we default to correctness, and only optimize on specific files/types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Roger. Will do the ETag on all and only optimize fonts. If we find something later can be optimized, we'll add it then. Appreciate the feedback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Looks like after the initial fetch, our "homepage" sends only ~3kb of webasset data (stuff that isn't fetched json like resources
). and nearly all of that 3k is the index.html page (1.8k suprises me but lets move on).
The stuff that does come over the wire is 800b of config.js
which isn't really an asset and is used similar to a fetch request of info so, we can ignore that too
So about 250b for each "not modified" response so, app.js
, index.css
, and thats it.
Really what we are looking at is 500b and the rest is stuff that was already coming through that way before the change. not bad.
lib/httplib/httpheaders.go
Outdated
// SetEntityTagCacheHeaders tells proxies and browsers to cache the content | ||
// and sets an ETag based on teleport version which can be used to check for modifications | ||
func SetEntityTagCacheHeaders(h http.Header, version string) { | ||
h.Set("Cache-Control", "max-age=300") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was thinking about no-cache
when we were talking about caching on Slack. It should be fine here.
I wanted to check how other websites do this, but pretty much everyone does hashes in filenames instead so they can just cache it as immutable.
lib/httplib/httpheaders.go
Outdated
// and sets an ETag based on teleport version which can be used to check for modifications | ||
func SetEntityTagCacheHeaders(h http.Header, version string) { | ||
h.Set("Cache-Control", "no-cache") | ||
h.Set("ETag", version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a weak etag I think. The version number alone does not guarantee that the asset is byte-for-byte identical.
lib/web/apiserver.go
Outdated
// we will set our etag based on the teleport version | ||
etag := teleport.Version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the version alone as the ETag will bite people during development. I can guarantee that people will forget about it, run make build/teleport
followed by starting the cluster and they will wonder why they don't see changes in their assets or JS code.
I wonder if we shouldn't use Last-Modified
instead. Everything under /web/app
is served from cfg.StaticFS
, which, if I understand correctly, gets created during compilation based on webassets
.
Since we know that this fs always includes the index.html
file (at least I assume so by looking at the contents of those two dirs, teleport
and e/teleport
), we can do cfg.StaticFS.Open
to open the index.html file and then stat it for its last modified time. And preferably fetch that last modified time only once, which could be done within web.NewHandler
.
Because the fs gets created during compilation, it is guaranteed to be different between compilations.
Hell, even better, we could incorporate last modified time into our ETag. This would solve the problem with development and guarantee that the timestamp is different between versions, even if we happen to compile two different production versions at the same exact time.
See also https://www.fastly.com/blog/etags-what-they-are-and-how-to-use-them which mentions ETag vs Last-Modified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are using a combination of last modified in our etag, then we shouldn't need to use the weak directive. at least as far as i can tell? the whole point of using last modified is to say "this file definitely hasn't changed if the last modified is the same" and if we are incorporating it into our etag, then it is strong and not weak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last-Modified
is only about the modification time though, it doesn't say anything about the contents of the file itself. https://developer.mozilla.org/en-US/docs/Web/HTTP/Conditional_requests#validators We would be fine with strong ETag if we were indeed looking at the response we're serving.
Tangent incoming: I saw people use ETags in situations where the handler for the request computes the response, then they hash the response and add that as a strong ETag. Here's an example I found in Go. I think it works similar to how Rails does this. So in the end this just saves transfer on the network, but not necessarily caches anything on the side of the server. Alas, this is not quite what we want to use an ETag for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like getting Last Modified
time from embedded files isn't and won't be supported .
However, because this change is really only dev experience (avoid the need to cmd+shift+R
when not using the local dev server), I think this is a compromise that we can make. The solutions are
- create our own
http.FileSystem
implementation (or rather, extend it to include a compilation time) - make a timestamp when the process is started and use that as the "current" last modified. this would mean that the file would redownload after every process restart which would be a little weird for prod, but i'm not sure how often our customers are restarting their proxies
- append some sort of flag in the
make build/teleport
command that will set a compilation time somewhere that can be accessed maybe similar toteleport.Version
- maybe create a file in the webassets build that is used for "last modified" data? like a text file that includes a timestamp with the name of "last_modified" or something?
All of these seem to be much more involved than force reloading the browser so I'll write an issue and suggest we avoid larger changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the approaches you listed, adding a file with last modified date sounds most appealing to me. And good call on creating the issue.
lib/web/apiserver.go
Outdated
// we will set our etag based on the teleport version | ||
etag := teleport.Version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the approaches you listed, adding a file with last modified date sounds most appealing to me. And good call on creating the issue.
lib/web/apiserver.go
Outdated
// we will set our (weak) etag based on the teleport version | ||
etag := fmt.Sprintf("W/%s", teleport.Version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a dumb question, for dev, how do we ensure the browser doesn't cache it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
normally, local dev is done with yarn dev
which starts a development server for the frontend so that isn't changed here. If you aren't using the local dev server and your changes are cached, you'll have to (for now) force reload your browser. In chrome-like browsers, you can do this by pressing cmd(or ctrl)+shift+r. I plan on adding some sort of value at build time to ensure this doesn't happen in dev as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See also the discussion here. #39251 (comment)
Though Steve's comment made me realize something. When I was writing the comment I linked to, I thought of a situation where I'm working on a frontend feature, I want to check my changes in a compiled version of the Web UI, so I hit make build/teleport
and open the app again.
However, the cache will kick in even in a situation where I'm not a fullstack dev, let's say after a week of working on a feature branch I switch to master to work on another thing. I pull the changes and run make build/teleport
. If I properly understand ETag and Cache-Control: no-cache
, in this situation the cache will reuse the old version as well, unless the browser automatically cleans up old version or something.
It'll be much more common than I initially thought. Embedding even a random string within the compiled binary and appending it to the ETag would really help a lot here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The easiest way without creating more build files and what not is to just create a timestamp when the process starts and use that for the etag. This ensures that the files are up to date anytime the proxy comes online. For devs, this is no different than having a value at build time and for production this means that generally, it'll restart when someone upgrades. I'm not sure how often processes would be restart in a production environment but it can't be so often that this would mean a couple more cache updates per update cycle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, the cache will kick in even in a situation where I'm not a fullstack dev
yes. i may not be even doing any web-related changes at all, but I would expect to see the latest web UI when I am testing. 16.0.0-dev
could be stuck for a few months and I may not notice that I have web UI cached.
Since I reviewed this change, I can cmd+shift+r
but others may not be aware.
I won't block this change for this but we should think of something if we can. I haven't approved the PR yet as I want to double check if we can add some UT, if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we are calculating some hashes, could we just implement the proper hash calculation for the etag
instead of using a weak version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We aren't using a weak tag anymore. We are using a version+hash but i guess we don't even need the version anymore. Either way, it's a strong tag now. I think the version is a nice to have included in the string tho, so we can see at an easier glance what version is being served (in case of debugging i suppose)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vite plugin looks fine but let's move it to it's own file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way, it's a strong tag now. I think the version is a nice to have included in the string tho, so we can see at an easier glance what version is being served (in case of debugging i suppose)
Gotcha 👍 . What I meant was since we were calculating, we probably can just do a hash calculation on the go handler/middleware side and cache it per file. But I really like the version with ETAG. Is the new hash file browsable/curlable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@greedy52 it is browsable at /web/apphash
2440cd3
to
dc100d7
Compare
One more double check on this one please, the new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I submitted a small patch to fix the format of the header itself and not bail out if apphash can be opened but can't be read.
lib/web/apiserver.go
Outdated
|
||
hashFile, err := cfg.StaticFS.Open("/apphash") | ||
if err != nil { | ||
h.log.WithError(err).Error("Failed to open apphash file. Using version only as etag") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some things that caught my attention:
- If the apphash can't be read, the ETag is technically weak. Again, it's not super important, but why not do it the proper way when we have a chance.
- The ETag page on MDN says the syntax is as follows below. We don't put the etag value in quotes.
ETag: W/"<etag_value>"
ETag: "<etag_value>"
- The log message could be improved from the perspective of a user who just sees the error and has no idea what the hash file is or what the ETag will be used for.
- If we fail to open the apphash file we fall back to the version ETag. That's reasonable. But when we fail to read the file, we bail out completely – wouldn't it make sense to fall back to the version as well?
The patch below addresses all four items.
Patch
Copy then pbpaste | git apply
to apply
diff --git a/lib/web/apiserver.go b/lib/web/apiserver.go
index 8627477a6f..ffbbcf697e 100644
--- a/lib/web/apiserver.go
+++ b/lib/web/apiserver.go
@@ -476,7 +476,7 @@ func NewHandler(cfg Config, opts ...HandlerOption) (*APIHandler, error) {
var indexPage *template.Template
// we will set our etag based on the teleport version and
// the webasset app hash if available
- etag := teleport.Version
+ etag := fmt.Sprintf("W/%q", teleport.Version)
if cfg.StaticFS != nil {
index, err := cfg.StaticFS.Open("/index.html")
if err != nil {
@@ -496,18 +496,11 @@ func NewHandler(cfg Config, opts ...HandlerOption) (*APIHandler, error) {
h.Handle("GET", "/robots.txt", httplib.MakeHandler(serveRobotsTxt))
h.Handle("GET", "/web/config.js", h.WithUnauthenticatedLimiter(h.getWebConfig))
- hashFile, err := cfg.StaticFS.Open("/apphash")
+ etagFromAppHash, err := readEtagFromAppHash(cfg.StaticFS)
if err != nil {
- h.log.WithError(err).Error("Failed to open apphash file. Using version only as etag")
+ h.log.WithError(err).Error("Could not read apphash from embedded webassets. Using version only as ETag for Web UI assets.")
} else {
- defer hashFile.Close()
- appHash, err := io.ReadAll(hashFile)
- if err != nil {
- h.log.WithError(err).Error("Failed to read contents of the hash file")
- return nil, trace.Wrap(err)
- }
-
- etag = fmt.Sprintf("%s-%s", etag, string(appHash))
+ etag = etagFromAppHash
}
}
@@ -4638,3 +4631,21 @@ func serveRobotsTxt(w http.ResponseWriter, r *http.Request, p httprouter.Params)
w.Write([]byte(robots))
return nil, nil
}
+
+func readEtagFromAppHash(fs http.FileSystem) (string, error) {
+ hashFile, err := fs.Open("/apphash")
+ if err != nil {
+ return "", trace.Wrap(err)
+ }
+ defer hashFile.Close()
+
+ appHash, err := io.ReadAll(hashFile)
+ if err != nil {
+ return "", trace.Wrap(err)
+ }
+
+ versionWithHash := fmt.Sprintf("%s-%s", teleport.Version, string(appHash))
+ etag := fmt.Sprintf("%q", versionWithHash)
+
+ return etag, nil
+}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the thread that Steve started (#39251 (comment)), could you document why we use this specific ETag format? The thread has a lot of valuable info about correctly handling a bunch of different scenarios: multiple proxies, binaries compiled at different times for different platforms, dev version, etc.
Thanks for patch, looks great. Will apply and add the comments as well. Great call on the comment because if someone were to see the tag in the future they'd probably think it was arbitrary. |
a3e06b4
to
8d30d37
Compare
* Prevent code splitting (#38706) This will prevent the react bundle from being code split. This means that our entire javascript bundle will be a single file as well as our css file. It also removes hashing from the included webassets. The reason for this change is to allow different versioned proxies to coexist behind a load balancer. If we remove code splitting, then the only time our bundle will load is on initial hit, and the web client will then have the entire bundle. This also removes React.lazy. React.lazy wrapped components in a promise to allow for dynamic imports during runtime. This is no longer needed. Lastly, to clean up the default exports used by react lazy, this removes (most) default exports for feature components. Assist and telemetry-boot are not included in the javascript bundle * Fix the import on the account settings page (#39015) * Fix auth connector imports (#39019) * Init discover resources inside the discover component (#39166) * Use ETag headers for served javascript files (#39251) --------- Co-authored-by: Bartosz Leper <[email protected]>
This supports the recent change to single chunk webassets and will be backported in the collective backport PR. We are aiming to merge it just before 15.2
This PR will add a short cache and ETag to our javascript webasset files. The purpose of the etag is for the browser to send when the cache is stale and check with the server "do i need to recache this?". We use the teleport Version for the ETag which means new content wont be downloaded unless the version is updated.
This results in much less bandwidth sending files that haven't been updated, but also allows our
app.js
to be accurate and up to date (at least every 5 minutes) without the need for hashing.For context, our current bundle size for oss is ~700kb, and the 304 Not Modified response is about 240b. Rather than downloading the file again for the cache every 5 minutes, it'll be only the 240b until the version has changed.
The 5 minute cache I selected is completely arbitrary and I'm happy for more input. I chose it because we want the cache to exist throughout the login process of the app (handling the initial load and redirects during login) but not last so long that a new version during an upgrade will be ignored. I also wouldn't want to spam the server with revalidation requests if not needed so 5 minutes is where I landed.
All of this caching can be invalidated and bypassed to get the most current
app.js
by performing a forced reload (cmd+shift+R for chrome/brave for example), and no extra steps need to be taken besides that if you feel the assets are "behind" (as in, you dont need to open settings and clear cache/history or any of that)