-
Notifications
You must be signed in to change notification settings - Fork 381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Partial support for Cache-Control's no-cache directive #4073
base: master
Are you sure you want to change the base?
Conversation
There was one waiting list quirk I was aware of that I completely forgot: the safety net for Since the default ttl+grace+keep is zero upon entering It is debatable whether we want to keep the artificial life time: on one hand it buffers backend errors for a little while, on the other hand it prevents new fetch attempts for a little while for epiphenomenonal backend errors. Either way, the waiting list is no longer a problem, see 5bfcd0f. |
#4085 added a minor merge conflict, I will try to deal with it before bugwash. |
23fd38b
to
8ed6d3d
Compare
After a first offline review with @bsdphk there are several things to change:
The third point is not a strict requirement, up for debate. If we keep the the safety net, we should prune its |
I'm looking at the waiting list coverage that is failing in CI, I was able to reproduce it locally with some load. |
This might be a bug in varnishtest that I don't understand yet. When a timeout happens, the server spec ends before the final I made the following change: --- bin/varnishtest/tests/r02422.vtc
+++ bin/varnishtest/tests/r02422.vtc
@@ -7,7 +7,7 @@ server s1 {
rxreq
# wait until the new version is ready
- delay 1
+ loop 10 {delay 0.1}
txresp -hdr "Etag: 6"
} -start
And the logs for
So |
I found the culprit, the test became racy and |
I added 5 patches at the beginning of the series to help future troubleshooting and close the race before making it too easy to trigger. |
Never mind, I only made the problem harder to reproduce on my end with better synchronization, there is wrong linked to bringing back the exponential rush for hits. |
The problem was that the exponential rush of hits was introduced without safeguards, allowing requests to rush each other in turn when a waiting list hit was followed by a VCL restart. In r2422 the clients c5 and c6 would restart upon a regular hit and they are both supposed to enter c4's waiting list. However, if one of them "lost" the race against c4, it would fail a third fetch and rush the other one that in turn would fail a fetch and rush the other than meanwhile entered the waiting list. Until one of them would exceed The fix introduces a fence between clients entering a waiting list and objcores rushing the waiting list. The objcores can only rush clients that entered before their initial rush. |
Rebased onto master to resolve minor conflict with #4109 (touching adjacent lines in |
a31c916
to
7e4bc71
Compare
This reduces a great deal of spurious activity on the private_oh waiting list and removes unncessary conditionals when dealing with cacheable (or at least searchable) objects.
The OC_F_BUSY flag is used to decide whether to disembark a request into an objhead's waiting list. Therefore, it makes sense to only give this flag to objcores for which the outcome may be a shareable object. This excludes synthetic and private (req.body and passed req) objcores, and objects inserted with HSH_Insert(). Inserted objects are never busy since they come already ready to use. Pass transactions don't need this flag either. The OC_F_BUSY flag was somewhat conflated with struct busyobj in the fetch state machine. We don't need OC_F_BUSY to convey that a fetch task is coordinated with a req task, oc->boc is already doing that. The OC_F_BUSY|OC_F_PRIVATE bits should be considered an oxymoron since OC_F_BUSY drives waiting list activity and OC_F_PRIVATE is for objects that are not shareable by definition.
This is not a nice caching outcome. In fact, when a fetch fails we don't know the outcome. In that case we need to rush one request that will trigger a new fetch and hopefully complete with a proper outcome to rush other requests accordingly. As far as lookup is concerned there is no point in keeping the OC_F_BUSY flag so in the case where a failure happens before a proper HSH_Unbusy() it can be dropped before the rush. This would otherwise happen once the last objcore reference is dropped. One is held by the current fetch task and the other one by the client task that triggerred the fetch. It would amount to an unnecessary delay. The OC_F_FAILED flag will prevent it from being looked up, so there is no reason to delay the rush until the final objcore reference is dropped. This also means that once the client task that triggered the fetch task resumes it can only be operating on a non-busy object.
The BOS_PREP_STREAM state is retired. It was used as a stop-gap to "unbusy" objects before waking up the client task for good. Instead, the HSH_Unbusy() and HSH_Fail() functions are called once the gap is closed, depending on the outcome. This removes one unnecessary signal and consolidates multiple call sites, ensuring that objects always drop the OC_F_BUSY flag from a central location, for fetch tasks.
This is the counterpart of HSH_Unbusy() for the cases where the req task will not schedule a fetch task, at which point we know we can already wake up a request, if there is one on the waiting list. This encapsulates the "wake up only one request" semantic using a new OC_F_WITHDRAWN flag. Although this flag is not being used yet, it will when the state of the objcore determines the rush policy instead of the call site. When a bgfetch is withdrawn, the extra rush becomes redundant with the policy rush on the hit objcore. There will eventually be no more rush for the hit objcore, which will remove the redundancy just introduced. In order to withdraw objcores in a single critical section, a function HSH_DerefObjCoreUnlock() is added.
Rushing the waiting list because an objcore was dropped was a source of spurious wakeups, including the case where the dropped objcore is not even relevant to the objhead waiting list. It is now guaranteed that an object either dropped its OC_F_BUSY flag once ready (otherwise failed or withdrawn) or never had this flag to begin with for objcores that can't be looked up.
Operating on an objcore allows to preserve the different rush strategies that existed so far: - rush one request for failed fetch tasks - rush one request for withdrawn objects - rush rush_exponent requests otherwise For the cases where no rush is expected, a null objcore is passed. For cacheable objects, all waiters are temporarily woken up at once, until waiting list matches are handled separately.
From now on, the policy will be derived from the objcore.
If a fetch succeeded and an object was inserted as cacheable, it means that the VCL did not interpret the backend response as revalidated and shareable with other clients on the waiting list. After changing the rush to operate based on objcore flags instead of the call site deciding how many clients to wake up, the same objcore reference is now passed to requests before they reembark a worker. This way when an objcore is already present during lookup we can attempt a cache hit directly on the objcore that triggered the rush, removing a degree of uncertainty in the waiting list behavior. Instead of repurposing the req::hash_objhead field into an equivalent req::hash_objcore, the field is actually removed. In order to signal that a request comes back from its waiting list, the life time of the req::waitinglist flag is extended until cnt_lookup() is reentered. But to avoid infinite rush loops when a return(restart) is involved, the flag is turned into a req::waitinglist_gen counter to prevent objcore rushes from affecting requests entering the waiting list after they began for a given objcore. If the rushed objcore matches a request, the lookup can result in a hit without entering the regular lookup critical section. The objhead lock is briefly acquired to release req's reference on the objhead, to rely solely on the objcore's objhead reference like a normal hit, and generally perform hit-related operations. This change brings back the exponential rush of cacheable objects briefly neutered. This shifts the infamous waiting list serialization phenomenon to the vary header match. Since most spurious rushes at every corner of objhead activity are gone, this change puts all the spurious activity on the incompatible variants alone instead of all objects on more occasions. If a cacheable object was inserted in the cache, but already expired, this behavior enables cache hits. This can be common with multi-tier Varnish setups where one Varnish server may serve a graced object to another, but true of any origin server that may serve stale yet valid responses. The waiting list enables a proper response-wide no-cache behavior from now on, but the built-in VCL prevents it by default. This is also the first step towards implementing no-cache and private support at the header field granularity. fgen
If a synthetic beresp is inserted into the cache with no TTL, grace or keep at all, it is now considered validated (although not in the best circumstances) and will be passed to all requests in the waiting list.
This centralizes the rules regarding BOC state changes and whether to inform clients of those changes. For example, BOS_REQ_DONE is only needed by grace hits, saving one broadcast on the BOC condvar on misses and straight passes. It also formalizes when a busyobj will release its client request.
To ease test coverage for waitling list operations, requiring less disembarked requests for specific cases. Added is a test case showing that a rush miss propagates further wake ups.
1dd77c2
to
9115bb7
Compare
Force push:
|
This is the third iteration of this change, following a first iteration in #3992 that I spontaneously replaced with a second iteration in #4032 that received very helpful reviews.
In this new installment I took the design from #4032 and moved one more step in that direction.
The first important change is the strict semantics of the
OC_F_BUSY
flags. Its main purpose is now to inform the decision to disemnbark a request into an objhead waiting list. It still applies in other places to identify objects meant to be looked up for which the caching outcome is not known yet.As a result, this flag is only ever raised in
hsh_insert_busyobj()
.In a couple places, this flag was checked as a means of coordination between a client and a fetch tasks. This is the role of a
boc
to coordinate these tasks, so theboc
grew the responsibility of (indirectly) clearing theOC_F_BUSY
flag in the fetch task.In that regard, there are now stronger guarantees around this flag:
beresp
is ready for streaming (ifberesp.do_stream
)beresp
fetch completes (if! beresp.do_stream
)beresp.do_stream
HSH_Insert()
, not involving a fetchThe convergence of all paths towards consistently dropping the
OC_F_BUSY
flag as soon as there is a caching outcome or lack thereof allows a finer grained waiting list activity (much less spurious wake ups) with lower latency.This should result in overall less locking on objheads.
A request rushed reembarks the lookup step with the objcore that just dropped its busy flag. This means that we can perform a cache hit before entering the large lookup critical section if the object is shareable and matches the request. Even if the objcore was cached stale, it can now be served to all waiting clients. This enables a response-wide no-cache behavior (by opposition to per-header) that is currently prevented by the built-in VCL, but now in the realm of the possible.
A little bit of trivia: the exact same test case covering the partial no-cache support exists in all iterations of this change.
Being able to serve stale (but valid) waiting list hits solves the serialization problem for the most part. It shifts the problem towards incompatible variants, which creates spurious wake ups that are compensated by all the ones that were eliminated. It makes the problem much worse when there are a lot of variants, for example with a
Vary: User-Agent
header (that should be a reason to make them uncacheable).In that case the
vary_notice
parameter can help diagnose such scenarios. A newvary_limit
parameter could also be added as a mitigation, to result in pass transactions above that threshold. This is outside of the scope of this patch series, since this problem already exists with very similar detrimental effects.