New %codes for computing message forwarding delays #260
base: bag64a
Conversation
There are many ways a transaction can fail. Many failures are not associated with a socket read(2). We can and perhaps should collect and report error timestamps, but doing so correctly is a lot of work _and_ does not help with monitoring primary SLA-related delays, because SLA metrics usually focus on successful transactions (the percentage of unforced errors is usually a separate metric, unrelated to proxy-induced delays).
Instead, a new requestParseStart field is used for factoring in possible parsing delays. %request_first_read_time now shows the beginning of the first successful parse attempt (ignoring previous unsuccessful attempts due to insufficient data).
Also added another ReadFromClient() helper for updating statistics and timers.
Also added (missing) statCounter.client_http.kbytes_out statistics for FTP.
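The requestParseStart behavior described above -- reporting the beginning of the first *successful* parse attempt while ignoring attempts abandoned for lack of data -- can be sketched as follows. This is a hypothetical illustration, not Squid code; the struct and member names are made up:

```cpp
#include <cassert>
#include <optional>

// Hypothetical sketch of the requestParseStart idea: each parse attempt
// re-stamps the start time, so attempts abandoned due to insufficient data
// are naturally ignored, and after a successful parse the field holds the
// beginning of the winning attempt.
struct ParseTimesSketch {
    std::optional<long> requestParseStart; // stand-in for a real timestamp type

    // called whenever a parse attempt begins, with the current time
    void parseAttemptStarted(long now) { requestParseStart = now; }
};
```

With this scheme, no explicit "forget the failed attempt" step is needed: the next attempt simply overwrites the stale stamp.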
src/tunnel.cc
Outdated
statCounter.client_http.kbytes_in += len;
}

ReadFromClient(al, len, errcode);
As we discussed, ReadFromClient() and the other three functions minimize duplication by collecting global statistics, but they do not collect local (transaction-related) statistics as well. For example, in this context the local statistics are calculated by the client.bytesIn(len) call. In HttpStateData::readReply() it would probably be the HttpStateData::payloadSeen variable. We also have ClientHttpRequest::out::size calculated in both HTTP and FTP code. Some of these context-related variables are used for other format codes; e.g., "<st" is gathered from al->http.clientReplySz, which in turn is calculated from ClientHttpRequest::out.

In general, all these transaction-related bits seem rather unsystematic: I do not see an (easy) way to adjust further, neatly moving them into ReadFromClient() (and similar) functions.
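The global/local asymmetry being discussed can be illustrated with a minimal sketch (all names here are hypothetical stand-ins, not the actual Squid identifiers): a shared helper can bundle global counters, but each caller still updates its own transaction-local counters at the call site.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Global statistics: identical bookkeeping for every caller, so it can be
// bundled into a shared helper.
struct GlobalStatsSketch { std::uint64_t bytesIn = 0; };
GlobalStatsSketch statCounterSketch;

void ReadFromClientSketch(std::size_t len) {
    statCounterSketch.bytesIn += len; // same update regardless of context
}

// Local (transaction-related) statistics: different variable in each context
// (client.bytesIn, payloadSeen, out.size, ...), so each caller keeps its own.
struct TunnelClientSketch {
    std::uint64_t bytesIn = 0; // tunnel-specific counter

    void readSome(std::size_t len) {
        ReadFromClientSketch(len); // global stats: bundled
        bytesIn += len;            // local stats: still per-caller
    }
};
```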
src/servers/FtpServer.cc
Outdated
const auto http = pipeline.front()->http;
assert(http);
http->out.size += io.size;
WrittenToClient(http->al, io.size, false);
This (and 2 cases below) adds statCounter.client_http.kbytes_out statistics for FTP.
This change also eliminates requestReadTimer updates outside the ReadFromClient() API, and removes the special handling of requestReadTimer on EOF: in that case, the truncated request is invalid (aborting the transaction?). For simplicity, we do not maintain requestReadTimer (and others) if there is a higher-level problem (e.g., the entire transaction is aborted).
response_last_write: The latest observed event in a series of successful response write events. See response_first_write event for the definition of a successful response write event. If the entire response was delivered in a single socket write(2), then this event is the same as response_first_write event.
The PR implementation differs from the proposed definitions (for request_first_read/request_last_read):

request_first_read
- valid requests: just after request parsing starts (parseOneRequest())
- invalid requests (created by abortRequestParsing()): same as above
- fake requests: when the fake request is created (buildFakeRequest())

request_last_read
- valid requests: request_first_read (for parsed complete requests) or just after Squid reads the last request data block from the client socket (for parsed incomplete requests)
- invalid requests (created by abortRequestParsing()): same as request_first_read
- fake requests: when the fake request is created (buildFakeRequest())

request_first_write: just after the first request data block is written to the peer socket (CommIoCbParams/IOCB callbacks)
request_last_write: just after the last request data block is written to the peer socket (CommIoCbParams/IOCB callbacks)
response_first_read: just after the first response data block is read from the peer socket (CommIoCbParams/IOCB callbacks)
response_last_read: just after the last response data block is read from the peer socket (CommIoCbParams/IOCB callbacks)
response_first_write: just after the first response data block is written to the client socket (CommIoCbParams/IOCB callbacks)
response_last_write: just after the last response data block is written to the client socket (CommIoCbParams/IOCB callbacks)

We could simplify the request_first_read/request_last_read calculation by ignoring parsing time and measuring both from the ClientHttpRequest constructor. However, that approach would shrink the request_first_read/request_last_read difference in some cases, concealing useful information from the admin.
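The first/last bookkeeping implied by these definitions can be sketched in a few lines. This is a hypothetical illustration (not Squid code, names invented): the first successful I/O event stamps both timestamps, and every later one only advances the "last" timestamp.

```cpp
#include <cassert>
#include <optional>

// Sketch of a first/last event pair, e.g., response_first_write and
// response_last_write for one transaction.
struct EventPairSketch {
    std::optional<long> first; // stamped once, by the first successful event
    std::optional<long> last;  // latest observed event in the series

    void noteSuccessfulIo(long now) {
        if (!first)
            first = now; // first successful event in the series
        last = now;      // always advances
    }
};
```

Note that when the entire message fits in a single socket I/O, first and last end up equal, matching the response_last_write documentation above.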
src/servers/Http1Server.cc
Outdated
@@ -72,6 +72,8 @@ Http::One::Server::noteMoreBodySpaceAvailable(BodyPipe::Pointer)
Http::Stream *
Http::One::Server::parseOneRequest()
{
    TimeScope parseContext(requestParseStart);
We cannot easily simplify here (getting rid of the TimeScope wrap) because requestParseStart may be used (incorrectly) if we buildFakeRequest() somewhere further along (within the same clientParseRequests() iteration). In buildFakeRequest(), request_first_read should be 'now'.
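The actual TimeScope helper is not shown in this excerpt. As a rough, hypothetical sketch -- assuming it stamps a caller-owned timestamp on scope entry and clears it on exit, so that later code (e.g., a buildFakeRequest() call outside the parsing scope) cannot misread a stale value:

```cpp
#include <cassert>
#include <chrono>
#include <optional>

using Clock = std::chrono::system_clock;

// Hypothetical scope guard: records "parsing started now" into a caller-owned
// slot and forgets it when the scope ends, limiting the timestamp's
// visibility to code that runs while parsing is actually in progress.
class TimeScopeSketch {
public:
    explicit TimeScopeSketch(std::optional<Clock::time_point> &slot): slot_(slot) {
        slot_ = Clock::now(); // stamp on entry
    }
    ~TimeScopeSketch() { slot_.reset(); } // clear on exit
private:
    std::optional<Clock::time_point> &slot_;
};
```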
These or similar improvements are needed, but they are far from trivial and deserve dedicated PR(s) that would address several problems:
* inconsistent treatment of ERR_CLOSING
* inconsistent treatment of read EOF
* inconsistent treatment of I/O errors
* inconsistent treatment of StoreEntry aborts during I/O wait
* abuse of statCounter.client_http for FTP traffic?
* missing statCounter.client_http.kbytes_out update in Ftp::Server::wroteReply()
* basic low-level I/O stats collected by higher-level I/O handling code and/or post-TLS/decryption stats misrepresenting actual socket I/Os

Bundling statCounter and ALE::cache updates also complicated the ReadFromClient() API and logic, requiring an error-prone hasError parameter and special treatment of zero-size I/O (which duplicated caller checks in some cases). At the same time, some other rather similar stats updates were not bundled (e.g., hier.notePeerWrite()). The in-scope changes "inlined" in this commit are still unpolished; their location may also change. Some suffer from similar inconsistencies. Some are missing ALE presence checks.
Ideally, we want requestReadTimer to include the time it takes Squid to parse a fully accumulated request header. However, the proposed (and now removed) solution was problematic on several levels:
* It led to virtual parseOneRequest() code duplication.
* It crashed two existing ClientHttpRequest(nullptr) callers.
* It contradicted existing cache.start_time initialization code in the same ClientHttpRequest constructor. It would be odd for %tS to be later than %request_first_read (a _name_ that reflects neither the removed nor this/new/current code and will be changed, of course).
* A different solution will be needed when we start creating ClientHttpRequest objects earlier, _before_ we start parsing. And we should create those objects earlier instead of accumulating information at the ConnStateData level and then passing it to ClientHttpRequest later, including on parsing errors. Many other problems will be solved if we create ClientHttpRequest objects earlier!
XXX: Besides benign compilation problems, this solution is complex but probably incomplete and probably hits other/existing code problems. We should find a better approach.
Revert previous branch commit due to problems documented in that commit message.
Underlying principle: We want to capture the moment when a successful transaction(*) received all the data it was waiting for -- the time of the last socket read associated with that transaction.

Some EOF events abort the current transaction. We do not care about such events because we only care about _successful_ transactions. We (greatly) simplify the code by not chasing EOF effects to distinguish successful EOF events from killer ones (see the two previous branch commits for a negative example of such chasing).

On the client-to-Squid side, a socket read may read data for multiple (current and/or future) transactions, but at most _one_ of them would be the current transaction waiting for those bytes. The rest are future transactions that have not even started yet (from this project's events point of view); a future transaction's read timer will _start_ when that transaction is created, after accumulating enough data to parse its request header.

(*): Here, "transaction" is a request-response exchange (on either a client-to-Squid or Squid-to-peer connection).
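The simplification described above -- stamp the last-read time on every successful read and never classify EOF events -- can be sketched as follows (hypothetical names, not Squid code):

```cpp
#include <cassert>
#include <cstddef>

// Sketch: the timer only reacts to successful reads. If an EOF later aborts
// the transaction, the whole timer is discarded with it, so a stale value is
// harmless -- only successful transactions are ever logged.
struct ReadTimerSketch {
    long lastRead = 0; // stand-in for a real timestamp type

    void noteRead(std::size_t bytes, long now) {
        if (bytes > 0)
            lastRead = now; // data this transaction was waiting for
        // bytes == 0 (EOF): deliberately ignored; no "killer vs. benign"
        // EOF analysis is needed here
    }
};
```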
N.B. Legacy comm_read() callers may receive these three io.flags values: ERR_CLOSING, OK, and COMM_ERROR. They never see ENDFILE or INPROGRESS (those flags are specific to readers converted to the Comm::ReadNow() API).
I am not sure zero-length writes are possible, but if they are, I see no reason to ignore them: the transaction is still waiting for that write(2) system call.
... instead of a 'request header parsed' or 'last header byte' timestamp. I think both timestamps are valuable, but we should start with "first byte" because:
* 'first byte' feels more "fundamental" than 'last header byte'.
* 'first byte' "matches" the 'last byte' timestamp this branch is adding.
* 'first request byte' "matches" the 'first response byte' this branch is adding. On the Squid-to-server connection, measuring 'first response byte' is easier/simpler, and it is good to start with something simpler.
* 'first byte' covers header parsing overheads that are, in most cases, Squid overheads that admins want to include in SLAs and such.

This commit partially restores ideas and associated problems removed in commit c0dd69b: We make this code more complex again, reintroduce a cache.start_time conflict, and add code that will have to be refactored when ClientHttpRequest objects are created earlier.
... until we can prove it (and assert it in FwdState ctor).
IME, it is best to use std::optional wrapping to indicate lack of information rather than to define information type. The difference is usually very subtle. For example, in this specific case, the only problematic areas (fixed by this commit) were MessageTimer conversion constructor (that arguably should take a known time value) and ExtractRequestStartTime() (that definitely should return a known time value rather than a never-nil std::optional). There are exceptions to this rule of thumb, of course. Sometimes, the possibility of missing information _is_ an intrinsic property of a C++ type. I do not think those exceptions apply to MessageTimer::Time.
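The rule of thumb above can be shown in a few lines (hypothetical names, not the actual MessageTimer code): the timer type always holds a known time, and std::optional appears only at the storage site, where "not recorded yet" is a genuine state.

```cpp
#include <cassert>
#include <optional>

using KnownTime = long; // stand-in for a never-nil timestamp type

// The type itself always carries known information ...
struct MessageTimerSketch {
    explicit MessageTimerSketch(KnownTime t): start(t) {} // takes a known value
    KnownTime start;
};

// ... and "missing" is expressed only where absence is meaningful:
struct TransactionSketch {
    std::optional<MessageTimerSketch> readTimer; // absent until the first read
};
```

This keeps constructors and accessors free of never-nil optionals while still letting callers test for "no timer yet".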
Also formatted new %code docs.
We claim that "CONNECT tunnel establishment ... happen earlier". FwdState::successfullyConnectedToPeer() does reset the timer, but tunnel.cc (which uses Http::Tunneler) does not go through FwdState, and it feels odd to update a timer that is guaranteed to be reset anyway. Tunnel establishment handling is one of those areas where "message activity" (i.e., what we are trying to measure on this branch) differs from "socket I/O".
since doneWithControlMsg() is also used for caller notification (i.e., on unsuccessful write).
@@ -8,9 +8,6 @@
 #ifndef SQUID_SRC_AUTH_BASIC_NIS_NIS_SUPPORT_H
 #define SQUID_SRC_AUTH_BASIC_NIS_NIS_SUPPORT_H

-#ifndef SQUID_SRC_AUTH_BASIC_NIS_NIS_SUPPORT_H
-#define SQUID_SRC_AUTH_BASIC_NIS_NIS_SUPPORT_H
-
This change is out of scope - it fixes a build error.
The new format codes log absolute times of HTTP/FTP transaction stages. These codes yield an approximate absolute time of the event in
<full seconds since epoch>.<fractional seconds>
format. The times may be useful in triaging various delays caused by Squid and/or peers.

Squid maintains these timers only for successful transactions. Most of the timers are updated in an I/O callback invoked during HTTP/FTP data transfers.
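Rendering a timestamp in that "<full seconds since epoch>.<fractional seconds>" form is straightforward; a sketch (not Squid's actual logging code; function name invented) from separate seconds and microseconds components:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format an epoch timestamp as "<seconds>.<fractional seconds>", with the
// fraction zero-padded to six digits (microsecond resolution).
std::string FormatEpochTime(long sec, long usec) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%ld.%06ld", sec, usec);
    return buf;
}
```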