add Gzip_io.string_lwt #25
base: master
Conversation
I don't like this:
- yield is a hack
- copy-paste
- gzip+chunked would be a natural way to do this
- would prefer support for other compression methods over improving gzip usage
- could just add a doc note that gzip can be expensive
gzip_io.ml
Outdated
let buff = Buffer.create chunk_size in
let len = String.length s in
let rec loop i =
  if i >= len then (
    (* Final flush of the buffer if there's any residue *)
    if Buffer.length buff > 0 then IO.nwrite out (Buffer.to_bytes buff);
    Lwt.return_unit)
  else begin
    let c = s.[i] in
    Buffer.add_char buff c;
    if Buffer.length buff < chunk_size then loop (i + 1)
    else (
      (* Buffer is full, write and clear it *)
      IO.nwrite out (Buffer.to_bytes buff);
      Buffer.clear buff;
      (* Yield after processing a chunk *)
      let%lwt () = yield () in
      loop (i + 1))
  end
in
let%lwt () = loop 0 in
There's no need to process the input string char by char (slow!), accumulate the data to be compressed in a buffer, then allocate a string corresponding to the data (ouch!). It should be possible to write substrings (not allocated, but with offset + size) from the original string to the IO.
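For illustration, a sketch of what this suggestion could look like, assuming extlib's IO.output (which takes a buffer plus offset and length and returns how many bytes were consumed); the function name is hypothetical:

(* A sketch of the offset+size approach; [write_by_substrings] is a
   made-up name, and IO.output is assumed to have the extlib signature
   ('a, 'b) IO.output -> Bytes.t -> int -> int -> int. *)
let write_by_substrings ?(chunk_size = 4096) ?(yield = Lwt.pause) out s =
  let b = Bytes.unsafe_of_string s in  (* read-only: never mutated *)
  let len = Bytes.length b in
  let rec loop offset =
    if offset >= len then Lwt.return_unit
    else begin
      (* Write a bounded slice of the original data: no per-chunk copies *)
      let written = IO.output out b offset (min chunk_size (len - offset)) in
      (* Yield so compression doesn't monopolize the event loop *)
      let%lwt () = yield () in
      loop (offset + written)
    end
  in
  loop 0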
Thanks, I updated the code in ba084d4 (#25)
      let%lwt () = yield () in
      loop (i + 1))
  end
let b = Bytes.unsafe_of_string s in
Is it ok to use unsafe_of_string here?
AFAICS it's only read from, so it's fine. It's a bit strange that IO.output expects a Bytes.t though.
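For context, a minimal illustration of the safety argument (the values are illustrative):

(* Bytes.unsafe_of_string aliases the string's storage without copying,
   so it is safe exactly as long as the bytes are never written to. *)
let s = "immutable data"
let b = Bytes.unsafe_of_string s
let () = assert (Bytes.get b 0 = 'i')  (* reading is fine *)
(* Bytes.set b 0 'X' would silently mutate [s] and break the
   immutability assumption other code relies on. *)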
While this works, I've since learned that httpev does chunked transfer encoding with Chunks, so it is possible to stream the compressed data without allocating for the whole compressed response (which currently costs at least twice the size of the end result in allocations). This cannot be built on top of Gzip_stream though -- essentially the logic from Gzip_stream.output would have to be replicated and the output written to the function passed to the generator. Since the plan is not to compress in httpev though, I'd say we can do with what you wrote so far :-)
@@ -36,6 +36,23 @@ let string s =
  IO.nwrite out (Bytes.unsafe_of_string s); (* IO wrong type *)
  IO.close_out out

let string_lwt ?(chunk_size = 3000) ?(yield = Lwt.pause) s =
I've realized that Gzip_stream imposes a 1024-byte buffer size, so we should not give the illusion of control. I.e. we can use chunk_size = 4096 or whatever internally, so that there's no problem if buffer_size were changed in Gzip_stream, but we should not allow passing ?chunk_size since it'll normally be ineffective.
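Concretely, the suggested shape might be something like this; the constant is illustrative, and [compress_chunked] is a hypothetical stand-in for the loop already in this PR:

(* Keep the chunk size internal so callers can't pick a value that
   Gzip_stream's fixed 1024-byte buffer would render meaningless. *)
let chunk_size = 4096

let string_lwt ?(yield = Lwt.pause) s =
  (* [compress_chunked] stands in for this PR's write loop *)
  compress_chunked ~chunk_size ~yield s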
if offset + written >= len then Lwt.return_unit
else (
  (* Yield after processing a chunk *)
  let%lwt () = yield () in
With a buffer of size 1024 we can compress maybe around 30000 chunks/s, which means yielding every ~30 us. At this point the overhead of returning to the event loop could start to become measurable. So yielding after processing around, say, 4096 bytes could be better. Probably not a big deal though. (This depends on the cost of the switch, which I guesstimated at a few us...)
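A sketch of yielding by byte count rather than per write; the 4096-byte threshold is the ballpark from above, not a measured value, and the function name is made up:

(* Yield only after roughly [yield_every] bytes have been written,
   instead of after every chunk; assumes the offset-based loop from
   this PR and extlib's IO.output. *)
let yield_every = 4096

let rec write_all ?(yield = Lwt.pause) out b len offset since_yield =
  if offset >= len then Lwt.return_unit
  else begin
    let written = IO.output out b offset (len - offset) in
    let offset = offset + written in
    if since_yield + written < yield_every then
      write_all ~yield out b len offset (since_yield + written)
    else begin
      (* Enough work done since the last yield: return to the event loop *)
      let%lwt () = yield () in
      write_all ~yield out b len offset 0
    end
  end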
(* TODO do not apply encoding to application/gzip *)
(* TODO gzip + chunked? *)
match body, code, c.req with
| `Body s, `Ok, Ready { encoding=Gzip; _ } when String.length s > 128 -> ("Content-Encoding", "gzip")::hdrs, `Body (Gzip_io.string s)
| _ -> hdrs, body
| `Body s, `Ok, Ready { encoding=Gzip; _ } when String.length s > 128 ->
We allocate a 1024-byte buffer in Gzip_stream, so maybe a higher threshold makes sense (it feels a bit silly to allocate 1K for only 129 bytes of response).
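For illustration, the guard could compare against a larger constant; the cutoff below is a guess, not a benchmark:

(* Only bother compressing once the body is at least as large as
   Gzip_stream's internal buffer; 1024 is an illustrative threshold. *)
let gzip_threshold = 1024

let should_gzip body = String.length body > gzip_threshold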
Compressing a large body can block the event loop, preventing other operations from proceeding concurrently. This PR introduces a Gzip_io.string_lwt function which should handle compression of large string data without blocking the event loop.

Reference: https://git.ahrefs.com/ahrefs/monorepo/pull/18094#issuecomment-37750
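A hypothetical usage sketch, assuming string_lwt returns the compressed string in Lwt (as the diff above suggests); [make_response] is a stand-in for whatever builds the reply:

(* Compress a large response body without starving other Lwt promises. *)
let respond_gzipped make_response body =
  let%lwt compressed = Gzip_io.string_lwt body in
  Lwt.return (make_response ~headers:[ "Content-Encoding", "gzip" ] compressed)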