Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Gzip_io.string_lwt #25

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 11 additions & 19 deletions gzip_io.ml
Original file line number Diff line number Diff line change
Expand Up @@ -38,25 +38,17 @@ let string s =

let string_lwt ?(chunk_size = 3000) ?(yield = Lwt.pause) s =
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've realized that Gzip_stream imposes a 1024-byte buffer size, so we should not give the illusion of control.
i.e. can use chunk_size = 4096 or whatever internally, so that there's no problem if buffer_size were changed in Gzip_stream, but not allow to pass ?chunk_size since it'll normally be ineffective.

let out = output (IO.output_string ()) in
let buff = Buffer.create chunk_size in
let len = String.length s in
let rec loop i =
if i >= len then (
(* Final flush of the buffer if there's any residue *)
if Buffer.length buff > 0 then IO.nwrite out (Buffer.to_bytes buff);
Lwt.return_unit)
else begin
let c = s.[i] in
Buffer.add_char buff c;
if Buffer.length buff < chunk_size then loop (i + 1)
else (
(* Buffer is full, write and clear it *)
IO.nwrite out (Buffer.to_bytes buff);
Buffer.clear buff;
(* Yield after processing a chunk *)
let%lwt () = yield () in
loop (i + 1))
end
let b = Bytes.unsafe_of_string s in
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ok to use unsafe_of_string here ?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICS it's only read from so so it's fine.

It's a bit strange IO.output expects a Bytes.t though.

let len = Bytes.length b in
let rec loop offset =
let written =
let len_to_write = Int.min chunk_size (len - offset) in
IO.output out b offset len_to_write in
if offset + written >= len then Lwt.return_unit
else (
(* Yield after processing a chunk *)
let%lwt () = yield () in
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a buffer of size 1024 we can compress maybe around 30000 chunks/s, which means yielding every ~30 us. At this point the overhead of returning to the event loop could start to become measurable. So yielding after processing around say 4096 bytes could be better. Probably not a big deal though.

(This depends on the cost of the switch, which I guesstimated at a few us...)

loop (offset + written))
in
let%lwt () = loop 0 in
Lwt.return @@ IO.close_out out
Expand Down