Ideas for improving compression ratio? #7
That's true, but OTOH I'm not sure it would have significant compression benefits either.
I don't think this would help much, if at all.
I believe I had tried this without much success.
IIRC it's a quadratic-time dynamic programming algorithm, and I think it's the best you can do when you have both lower and upper bounds on symbol lengths. At least, it's the best I am aware of ;)
I had initially tried exactly this, but I never found a way to make it work, so I went for the dynamic programming approach above :)
I'm not sure how you would SIMDfy that - the current shuffle lookup logic relies on there being at most 16 values to look up for any given vector, which is not the case here.
Yes, that would indeed be an option - I don't know how useful it would be though.
The simplest idea that comes to mind is to try to do LZ77 copies from the same pixel in the row above, which might work reasonably well in practice (probably mostly useful in combination with predictor 0 and on non-photo content). I can't think of other ways to do LZ77 without significant encoding speed drops.
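A minimal sketch of that idea (illustrative only, not fpnge's actual code; the names row, prev_row and bytes_per_line are made up): a copy from the same pixel in the row above is just an LZ77 match at a distance equal to the row stride.

```cpp
// Sketch: measure how long an LZ77 match at distance == row stride would be,
// i.e. how many bytes at this position equal the corresponding bytes in the
// row above. Names are illustrative, not fpnge's API.
#include <cstddef>
#include <cstdint>

size_t MatchLenFromRowAbove(const uint8_t* row, const uint8_t* prev_row,
                            size_t pos, size_t bytes_per_line) {
  size_t len = 0;
  while (pos + len < bytes_per_line && row[pos + len] == prev_row[pos + len]) {
    ++len;
  }
  // If len is long enough, emit an LZ77 copy with length = len and
  // distance = bytes_per_line; otherwise fall back to literals.
  return len;
}
```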
Thanks for the response! :)
Would it be easier if you removed the lower bound? (it's always 1, except for the length symbols to avoid clobbering the 16-239 symbols, but there's an easy workaround for that)
You'd still have 3x16 values, with symbols split in the same way, just that the low/high symbols can also exceed 8 bits, and the mid can exceed 8+4 bits. The AVX-512 two vector
Yeah, I had also thought about limited searches around neighboring pixels, but thought it could be less useful after filtering. But since filtering needs to be per line, perhaps it could help with intra-line variations.
I am not sure, the standard coin-collector algorithm only works if the length limit is global and not per-symbol, but I don't remember if it can be adapted. That said, IIRC there should be a lower limit of 4 on the middle values, and they should use proper symbol counts ... I don't remember if there was a specific reason not to do that.
Then I'm not sure I understand what you're proposing exactly. IIRC the important restriction here is that every chunk of 16 consecutive symbols in the middle has the same leading bits, and 0000 to 1111 for the lowest bits. This allows having just 3x16 tables to look up into. I don't think there's any hard requirement for the middle part to all have the same length, except for potential shenanigans with canonical codes (but I think it might be enough to guarantee that all the symbols in the middle part have different lengths from every other symbol)
Yep, probably filtering counteracts things somewhat. However, it might in some cases be useful to do predictor 0 + something like trying to LZ77 from pixel offsets, say,
Any length chosen would automatically get +4 due to appending the lower nibble of the symbol, so I'm not sure why there should specifically be a lower limit for the upper nibble?
If they all have the same length, the canonical code assignment guarantees that all codes are sequential, which means they could actually be computed from the base value (for that length) without strictly needing to meet those requirements.
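A tiny sketch of what that means (assuming, as above, that all symbols 16-239 end up with the same length; deflate's bit-reversed output is ignored here): the canonical codes of the middle symbols are consecutive, so only the code of symbol 16 needs to be stored.

```cpp
// Sketch: with canonical code assignment and one shared length for symbols
// 16..239, their codes are consecutive, so each code can be derived from the
// code of symbol 16 ("base"). Purely illustrative.
#include <cstdint>

uint32_t MidSymbolCode(uint32_t sym, uint32_t base_code_of_sym16) {
  // Assumes 16 <= sym <= 239 and that all these symbols share one bit length.
  return base_code_of_sym16 + (sym - 16);
}
```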
Wouldn't the SUB predictor almost always be superior to NOOP? If the pixels are identical such that LZ77 works, applying the SUB filter doesn't change that, and it should be more beneficial elsewhere. I used spng to experiment with what zlib can do with various settings, to give a rough idea of what may be possible.
The gains just from zlib's Huffman/RLE make me wonder whether, with the right image, there could be noticeable improvements without adopting LZ77 that goes beyond RLE.
I think I might have confused myself :) do you propose to still force same bit lengths for "middle" symbols that share the upper nibble, but relax the maximum length restriction for them? I'm not entirely sure that it would give significant benefits compared to leaving the max-8 bit length. I think just relaxing the restriction that is present today of all middle symbols being equally long should be possible basically just by adding a variable length shift and possibly a table, or not?
I think that wouldn't be the case when, for example, the row below is offset by 1 pixel: subtracting TOP basically makes the row equivalent to the residuals of LEFT on the previous row, no?
The gains from Huffman/RLE are quite large - I wonder where exactly they come from. Is there a reasonable way to inspect the zlib bitstream to look at Huffman tables?
Yes - all the middle symbols would still have the same length.
I don't think so either. The main point was to allow the usage of the usual Huffman table construction, instead of the current quadratic time algorithm.
I can't think of a relatively efficient way to do it. The problem is that the canonical codes for 16, 32, 48 etc could be very different, if they're not the same length, so determining the code for, e.g. symbol 34, is a little tricky.
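For what it's worth, here's a rough sketch of the "variable-length shift plus a table" idea mentioned earlier, under the assumption that each aligned group of 16 middle symbols still shares one length (so codes stay consecutive within a group); the struct and names are made up, and deflate's bit reversal is left out.

```cpp
// Sketch: one (base code, length) entry per group of 16 middle symbols.
// Within a group the canonical codes are consecutive, so the low nibble can
// still be added to a per-group base; the length (and hence the shift when
// packing bits) now varies per group. Illustrative only.
#include <cstdint>

struct GroupCode {
  uint32_t base;   // canonical code of the group's first symbol (sym & ~15)
  uint32_t nbits;  // bit length shared by all 16 symbols of the group
};

// groups[] has 14 entries covering symbols 16..239.
uint32_t MidSymbolCode(uint32_t sym, const GroupCode groups[14],
                       uint32_t* nbits_out) {
  const GroupCode& g = groups[(sym - 16) >> 4];
  *nbits_out = g.nbits;        // variable-length shift when writing the bitstream
  return g.base + (sym & 15);  // consecutive codes within the group
}
```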
If we have something like:
abcdef
bcdef
Then TOP probably wouldn't be that useful, but SUB (or LEFT) wouldn't break the duplication (other than for the first pixel perhaps), so LZ77 would still work.
I imagine it might be easier to step through a debugger than decode the bitstream. I might try that a bit later.
On Wed, 29 Jun 2022 at 14:42, Anime Tosho ***@***.***> wrote:

> do you propose to still force same bit lengths for "middle" symbols that share the upper nibble, but relax the maximum length restriction for them?
>
> Yes - all the middle symbols would still have the same length.
>
> I'm not entirely sure that it would give significant benefits compared to leaving the max-8 bit length.
>
> I don't think so either. The main point was to allow the usage of the usual Huffman table construction, instead of the current quadratic time algorithm.

Does that take a significant amount of time? Last I tried, the number of symbols was not big enough for this to be a significant fraction of runtime.

> I think just relaxing the restriction that is present today of all middle symbols being equally long should be possible basically just by adding a variable length shift and possibly a table, or not?
>
> I can't think of a relatively efficient way to do it. The problem is that the canonical codes for 16, 32, 48 etc could be very different, if they're not the same length, so determining the code for, e.g. symbol 34, is a little tricky.
>
> I think that wouldn't be the case when, for example, the row below is offset by 1 pixel: subtracting TOP basically makes the row equivalent to the residuals of LEFT on the previous row, no?
>
> If we have something like:
> abcdef
> bcdef
> Then TOP probably wouldn't be that useful, but SUB (or LEFT) wouldn't break the duplication (other than for the first pixel perhaps), so LZ77 would still work.

It might be worth trying both, I guess - the question is how much is possible without significantly harming speed, and whether it would be a good idea to allow setting an "effort" flag...

> is there a reasonable way to inspect the zlib bitstream to look at Huffman tables?
>
> I imagine it might be easier to step through a debugger than decode the bitstream. I might try that a bit later.
Depends on the image I suppose. With the current parameters, not so much, but if you increase the number of symbols covered, and expand the maximum length from 8 to 15, it ends up taking most of the time. I made this experiment which removes all restrictions on the Huffman tree except that symbols 16-239 must have the same length. The tree build is very unoptimised, so it ends up being quite slow, but I'm hoping the speed hit during the image encode isn't too bad.
A 'compression level' flag sounds like it could indeed be useful.
What would be the motivation for doing either of those?
Looking at the package-merge algorithm, it seems like symbol-specific length limits can easily be implemented by introducing the length-limited symbols during later iterations of the algorithm. On the other hand, keeping the mid lengths the same is more challenging. One thing I noted is that the 14x16 symbols can also be grouped as 7x32. If those symbols are to have the same length, then there needs to be something to pair the 7 virtual symbols (giving 8 leaf nodes, which can be arranged into a 3-level balanced binary tree). Take the average length for the 7 mid symbols yielded by the package-merge approach above - this gives the 'ideal' length for the mid symbols. Then construct a regular unbounded Huffman tree, go through all nodes that could be placed at the same level as the mid symbols, and find the one whose probability is closest to the total mid count divided by 7. The speed hit isn't too bad overall, and it can lead to some minor improvement:
The regressions are interesting though - this seems to be from the chosen sampling, because if I change it to always sample the whole image, it always seems to improve:
Perhaps splitting the image into multiple deflate blocks could get it closer to zlib's Huffman-only result.
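As a rough sketch of just the grouping step described above (the counts layout and names are assumptions, not actual fpnge code): collapse the 224 middle-symbol counts into 7 virtual symbols of 32 symbols each, so that a standard package-merge / Huffman construction can be run on a much smaller alphabet.

```cpp
// Sketch: sum the counts of symbols 16..239 into 7 virtual symbols (32 real
// symbols each). If a virtual symbol ends up with length L and its 32 codes
// are formed by appending 5 extra bits, the real symbols get roughly L + 5
// bits. Illustrative only; array sizes and names are assumptions.
#include <array>
#include <cstdint>

std::array<uint64_t, 7> GroupMidCounts(const uint64_t counts[286]) {
  std::array<uint64_t, 7> virt{};
  for (int sym = 16; sym < 240; ++sym) {
    virt[(sym - 16) / 32] += counts[sym];
  }
  return virt;
}
```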
fpnge is pretty fast, so I was wondering what, in theory, could be done to improve compression slightly without sacrificing too much speed. I was hoping you might have some ideas on the topic (regardless of whether they'd ever be implemented).
Thoughts I had:
RLE
Currently only full vectors of zeroes are candidates for RLE.
Unfortunately, due to how the Huffman table is currently constructed, the LZ77 symbols always get assigned rather long codes, making RLE unattractive (particularly with 0s) unless the run is rather long. Which leads to the next point.
Huffman encoding
The current implementation doesn't consider symbol counts for many symbols, as the Huffman table has to be specially crafted for the algorithm used. Trying to relax some length restrictions often leads to ComputeCodeLengths becoming rather slow - from what I can tell, it appears to find the optimal tree via a brute-force approach. Would you happen to know of a faster algorithm for this?
The other idea is to remove almost all length restrictions, which allows normal optimal Huffman tree algorithms to be used. I can't think of a quick way to allow the middle section (symbols 16-239) to have differing bit lengths, so you'd need to manipulate the symbol counts there a bit to ensure they all end up getting the same bit length. Otherwise, the unrestricted Huffman code could be handled with some slowdown, as, in addition to the shuffle lookup for the low bits, you'd need to do it for the high bits as well.
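To make that last point concrete, here is a minimal sketch of what a second (high-nibble) shuffle lookup could look like, using SSE purely for brevity; the table contents and how the two halves would be combined into actual codes are left out, and none of this is fpnge's real code.

```cpp
// Sketch: pshufb can only index 16 entries, so each byte is split into its low
// and high nibble and each half gets its own 16-entry table lookup.
#include <immintrin.h>

struct NibbleLookup {
  __m128i lo_bits;  // per-byte lookup result keyed on the low nibble
  __m128i hi_bits;  // per-byte lookup result keyed on the high nibble
};

NibbleLookup LookupNibbles(__m128i bytes, __m128i lo_table, __m128i hi_table) {
  const __m128i low_mask = _mm_set1_epi8(0x0F);
  __m128i lo = _mm_and_si128(bytes, low_mask);
  __m128i hi = _mm_and_si128(_mm_srli_epi16(bytes, 4), low_mask);
  return {_mm_shuffle_epi8(lo_table, lo), _mm_shuffle_epi8(hi_table, hi)};
}
```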
Another thing to consider might be having multiple deflate blocks and sampling the image at various points, to be able to adapt to changes which occur vertically down the image.
I suspect a more optimal Huffman code might generally give a low single-digit percentage gain most of the time, but at some speed cost.
LZ77
It seems that there's a fair bit of scope here to improve compression through better match finding, but this is also often the slowest part of deflate implementations. I can't think of a way to implement this without dropping to scalar code either.
Furthermore, I'm not sure that knowing the data is an image provides much benefit over generic LZ77 implementations - except that perhaps you could look for matches only on pixel boundaries instead of at every byte.
Any ideas here?
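For illustration, a rough sketch of the pixel-boundary restriction mentioned above (the hash, the constants, and the structure are all made up; actual match verification and length measurement are left to the caller).

```cpp
// Sketch: a minimal "last occurrence" hash table that only considers
// pixel-aligned positions as match candidates. Assumes at least 4 readable
// bytes at data + pos. Illustrative only.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct PixelBoundaryMatcher {
  static constexpr uint32_t kHashBits = 15;
  std::vector<int32_t> head = std::vector<int32_t>(1u << kHashBits, -1);

  static uint32_t Hash(const uint8_t* p) {
    uint32_t v;
    std::memcpy(&v, p, 4);
    return (v * 2654435761u) >> (32 - kHashBits);
  }

  // Returns a previous pixel-aligned position with the same hash (a match
  // candidate, still to be verified), or -1, and records `pos` for later.
  int32_t FindAndInsert(const uint8_t* data, size_t pos, size_t bytes_per_pixel) {
    if (pos % bytes_per_pixel != 0) return -1;  // only search on pixel boundaries
    uint32_t h = Hash(data + pos);
    int32_t prev = head[h];
    head[h] = static_cast<int32_t>(pos);
    return prev;
  }
};
```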