New sprite compressor #5627
base: upcoming
src/decompress.c
Outdated
void DecodeLOtANS(const u32 *data, u32 *readIndex, u32 *bitIndex, struct DecodeYK *ykTable, u8 *symbolTable, u8 *resultVec, u32 *state, u32 count)
{
    u32 currBits = data[*readIndex];
    for (u32 currSym = 0; currSym < count; currSym++)
    {
        u8 symbol = 0;
        for (u32 currNibble = 0; currNibble < 2; currNibble++)
        {
            symbol += symbolTable[*state] << (currNibble*4);
            u16 currK = ykTable[*state].kVal;
            u16 nextState = ykTable[*state].yVal << currK;
            nextState += (currBits >> *bitIndex) & ((1u << currK)-1);
            if (*bitIndex + currK < 32)
            {
                *bitIndex += currK;
            }
            else if (*bitIndex + currK == 32)
            {
                *readIndex += 1;
                currBits = data[*readIndex];
                *bitIndex = 0;
            }
            else if ((*bitIndex + currK) > 32)
            {
                *readIndex += 1;
                currBits = data[*readIndex];
                u32 remainder = *bitIndex + currK - 32;
                nextState += (data[*readIndex] & ((1u << remainder) - 1)) << (currK - remainder);
                *bitIndex = remainder;
            }
            *state = nextState-64;
        }
        resultVec[currSym] = symbol;
    }
}
This is where the tANS magic happens.
And where the biggest question marks are.
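As a reading aid for the function above, here is a minimal sketch of a single decode step with the three bit-index branches folded into one helper. `ReadBits` and `DecodeNibble` are hypothetical names, the `DecodeYK` field types are assumed (the diff only shows the names `yVal` and `kVal`), and the table size of 64 is inferred from the `nextState-64` line. It is not the PR's implementation and is not tuned for the GBA.

```c
#include <assert.h>
#include <stdint.h>

// Assumed layout: the reviewed code only shows the field names.
struct DecodeYK
{
    uint8_t yVal;
    uint8_t kVal;
};

#define TANS_TABLE_SIZE 64 // inferred from "nextState-64" in the reviewed code

// Hypothetical helper: read `count` bits (0..16), least significant bit first,
// from a stream of 32-bit words, advancing readIndex/bitIndex as needed.
static uint32_t ReadBits(const uint32_t *data, uint32_t *readIndex, uint32_t *bitIndex, uint32_t count)
{
    assert(count <= 16 && *bitIndex < 32);
    uint32_t value = (data[*readIndex] >> *bitIndex) & ((1u << count) - 1);
    uint32_t end = *bitIndex + count;
    if (end < 32)
    {
        *bitIndex = end;
        return value;
    }
    // The read reached or crossed the end of the current word.
    *readIndex += 1;
    uint32_t remainder = end - 32;
    if (remainder != 0)
        value |= (data[*readIndex] & ((1u << remainder) - 1)) << (count - remainder);
    *bitIndex = remainder;
    return value;
}

// One tANS step: emit the nibble for the current state, then compute the
// next state from the table entry and the next kVal bits of the stream.
static uint8_t DecodeNibble(const uint32_t *data, uint32_t *readIndex, uint32_t *bitIndex,
                            const struct DecodeYK *ykTable, const uint8_t *symbolTable, uint32_t *state)
{
    uint8_t nibble = symbolTable[*state];
    uint32_t k = ykTable[*state].kVal;
    uint32_t nextState = ((uint32_t)ykTable[*state].yVal << k) + ReadBits(data, readIndex, bitIndex, k);
    *state = nextState - TANS_TABLE_SIZE;
    return nibble;
}
```

Unlike the reviewed function, this sketch re-reads the current word from memory on every call instead of caching it in a local, so it is likely slower; it only shows the control flow.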
I haven't looked into the performance yet at all, but I was wondering if you've done any comparisons between large LZ loads and large smol loads? Off the top of my head, I'd expect warps and connections to be quite heavy? A warp loads up to 32kB of tiles (primary + secondary), and a connection loads up to 16kB of tiles by default (secondary with ...)
I'm not sure if there's any source-available and/or open source games out there with graphics that are more complex and detailed than vanilla, but it would be good to benchmark against those if possible. Tilesets and battle backgrounds are the first things that come to mind for potentially-significant differences in fidelity between GF and the community.
EDIT: Just to be clear, the exact cycle counts don't matter except insofar as they affect the frame counts. If a frame has (e.g.) 100k cycles that go unused then smol taking 99k cycles is free, and 101k cycles drops a frame.
It takes quite a while for large images.
With tANS
Most of the time is spent doing symbol decoding, which makes sense, because it's a lot of symbols.
Thanks for the numbers! Which image is that you're using? For reference a frame has 280896 cycles, so:
It's possible that things could take an extra frame due to the overhead of everything else that goes on. Note that, e.g. the v-blank handler is a tax on every frame, and on the OW for connections a fair chunk of cycles may be spent on preparing the OAM buffer.
I'm sure it'll be possible to speed up the decode implementation somewhat. I'm not confident that a 5x speed-up is available, but it's not unheard-of :)
I suppose even if it's not possible to match LZ speeds we can always have smol as opt-in for the files which have a noticeable performance impact. That way downstream users have a way to trade performance for space if they reach a point where they've used up all 32MB.
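The "cycles only matter insofar as they affect frame counts" point can be put into a small calculation. This is a rough sketch: `ExtraFrames` is a hypothetical helper, and it assumes every spill-over frame is fully available to the decoder, which ignores the per-frame v-blank handler cost mentioned above.

```c
#include <stdint.h>

#define CYCLES_PER_FRAME 280896u // cycles in one frame, as quoted in the thread

// Hypothetical helper: how many additional frames a decode costs, given the
// cycles that would otherwise go unused in the frame where it starts.
// Assumes every spill-over frame is entirely free for decoding.
static uint32_t ExtraFrames(uint32_t idleCycles, uint32_t decodeCycles)
{
    if (decodeCycles <= idleCycles)
        return 0; // fits in the slack, so the decode is free
    uint32_t overflow = decodeCycles - idleCycles;
    // Round up: any overflow at all costs a whole frame.
    return (overflow + CYCLES_PER_FRAME - 1) / CYCLES_PER_FRAME;
}
```

With 100k idle cycles, `ExtraFrames(100000, 99000)` is 0 and `ExtraFrames(100000, 101000)` is 1, matching the example given earlier in the thread.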
I'm using
It's spending ~130 cycles per nibble on tANS decoding. I can probably bring that down a bit by optimizing the workflow, but we're approaching the point where the raw ASM needs to be looked at.
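For a sense of scale, the ~130 cycles per nibble figure can be set against the 280896-cycle frame. The helper names below are hypothetical, and the nibble count is the number of tANS-decoded nibbles, which need not equal the pixel count since the format also uses LZ. Treat this as back-of-the-envelope arithmetic only.

```c
#include <stdint.h>

#define CYCLES_PER_FRAME 280896u       // cycles in one frame, as quoted in the thread
#define TANS_CYCLES_PER_NIBBLE 130u    // approximate figure quoted in the thread

// Estimated cycles spent in tANS decoding for a given number of decoded nibbles.
static uint32_t TansDecodeCycles(uint32_t nibbleCount)
{
    return nibbleCount * TANS_CYCLES_PER_NIBBLE;
}

// Upper bound on nibbles decodable in one frame if the whole frame were free.
static uint32_t MaxNibblesPerFrame(void)
{
    return CYCLES_PER_FRAME / TANS_CYCLES_PER_NIBBLE;
}
```

Under these assumptions a completely idle frame fits about 2160 nibbles, and 4096 decoded nibbles would take 532480 cycles, a little under two full frames.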
Description
Improved compression algorithm for 4bpp images on the GBA.
It uses a modified LZ compression scheme along with an entropy coding scheme called table-based Asymmetric Numeral Systems (tANS).
A better and more thorough explanation is to come.
People who collaborated with me in this PR
A lot of people.
@mrgriffin and @tertu-m, who have answered questions about the GBA hardware whenever I had them.
Feature(s) this PR does NOT handle:
Config option to switch automatically; currently images must be manually set to .4bpp.smol
Things to note in the release changelog:
Discord contact info
hedara