New sprite compressor #5627

hedara90 · 2024-10-31T20:50:55Z

Description

Improved compression algorithm for 4bpp images on the GBA.
It uses a modified LZ compression scheme along with an entropy encoding called tabled Asymmetric Numeral Systems (tANS).
A better and more thorough explanation is to come.
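For readers new to tANS, here is a minimal sketch of how a decode table of the shape DecodeLOtANS consumes (a symbol per state plus a yVal/kVal pair) can be built from nibble frequencies. Each state stores y and k such that (y << k) plus the next k bits of the stream lands back in [64, 128), which is what the decoder relies on before subtracting 64. The sketch assumes a 64-state table (matching the nextState-64 in the decoder), 16 symbols whose frequencies sum to 64, and a plain sequential symbol spread. The PR's actual spread, table layout and encoder may differ, and BuildDecodeTable and struct DecodeEntry are hypothetical names.

```c
#include <stdint.h>

#define TABLE_LOG 6
#define TABLE_SIZE (1u << TABLE_LOG) /* 64 states, as implied by nextState - 64 */
#define NUM_SYMBOLS 16               /* one symbol per 4bpp nibble value */

/* Hypothetical entry mirroring the PR's symbolTable + DecodeYK pair. */
struct DecodeEntry {
    uint8_t symbol; /* nibble emitted when the decoder is in this state */
    uint8_t yVal;   /* y in [freq, 2*freq) for that symbol */
    uint8_t kVal;   /* bits to read so (y << k) + bits lands in [64, 128) */
};

/* Index of the highest set bit; v must be non-zero. */
static unsigned HighBit(unsigned v)
{
    unsigned n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* freqs must sum to TABLE_SIZE. Returns 0 on success, -1 otherwise. */
static int BuildDecodeTable(const uint8_t freqs[NUM_SYMBOLS], struct DecodeEntry table[TABLE_SIZE])
{
    unsigned next[NUM_SYMBOLS];
    unsigned slot = 0;

    /* Sequential spread: symbol s occupies freqs[s] consecutive slots. */
    for (unsigned s = 0; s < NUM_SYMBOLS; s++)
    {
        next[s] = freqs[s];
        for (unsigned i = 0; i < freqs[s]; i++)
        {
            if (slot >= TABLE_SIZE)
                return -1; /* frequencies sum to more than 64 */
            table[slot++].symbol = (uint8_t)s;
        }
    }
    if (slot != TABLE_SIZE)
        return -1; /* frequencies sum to less than 64 */

    /* Walk the states in order; each symbol's y counts up from freqs[s]. */
    for (unsigned x = 0; x < TABLE_SIZE; x++)
    {
        unsigned y = next[table[x].symbol]++;
        table[x].yVal = (uint8_t)y;
        table[x].kVal = (uint8_t)(TABLE_LOG - HighBit(y));
    }
    return 0;
}
```

A sequential spread is the simplest choice to show here; tANS implementations usually scatter each symbol's slots across the table instead, which tends to compress slightly better.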

People who collaborated with me in this PR

A lot of people.
@mrgriffin and @tertu-m, who have answered questions about the GBA hardware whenever I had them.

Feature(s) this PR does NOT handle:

Config option to switch automatically; currently, images must be manually set to .4bpp.smol

Things to note in the release changelog:

This will be filled out.

Discord contact info

hedara

hedara90 · 2024-11-02T14:51:54Z

src/decompress.c

+void DecodeLOtANS(const u32 *data, u32 *readIndex, u32 *bitIndex, struct DecodeYK *ykTable, u8 *symbolTable, u8 *resultVec, u32 *state, u32 count)
+{
+    u32 currBits = data[*readIndex];
+    for (u32 currSym = 0; currSym < count; currSym++)
+    {
+        u8 symbol = 0;
+        for (u32 currNibble = 0; currNibble < 2; currNibble++)
+        {
+            symbol += symbolTable[*state] << (currNibble*4);
+            u16 currK = ykTable[*state].kVal;
+            u16 nextState = ykTable[*state].yVal << currK;
+            nextState += (currBits >> *bitIndex) & ((1u << currK)-1);
+            if (*bitIndex + currK < 32)
+            {
+                *bitIndex += currK;
+            }
+            else if (*bitIndex + currK == 32)
+            {
+                *readIndex += 1;
+                currBits = data[*readIndex];
+                *bitIndex = 0;
+            }
+            else if ((*bitIndex + currK) > 32)
+            {
+                *readIndex += 1;
+                currBits = data[*readIndex];
+                u32 remainder = *bitIndex + currK - 32;
+                nextState += (data[*readIndex] & ((1u << remainder) - 1)) << (currK - remainder);
+                *bitIndex = remainder;
+            }
+            *state = nextState-64;
+        }
+        resultVec[currSym] = symbol;
+    }
+}


This is where the tANS magic happens.
And where the biggest question marks are.
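The three branches in the inner loop cover a k-bit read that stays inside the current 32-bit word, ends exactly on its boundary, or straddles into the next word. Below is a minimal sketch of the same read factored into one helper. BitReader and ReadBits are hypothetical names, and it assumes k ≤ 31 and that bits are packed from the least significant end of each word, as the diff's shifts suggest.

```c
#include <stdint.h>

struct BitReader {
    const uint32_t *data; /* packed bitstream, LSB-first within each word */
    uint32_t readIndex;   /* word currently being consumed */
    uint32_t bitIndex;    /* next unread bit within data[readIndex], 0..31 */
};

/* Reads k bits (0 <= k <= 31), LSB first, possibly spanning two words. */
static uint32_t ReadBits(struct BitReader *br, uint32_t k)
{
    uint32_t value;
    uint32_t end;

    if (k == 0)
        return 0;
    value = br->data[br->readIndex] >> br->bitIndex;
    end = br->bitIndex + k;
    if (end < 32)
    {
        /* Read stays inside the current word. */
        br->bitIndex = end;
    }
    else
    {
        /* Read ends on (== 32) or crosses (> 32) the word boundary. */
        uint32_t remainder = end - 32;
        br->readIndex++;
        br->bitIndex = remainder;
        if (remainder > 0)
            value |= br->data[br->readIndex] << (k - remainder);
    }
    return value & ((1u << k) - 1);
}
```

One difference from the diff: when a read ends exactly on a word boundary, this version advances readIndex without loading the next word, so decoding the last symbol never touches data past the end of the stream.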

mrgriffin · 2024-11-07T08:27:29Z

I haven't looked into the performance yet at all, but I was wondering if you've done any comparisons between large LZ loads and large smol loads? Off the top of my head, I'd expect warps and connections to be quite heavy? A warp loads up to 32kB of tiles (primary + secondary), and a connection loads up to 16kB of tiles by default (secondary with NUM_TILES_IN_PRIMARY = 512). For connections in particular I'd be concerned about the load taking more than a frame and causing a hitch. There may be some UIs which also load large amounts of compressed data, the Frontier Pass or PokéNav region map maybe?

I'm not sure if there are any source-available and/or open-source games out there with graphics that are more complex and detailed than vanilla, but it would be good to benchmark against those if possible. Tilesets and battle backgrounds are the first things that come to mind for potentially significant differences in fidelity between GF and the community.

EDIT: Just to be clear, the exact cycle counts don't matter except insofar as they affect the frame counts. If a frame has (e.g.) 100k cycles that go unused then smol taking 99k cycles is free, and 101k cycles drops a frame.

hedara90 · 2024-11-07T09:58:21Z

It takes quite a while for large images.
No tANS

[WARN] GBA Debug:	Mode: 0
[WARN] GBA Debug:	Bitsteam size: 0
[WARN] GBA Debug:	tANS table build time: 67
[WARN] GBA Debug:	LO decoding time: 40
[WARN] GBA Debug:	Sym decoding time: 38
[WARN] GBA Debug:	Unencoded copy time: 127357
[WARN] GBA Debug:	Instruction decoding time: 535601
[WARN] GBA Debug:	Total time: 663103

With tANS

[WARN] GBA Debug:	Mode: 5
[WARN] GBA Debug:	Bitsteam size: 1883
[WARN] GBA Debug:	tANS table build time: 17702
[WARN] GBA Debug:	LO decoding time: 383654
[WARN] GBA Debug:	Sym decoding time: 2061221
[WARN] GBA Debug:	Unencoded copy time: 75
[WARN] GBA Debug:	Instruction decoding time: 369126
[WARN] GBA Debug:	Total time: 2831778

Most of the time is spent doing symbol decoding, which makes sense, because it's a lot of symbols.
For comparison, LZ took 454313 cycles for the same image.
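The per-phase numbers add up to the logged total, and working out each phase's share backs up the observation that symbol decoding dominates. A small sketch with the cycle counts copied from the tANS log above; PercentOf and SumOfPhases are hypothetical helpers.

```c
#include <stdint.h>

/* Cycle counts copied from the "With tANS" debug log above. */
enum {
    TABLE_BUILD_CYCLES    = 17702,
    LO_DECODE_CYCLES      = 383654,
    SYM_DECODE_CYCLES     = 2061221,
    UNENCODED_COPY_CYCLES = 75,
    INSTR_DECODE_CYCLES   = 369126,
    TOTAL_CYCLES          = 2831778,
};

/* Share of total, in whole percent (rounded down). */
static uint32_t PercentOf(uint32_t part, uint32_t total)
{
    return (uint32_t)(((uint64_t)part * 100u) / total);
}

/* Sum of the individual phases; should equal TOTAL_CYCLES. */
static uint32_t SumOfPhases(void)
{
    return TABLE_BUILD_CYCLES + LO_DECODE_CYCLES + SYM_DECODE_CYCLES
         + UNENCODED_COPY_CYCLES + INSTR_DECODE_CYCLES;
}
```

Integer division gives 72% for symbol decoding and 13% each for LO decoding and instruction decoding.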

mrgriffin · 2024-11-07T10:23:14Z

> It takes quite a while for large images.

Thanks for the numbers! Which image is that you're using?

For reference, a frame has 280896 cycles, so:

LZ decode is taking 1.62 frames, i.e. 2 whole frames, ~33ms.
Non-tANS decode is taking 2.36 frames, i.e. 3 whole frames, ~50ms.
tANS decode is taking 10.08 frames, i.e. 11 whole frames, ~183ms.
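The frame figures follow from dividing the cycle counts by the 280896 cycles in a frame. The millisecond figures match rounding up to whole frames at roughly 60 fps. A small sketch of that arithmetic: CYCLES_PER_FRAME comes from the comment above, while the helper names and the 60 fps approximation (the GBA actually refreshes at about 59.73 Hz) are assumptions.

```c
#include <stdint.h>

#define CYCLES_PER_FRAME 280896u /* 228 scanlines x 1232 cycles each */

/* Frames needed, in hundredths, rounded to nearest (1.62 frames -> 162). */
static uint32_t FramesX100(uint32_t cycles)
{
    return (uint32_t)(((uint64_t)cycles * 100u + CYCLES_PER_FRAME / 2) / CYCLES_PER_FRAME);
}

/* Whole frames the load occupies, rounded up. */
static uint32_t WholeFrames(uint32_t cycles)
{
    return (cycles + CYCLES_PER_FRAME - 1) / CYCLES_PER_FRAME;
}

/* Milliseconds for those whole frames, approximating the refresh rate as 60 Hz. */
static uint32_t WholeFrameMs60(uint32_t cycles)
{
    return WholeFrames(cycles) * 1000u / 60u;
}
```

With the LZ, non-tANS and tANS totals this gives 162, 236 and 1008 hundredths of a frame (2, 3 and 11 whole frames) and 33, 50 and 183 ms. Using the real ~59.73 Hz rate instead would give slightly larger millisecond values.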

It's possible that things could take an extra frame due to the overhead of everything else that goes on. Note that, e.g., the v-blank handler is a tax on every frame, and in the overworld, during connections, a fair chunk of cycles may be spent preparing the OAM buffer.

I'm sure it'll be possible to speed up the decode implementation somewhat. I'm not confident that a 5x speed-up is available, but it's not unheard-of :)

I suppose even if it's not possible to match LZ speeds we can always have smol as opt-in for the files which have a noticeable performance impact. That way downstream users have a way to trade performance for space if they reach a point where they've used up all 32MB.

hedara90 · 2024-11-07T10:25:54Z

I'm using data/tilesets/primary/general/tiles.4bpp.
I kinda just picked the largest image whose location I knew.

hedara90 · 2024-11-07T10:37:54Z

It's spending ~130 cycles per nibble doing tANS decoding. I can probably bring that down somewhat just by optimizing the workflow, but we're approaching the point where the raw ASM needs to be looked at.
EDIT: Redoing the tANS table brought it down to ~120 cycles.

Hedara added 21 commits October 2, 2024 20:00

Testing decompression · bf4c933
Savepoint · 0fc2270
Saving · 76e3426
save point · 8a5d3aa
Before branching to test · 0fd9de7
Reading stage 1 works · 5d774bc
Saving · 1bb0cdf
Progress · 1dc423b
Progress · c115253
It works · add3925
Minor cleanup · c6a875b
Merge branch 'upcoming' into smolCompress-test · 7c2b83b
More cleanup · 5d68264
Moved all calls to decompression functions into their wrapper functions · c02de1a
Removed extra line · 6ea4156
Merge branch 'compression-wrapper' into smolCompress-test · e2489cf
small fix · dcf1754
Split DecompressData into V/Wram versions · 33b5152
Pokemon sprites work · 4534623
Merge branch 'upcoming' into smolCompress-test · 5e950b9
New image compressor, first public reveal · 84b41d5

hedara90 added the new-feature label (Adds a feature) Oct 31, 2024

Bassoonian added this to the 1.11 milestone Oct 31, 2024

Some cleanup · 406ee24

hedara90 commented Nov 2, 2024


Hedara added 2 commits November 3, 2024 15:53

Fixed stuff · 6cde528
plase stop using outdated gcc · fb33a2a
Some cleanup and performance increase · 09596ee
