
Implementation of Q6_K FloatTensor #12

Open · wants to merge 1 commit into base: main
Conversation

srogmann (Contributor)
This PR contains a Q6_K implementation.

  • Model: bartowski/Meta-Llama-3.1-8B-Instruct-GGUF, Q6_K
  • CPU: AMD Ryzen 9 7900X
  • JVM: OpenJDK 64-Bit Server VM
  • Linux: 6.9.7-arch1-1

| Quant | Species | Speed |
| ----- | ------- | ----- |
| Q6_K | S_128_BIT | 0.22 tokens/s |
| Q6_K | S_256_BIT (non-array) | 0.47 tokens/s, 0.10 tokens/s |
| Q6_K | S_256_BIT (array) | 1.26 tokens/s |
| Q6_K | S_256_BIT (512 bits) | 0.29 tokens/s |

  • Model: bartowski/Meta-Llama-3.1-8B-Instruct-GGUF, Q8

| Quant | Species | Speed |
| ----- | ------- | ----- |
| Q8_0 | S_128_BIT | 4.02 tokens/s |
| Q8_0 | S_256_BIT | 5.80 tokens/s |
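For readers unfamiliar with the format, here is a minimal scalar sketch of Q6_K dequantization for one 256-value super-block, following the GGUF/llama.cpp Q6_K layout (128 bytes of lower 4-bit quants, 64 bytes of upper 2-bit quants, 16 signed 8-bit sub-block scales, and one fp16 super-block scale). The class, method, and parameter names are illustrative and not this PR's actual FloatTensor API:

```java
// Hedged sketch: scalar dequantization of a single Q6_K super-block (256 values).
// Names and the byte[]-based signature are assumptions for illustration only.
final class Q6KSketch {
    static final int QK_K = 256; // values per super-block

    // ql: 128 bytes (lower 4 bits), qh: 64 bytes (upper 2 bits),
    // scales: 16 signed bytes, d: super-block scale already converted from fp16.
    static float[] dequantizeBlock(byte[] ql, byte[] qh, byte[] scales, float d) {
        float[] out = new float[QK_K];
        for (int n = 0; n < QK_K; n += 128) {       // two halves of 128 values
            int qlOff = n / 2, qhOff = n / 4, scOff = n / 16;
            for (int l = 0; l < 32; l++) {
                int is = l / 16;                    // sub-block index within this half
                int loA = ql[qlOff + l] & 0x0F;
                int loB = ql[qlOff + l + 32] & 0x0F;
                int hiA = (ql[qlOff + l] & 0xFF) >>> 4;
                int hiB = (ql[qlOff + l + 32] & 0xFF) >>> 4;
                int h = qh[qhOff + l] & 0xFF;
                // combine 4 low bits with 2 high bits, then re-center to [-32, 31]
                int q1 = (loA | ((h & 3) << 4)) - 32;
                int q2 = (loB | (((h >> 2) & 3) << 4)) - 32;
                int q3 = (hiA | (((h >> 4) & 3) << 4)) - 32;
                int q4 = (hiB | (((h >> 6) & 3) << 4)) - 32;
                out[n + l]      = d * scales[scOff + is]     * q1;
                out[n + l + 32] = d * scales[scOff + is + 2] * q2;
                out[n + l + 64] = d * scales[scOff + is + 4] * q3;
                out[n + l + 96] = d * scales[scOff + is + 6] * q4;
            }
        }
        return out;
    }
}
```

The S_128_BIT/S_256_BIT variants benchmarked above presumably perform the same unpack-and-scale work with the Vector API instead of this scalar loop.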

@mukel (Owner) commented Aug 12, 2024

I experimented with running this on a patched Graal compiler with partial Vector API support. I focused on vectorDot256 because it is the most likely to be compiled properly. I got quite far: everything is compiled properly until the last large block with the sums, where I get an exception in the compiler.
The bug seems to be in the compiler's internal tracking of the vectors, not in missing features. I believe that, with minor fixes, Graal will be able to compile this properly. I'll keep you posted.
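For reference, this is the general shape of a 256-bit Vector API accumulation loop of the kind the compiler has to handle here. It is a hedged sketch over plain float[] inputs; the PR's actual vectorDot256 additionally unpacks Q6_K quants inside the loop:

```java
// Hedged sketch of a 256-bit Vector API dot product; not the PR's vectorDot256.
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

final class VectorDotSketch {
    static final VectorSpecies<Float> S_256_BIT = FloatVector.SPECIES_256;

    static float dot(float[] a, float[] b, int size) {
        FloatVector acc = FloatVector.zero(S_256_BIT);
        int upper = S_256_BIT.loopBound(size);
        int i = 0;
        for (; i < upper; i += S_256_BIT.length()) {
            FloatVector va = FloatVector.fromArray(S_256_BIT, a, i);
            FloatVector vb = FloatVector.fromArray(S_256_BIT, b, i);
            acc = va.fma(vb, acc); // fused multiply-add into the accumulator
        }
        float result = acc.reduceLanes(VectorOperators.ADD); // single cross-lane sum
        for (; i < size; i++) {  // scalar tail for the remaining elements
            result += a[i] * b[i];
        }
        return result;
    }
}
```

Keeping the cross-lane reduction (reduceLanes) outside the hot loop is usually the part that compilers with partial Vector API support handle best; the per-iteration body stays purely lanewise.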

@srogmann (Contributor, Author)

Did you try vectorDot256Array?

@mukel mentioned this pull request on Oct 23, 2024