Usage of Vector API in Jlama #19
anbusampath started this conversation in General
Replies: 1 comment 1 reply
-
Hi, LLMs perform many matrix multiplications. In fact, that's where roughly 90% of the processing time goes when running inference. You can run a matrix multiplication on a CPU with plain old Java loops, or you can run it faster using the Vector API. You can also run it on a GPU. You can see both implementations in Jlama.
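As a rough sketch (not Jlama's actual code), this is what the plain-loop version looks like. Jlama's faster path replaces the inner dot-product loop with `jdk.incubator.vector.FloatVector` operations so each iteration processes several floats at once:

```java
// Minimal sketch of the plain-loop matrix multiplication that dominates
// inference time. A Vector API version would replace the inner loop with
// FloatVector loads, fused multiply-adds, and a final reduceLanes(ADD),
// letting the CPU process 8 or 16 floats per instruction (SIMD).
public class MatMulSketch {
    // C[m x n] = A[m x k] * B[k x n], all stored row-major in flat arrays
    static float[] matmul(float[] a, float[] b, int m, int k, int n) {
        float[] c = new float[m * n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                float sum = 0f;
                // This inner dot-product loop is what the Vector API vectorizes.
                for (int p = 0; p < k; p++) {
                    sum += a[i * k + p] * b[p * n + j];
                }
                c[i * n + j] = sum;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        // 2x2 example: [[1,2],[3,4]] * [[5,6],[7,8]] = [[19,22],[43,50]]
        float[] c = matmul(new float[]{1, 2, 3, 4},
                           new float[]{5, 6, 7, 8}, 2, 2, 2);
        System.out.println(c[0] + " " + c[1] + " " + c[2] + " " + c[3]);
    }
}
```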
-
I am new to LLMs. My understanding is that Java's Vector API uses SIMD instructions to vectorize code across different machine architectures. Where is SIMD used in Jlama? Since LLMs already use embedding models to communicate (input/output) as vectors, why do we also need the Vector API?