Replies: 3 comments 1 reply
-
Hi Erik! Which model did you use? I can take a look and let you know. I know others have had success with this.
-
Thank you for your reply.
-
Hi, yes, there does seem to be an issue with the tokenizer here. I tested some other models and I'll open an issue for the Qwen and Llama tokenizers.
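To confirm the tokenizer is at fault independently of the model, an encode/decode round trip on the failing prompt is a quick check. A minimal sketch, with assumptions: the package path com.github.tjake.jlama.model.llama for LlamaTokenizer, a Tokenizer API of encode(String) -> long[] and decode(long[]) -> String, and a placeholder model directory:

import com.github.tjake.jlama.model.llama.LlamaTokenizer;

import java.nio.file.Path;
import java.util.Arrays;

public class TokenizerRoundTrip {
    public static void main(String[] args) throws Exception {
        Path dir = Path.of("models/Llama-3.1-8B-Instruct"); // placeholder model dir
        LlamaTokenizer tokenizer = new LlamaTokenizer(dir);

        String prompt = "您能讲中文吗"; // "Can you speak Chinese?"
        long[] ids = tokenizer.encode(prompt);    // assumed signature
        String roundTrip = tokenizer.decode(ids); // assumed signature

        System.out.println("ids       = " + Arrays.toString(ids));
        System.out.println("roundTrip = " + roundTrip);
        // If multi-byte UTF-8 is mishandled, roundTrip will not match the
        // prompt and the model effectively sees garbage tokens.
        System.out.println(prompt.equals(roundTrip) ? "tokenizer OK" : "tokenizer bug");
    }
}

If the round trip comes back mangled for the Chinese text but intact for ASCII, that points at byte-level token handling rather than at the model weights.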
-
I used the test case from "jlama-tests":
@Test
@Order(1)
public void llamaRun() throws Exception {
    var dir = appContext.nodeContext().nodePath().commonsLlm();
    try (WeightLoader weights = SafeTensorSupport.loadWeights(dir.toFile())) {
        LlamaTokenizer tokenizer = new LlamaTokenizer(dir);
        Config c = om.readValue(dir.resolve("config.json").toFile(), LlamaConfig.class);
        LlamaModel model = new LlamaModel(c, weights, tokenizer, DType.F32, DType.F32,
                Optional.empty());
        // ... prompt construction and generation elided; the log below is its output
    }
}
23:50:04.938 [main] DEBUG com.github.tjake.jlama.safetensors.SafeTensorIndex -- Adding split 0-2008209408 with 451 tensors of 451
23:50:05.954 [main] INFO com.github.tjake.jlama.model.AbstractModel -- Model type = Q4, Working memory type = F32, Quantized memory type = F32
23:50:06.037 [main] INFO com.x.app.ai.jlama.JlamaLlamaTest -- First prompt
PromptContext{prompt='<|start_header_id|>system<|end_header_id|>
Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024
<|eot_id|><|start_header_id|>user<|end_header_id|>
您能讲中文吗<|eot_id|><|start_header_id|>assistant<|end_header_id|>
', optionalTools=Optional.empty}
23:50:06.048 [main] DEBUG com.github.tjake.jlama.tensor.KvBufferCache -- Optimal page size: 28 layers, 36 context length, 8257536 bytes, 1 layer pages, 3641 length pages
23:50:06.049 [main] DEBUG com.github.tjake.jlama.model.AbstractModel -- Starting at token 0 for session 1c54efb0-8879-4604-b9f3-03ba378a6052 with prompt <|start_header_id|>system<|end_header_id|>
Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024
<|eot_id|><|start_header_id|>user<|end_header_id|>
您能讲中文吗<|eot_id|><|start_header_id|>assistant<|end_header_id|>
23:50:06.167 [main] DEBUG com.github.tjake.jlama.tensor.operations.util.JarSupport -- Loaded jlama-native library: /tmp/jlama13277383875011588717/libjlama.so
23:50:06.191 [main] DEBUG com.github.tjake.jlama.util.MachineSpec -- Machine Vector Spec: AVX_256
23:50:06.191 [main] DEBUG com.github.tjake.jlama.util.MachineSpec -- Byte Order: LITTLE_ENDIAN
23:50:06.191 [main] INFO com.github.tjake.jlama.tensor.operations.TensorOperationsProvider -- Using Native SIMD Operations (OffHeap)
23:50:08.263 [main] DEBUG com.github.tjake.jlama.model.AbstractModel -- 34 prompt tokens in 2210ms | 65.0ms per token
23:50:15.437 [main] DEBUG com.github.tjake.jlama.model.AbstractModel --
elapsed: 9s, prompt 65.0ms per token, gen 152.0ms per token
23:50:15.440 [main] INFO com.x.app.ai.jlama.JlamaLlamaTest -- Response: You didn't provide any information or context for me to understand what you'd like to talk about. If you have a specific question or topic in mind, feel free to share it, and I'll do my best to assist you!
As you can see, the model answers as if the Chinese prompt ("Can you speak Chinese?") were empty. Please help me!
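For comparison, the templated prompt visible in the log above (the <|start_header_id|>...<|eot_id|> wrapping) is what jlama's PromptSupport produces. Here is a minimal sketch of the high-level path, modeled on jlama's README sample; the model path is a placeholder, and the package names and the responseText field are assumptions from jlama 0.x:

import com.github.tjake.jlama.model.AbstractModel;
import com.github.tjake.jlama.model.ModelSupport;
import com.github.tjake.jlama.model.functions.Generator;
import com.github.tjake.jlama.safetensors.DType;
import com.github.tjake.jlama.safetensors.prompt.PromptContext;

import java.io.File;
import java.util.UUID;

public class ChinesePromptCheck {
    public static void main(String[] args) {
        // Placeholder path to an unpacked model directory.
        AbstractModel m = ModelSupport.loadModel(
                new File("models/Llama-3.1-8B-Instruct"), DType.F32, DType.F32);

        // PromptSupport applies the model's chat template, producing the
        // <|start_header_id|>...<|eot_id|> wrapping seen in the log.
        PromptContext ctx = m.promptSupport().isPresent()
                ? m.promptSupport().get().builder()
                        .addUserMessage("您能讲中文吗") // "Can you speak Chinese?"
                        .build()
                : PromptContext.of("您能讲中文吗");

        // generate(sessionId, promptContext, temperature, maxTokens, perTokenCallback)
        Generator.Response r = m.generate(UUID.randomUUID(), ctx, 0.0f, 128, (token, time) -> {});
        System.out.println(r.responseText);
    }
}

With a correct tokenizer this should yield an answer in Chinese; the English "you didn't provide any information" reply in the log is consistent with the prompt tokens being lost or corrupted before inference.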