
Why does inference with the KV cache have problems? #82

Open
objectnoppppppp opened this issue Sep 10, 2024 · 1 comment
Comments

@objectnoppppppp

I have fine-tuned Evo with LoRA and then run inference with the KV cache enabled. The outputs with the cache on are wrong; they are only correct when I turn the cache off. I don't know what is wrong with the cache.
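A minimal sketch of the comparison, assuming the model is loaded through HuggingFace transformers with trust_remote_code=True (the togethercomputer/evo-1-8k-base checkpoint name, use_cache support in the remote StripedHyena generate path, and a LoRA adapter already merged into the base weights via peft are all assumptions, not confirmed by this repo): with greedy decoding the cached and uncached runs should produce identical tokens, so the first position where they diverge points at where the cached state stops matching the full recompute.

```python
# Sketch: compare greedy generation with and without the KV cache.
# Assumptions: checkpoint name, that the remote StripedHyena code honors
# use_cache in generate(), and that any LoRA adapter was already merged
# into the base weights (e.g. with peft's merge_and_unload()).
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "togethercomputer/evo-1-8k-base"  # assumed checkpoint
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")
model.eval()

inputs = tokenizer("ACGTACGT", return_tensors="pt").to("cuda")

with torch.inference_mode():
    # Greedy decoding so the two runs are directly comparable.
    cached = model.generate(**inputs, max_new_tokens=64,
                            do_sample=False, use_cache=True)
    uncached = model.generate(**inputs, max_new_tokens=64,
                              do_sample=False, use_cache=False)

print(tokenizer.decode(cached[0]))
print(tokenizer.decode(uncached[0]))
# The first differing position marks where the cached state diverges
# from the full recompute.
```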

@JiaLonghao1997

> I have fine-tuned Evo with LoRA and then run inference with the KV cache enabled. The outputs with the cache on are wrong; they are only correct when I turn the cache off. I don't know what is wrong with the cache.

I needed to generate embeddings for 127,906 sequences ranging from 1 to 40 kb, which took 168 hours. Can you provide some suggestions for improvement, such as model distillation?
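Not a fix for the model's runtime itself, but a minimal throughput sketch under stated assumptions: sort sequences by length to reduce padding, batch the forward passes in bf16 under torch.inference_mode(), and mean-pool the final hidden state as the sequence embedding. Whether the Evo remote code accepts output_hidden_states=True and batched, padded input with an attention mask is an assumption; the helper name embed_sequences is hypothetical.

```python
# Sketch: length-sorted, batched embedding extraction.
# Assumptions: `model` and `tokenizer` are loaded as in the snippet above,
# the forward pass accepts output_hidden_states=True, and the tokenizer
# supports padding with an attention mask (it may need a pad token set).
import torch

def embed_sequences(model, tokenizer, seqs, batch_size=8, device="cuda"):
    # Process sequences in length order so each batch carries little padding.
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]))
    embeddings = [None] * len(seqs)
    with torch.inference_mode():
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            batch = tokenizer([seqs[i] for i in idx], return_tensors="pt",
                              padding=True).to(device)
            out = model(**batch, output_hidden_states=True)
            hidden = out.hidden_states[-1]                 # (B, L, D)
            mask = batch["attention_mask"].unsqueeze(-1)   # (B, L, 1)
            pooled = (hidden * mask).sum(1) / mask.sum(1)  # masked mean-pool
            for j, i in enumerate(idx):
                embeddings[i] = pooled[j].float().cpu()
    return embeddings
```

Mean-pooling is only one choice of sequence embedding; the same loop works if you instead keep the hidden state at the final token or at a fixed position.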
