
Why does inference with the KV cache have problems? #82

Open
objectnoppppppp opened this issue Sep 10, 2024 · 1 comment
Comments

@objectnoppppppp

I have fine-tuned Evo with LoRA and then run inference with the KV cache enabled. The outputs with the cache on are wrong; they are only correct when I turn the cache off. I don't know what is wrong with the cache.
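A minimal sketch of the comparison, assuming the model is loaded through HuggingFace transformers with trust_remote_code=True (the togethercomputer/evo-1-8k-base checkpoint name, use_cache support in the remote StripedHyena generate path, and a LoRA adapter already merged into the base weights via peft are all assumptions, not confirmed by this repo): with greedy decoding the cached and uncached runs should produce identical tokens, so the first position where they diverge points at where the cached state stops matching the full recompute.

```python
# Sketch: compare greedy generation with and without the KV cache.
# Assumptions: checkpoint name, that the remote StripedHyena code honors
# use_cache in generate(), and that any LoRA adapter was already merged
# into the base weights (e.g. with peft's merge_and_unload()).
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "togethercomputer/evo-1-8k-base"  # assumed checkpoint
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")
model.eval()

inputs = tokenizer("ACGTACGT", return_tensors="pt").to("cuda")

with torch.inference_mode():
    # Greedy decoding so the two runs are directly comparable.
    cached = model.generate(**inputs, max_new_tokens=64,
                            do_sample=False, use_cache=True)
    uncached = model.generate(**inputs, max_new_tokens=64,
                              do_sample=False, use_cache=False)

print(tokenizer.decode(cached[0]))
print(tokenizer.decode(uncached[0]))
# The first differing position marks where the cached state diverges
# from the full recompute.
```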

@JiaLonghao1997

> I have fine-tuned Evo with LoRA and then run inference with the KV cache enabled. The outputs with the cache on are wrong; they are only correct when I turn the cache off. I don't know what is wrong with the cache.

I needed to generate embeddings for 127,906 sequences ranging from 1 to 40 kb, which took 168 hours. Can you provide some suggestions for improvement, such as model distillation?
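Not a fix for the model's runtime itself, but a minimal throughput sketch under stated assumptions: sort sequences by length to reduce padding, batch the forward passes in bf16 under torch.inference_mode(), and mean-pool the final hidden state as the sequence embedding. Whether the Evo remote code accepts output_hidden_states=True and batched, padded input with an attention mask is an assumption; the helper name embed_sequences is hypothetical.

```python
# Sketch: length-sorted, batched embedding extraction.
# Assumptions: `model` and `tokenizer` are loaded as in the snippet above,
# the forward pass accepts output_hidden_states=True, and the tokenizer
# supports padding with an attention mask (it may need a pad token set).
import torch

def embed_sequences(model, tokenizer, seqs, batch_size=8, device="cuda"):
    # Process sequences in length order so each batch carries little padding.
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]))
    embeddings = [None] * len(seqs)
    with torch.inference_mode():
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            batch = tokenizer([seqs[i] for i in idx], return_tensors="pt",
                              padding=True).to(device)
            out = model(**batch, output_hidden_states=True)
            hidden = out.hidden_states[-1]                 # (B, L, D)
            mask = batch["attention_mask"].unsqueeze(-1)   # (B, L, 1)
            pooled = (hidden * mask).sum(1) / mask.sum(1)  # masked mean-pool
            for j, i in enumerate(idx):
                embeddings[i] = pooled[j].float().cpu()
    return embeddings
```

Mean-pooling is only one choice of sequence embedding; the same loop works if you instead keep the hidden state at the final token or at a fixed position.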
