Implement speculative decoding #732

Closed
wsxiaoys opened this issue Nov 8, 2023 · 2 comments
Labels: help wanted, performance


@wsxiaoys
Member

wsxiaoys commented Nov 8, 2023

Code location: https://github.com/TabbyML/tabby/blob/main/crates/llama-cpp-bindings/src/engine.cc
Reference: https://github.com/ggerganov/llama.cpp/blob/master/examples/speculative/speculative.cpp#L47

Implement speculative decoding to speed up certain models.

@wsxiaoys added the enhancement label Nov 8, 2023
@wsxiaoys added this to the Tabby 0.7.0 milestone Nov 8, 2023
@wsxiaoys added the performance and help wanted labels and removed the enhancement label Nov 8, 2023
@Squadrick
Contributor

I've started working on this. I'll put up a draft CL with the implementation to make sure I have the logic right. I might need some help working out what interface changes to TextInferenceEngine are needed to make this possible.

Also, right now, the decoding is entirely greedy. Should I continue to use greedy decoding for the speculative model as well?

@wsxiaoys
Member Author

Thanks for claiming the feature!

Also, right now, the decoding is entirely greedy. Should I continue to use greedy decoding for the speculative model as well?

Yes, greedy decoding should be fine for now.
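
For readers following along, here is a minimal sketch of what speculative decoding looks like when both models decode greedily. It is not the Tabby or llama.cpp implementation. `Token`, `GreedyModel`, `greedy_decode`, and `speculative_greedy_decode` are hypothetical names made up for illustration, and each model is stood in for by a plain function that returns its argmax next token. A real engine would have the target model score all drafted positions in one batched forward pass, which is where the speedup comes from; the sketch calls the target once per position to keep the accept/reject logic easy to read.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins, not part of Tabby or llama.cpp.
using Token = int;
// Given the tokens so far (assumed non-empty), return the greedy (argmax) next token.
using GreedyModel = std::function<Token(const std::vector<Token>&)>;

// Baseline: plain greedy decoding with a single model.
std::vector<Token> greedy_decode(const GreedyModel& model,
                                 std::vector<Token> tokens,
                                 std::size_t max_new_tokens) {
  for (std::size_t i = 0; i < max_new_tokens; ++i) {
    tokens.push_back(model(tokens));
  }
  return tokens;
}

// Speculative greedy decoding: the draft model proposes up to k tokens,
// the target model checks them and keeps the longest agreeing prefix.
std::vector<Token> speculative_greedy_decode(const GreedyModel& target,
                                             const GreedyModel& draft,
                                             std::vector<Token> tokens,
                                             std::size_t max_new_tokens,
                                             std::size_t k) {
  assert(k > 0);
  const std::size_t limit = tokens.size() + max_new_tokens;
  while (tokens.size() < limit) {
    // 1. Draft phase: propose up to k tokens autoregressively.
    std::vector<Token> ctx = tokens;
    std::vector<Token> proposal;
    for (std::size_t i = 0; i < k && ctx.size() < limit; ++i) {
      const Token t = draft(ctx);
      proposal.push_back(t);
      ctx.push_back(t);
    }
    // 2. Verify phase: always keep the target's own token. If it differs
    //    from the draft's guess, the remaining drafted tokens are discarded.
    for (const Token guess : proposal) {
      const Token expected = target(tokens);
      tokens.push_back(expected);
      if (expected != guess) {
        break;
      }
    }
  }
  return tokens;
}
```

Because every emitted token is the target model's own greedy choice, the output matches what `greedy_decode` would produce with the target alone; the draft model only affects how many tokens are accepted per verification round. Implementations that batch the verification typically also gain one extra target token when the whole proposal is accepted, which this sketch leaves out.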

@wsxiaoys added this to Tabby Nov 17, 2023
@wsxiaoys moved this to Todo in Tabby Nov 18, 2023
@wsxiaoys removed this from the Tabby 0.7.0 milestone Dec 10, 2023
@wsxiaoys closed this as not planned May 30, 2024
@github-project-automation (bot) moved this from Todo to Done in Tabby May 30, 2024