Implement speculative decoding #732

wsxiaoys · 2023-11-08T22:16:04Z

Code location: https://github.com/TabbyML/tabby/blob/main/crates/llama-cpp-bindings/src/engine.cc
Reference: https://github.com/ggerganov/llama.cpp/blob/master/examples/speculative/speculative.cpp#L47

Implement speculative decoding to speed up certain models.
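For context on where the speedup comes from, here is a back-of-the-envelope estimate. It rests on a simplifying assumption that does not come from this issue: each drafted token is accepted independently with probability $\alpha$, and the draft model proposes $\gamma$ tokens per step. Each target verification pass then always yields the accepted draft prefix plus one token from the target, so the expected number of tokens per target pass is:

$$
\mathbb{E}[\text{tokens per target pass}] = \sum_{k=0}^{\gamma} \alpha^{k} = \frac{1-\alpha^{\gamma+1}}{1-\alpha}
$$

For example, $\alpha = 0.8$ and $\gamma = 4$ give $(1 - 0.8^{5})/0.2 \approx 3.36$ tokens per target pass instead of 1. The real wall-clock gain is smaller because the draft model's own forward passes also cost time. This is why speculative decoding helps most when a much cheaper draft model agrees with the target most of the time.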

Squadrick · 2023-11-16T23:26:28Z

I've started working on this. I'll have a draft CL with the implementation to make sure I have the logic right. Might need some help on what the interface changes to TextInferenceEngine will look like to make this possible.

Also, right now, the decoding is entirely greedy. Should I continue to use greedy decoding for the speculative model as well?

wsxiaoys · 2023-11-16T23:51:45Z

Thanks for claiming the feature!

Also, right now, the decoding is entirely greedy. Should I continue to use greedy decoding for the speculative model as well?

Yes, greedy decoding should be good for now.

wsxiaoys added the enhancement label Nov 8, 2023

wsxiaoys added this to the Tabby 0.7.0 milestone Nov 8, 2023

wsxiaoys added the performance and help wanted labels and removed the enhancement label Nov 8, 2023

wsxiaoys added this to Tabby Nov 17, 2023

wsxiaoys moved this to Todo in Tabby Nov 18, 2023

wsxiaoys removed this from the Tabby 0.7.0 milestone Dec 10, 2023

wsxiaoys closed this as not planned May 30, 2024

github-project-automation bot moved this from Todo to Done in Tabby May 30, 2024
