Implement stochastic speculative sampling #5625
Merged: ggerganov merged 17 commits into ggerganov:master from mscheong01:stochastic-speculative-decoding on Mar 4, 2024
Commits
c1bad4a  (WIP) Implement stochastic speculative decoding (mscheong01)
a9335a5  sample from residual distribution on draft accept failure (mscheong01)
4694edd  fix #5657: force greedy sampling with probs when temp is 0 (mscheong01)
fb18827  remove p_accept parameter (mscheong01)
34b942a  fix style (mscheong01)
875319b  remove unused variables (mscheong01)
6afc1f6  add srand() in speculative.cpp (mscheong01)
94f6256  replace use of rand() with mt19937 sampling (mscheong01)
e4896e7  fixes based on review (@JohannesGaessler) (mscheong01)
6b35c8b  fix r random generation (mscheong01)
2ad3f7c  randomly select next sequence to verify + fix bug in memory freeing (mscheong01)
c2cd292  fix bug in active_seqs sync (mscheong01)
7463569  fix uniform int distribution initialization (mscheong01)
c761354  remove warnings from comparison between int and size_t (mscheong01)
45465b2  check grammar in `llama_sample_probability_distribution_impl` (mscheong01)
67ad517  remove malloc code by utilizing vectors (mscheong01)
056bdb3  add PR link to README (mscheong01)
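The commit messages above mention sampling from a residual distribution when a draft token is not accepted, and drawing random numbers with mt19937. As a rough illustration of that idea, here is a minimal sketch of the standard stochastic accept/reject rule for speculative sampling. It is not the code from this PR: `verify_result` and `verify_draft_token` are hypothetical names, and the sketch assumes both models expose full-vocabulary probability vectors that each sum to 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical result type (not part of llama.cpp).
struct verify_result {
    bool accepted; // true if the drafted token was kept
    int  token;    // the drafted token, or its resampled replacement
};

// Hypothetical helper (not part of llama.cpp).
// p_target / p_draft: probabilities the target and draft models assign to
// every vocabulary entry at this position. drafted: index the draft sampled.
inline verify_result verify_draft_token(
        const std::vector<float> & p_target,
        const std::vector<float> & p_draft,
        int                        drafted,
        std::mt19937             & rng) {
    assert(p_target.size() == p_draft.size());
    assert(drafted >= 0 && (std::size_t) drafted < p_draft.size());
    assert(p_draft[drafted] > 0.0f); // the draft must have been able to pick it

    // Accept with probability min(1, p_target / p_draft), written without a
    // division: r < p_target / p_draft  <=>  r * p_draft < p_target.
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    const double r = unif(rng);
    if (r * p_draft[drafted] < p_target[drafted]) {
        return {true, drafted};
    }

    // Rejected: resample from the residual max(0, p_target - p_draft).
    // std::discrete_distribution normalizes the weights itself.
    std::vector<double> residual(p_target.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < residual.size(); ++i) {
        residual[i] = std::max(0.0, (double) p_target[i] - (double) p_draft[i]);
        sum += residual[i];
    }
    if (sum <= 0.0) {
        // Only reachable through rounding; fall back to the target distribution.
        residual.assign(p_target.begin(), p_target.end());
    }
    std::discrete_distribution<int> dist(residual.begin(), residual.end());
    return {false, dist(rng)};
}
```

Under these assumptions, a drafted token is always kept when the two distributions agree, and a token to which the target assigns zero probability is always replaced. The resampling step is what keeps the output distributed according to the target model rather than the draft.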
Is there some way to reuse the code from `llama_sampling_sample` and avoid the duplication? Also, this function does not take the grammar into account. Is this correct?
A solution would be to have `llama_sampling_sample` utilize `llama_sample_probability_distribution` internally. However, this approach appeared too intrusive to the existing codebase at the time, which led to duplicating the code instead.
Yes, it seems I should apply grammar constraints here.