I just want to know how Guidance achieves its acceleration in transformers. Is it only caching? Could you point me to the relevant source code? I'd appreciate it.
Answered by slundberg, Dec 7, 2023:
It works by sending things in batch to the compute backend when we can. How this batching happens is specific to the backend, but the if statement that detects whether the next token is forced (and can therefore be batched) is at: guidance/guidance/models/_model.py, line 731 at commit 26554b0. Hope that helps!
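
For intuition, here is a minimal, self-contained sketch of the idea, not Guidance's actual implementation: tokens whose value is fully determined by the constraint ("forced") are queued and sent to the backend together, so a forward pass only happens when the model genuinely has to choose. The `TEMPLATE`, `FREE`, and `model_forward` names are illustrative stand-ins, not Guidance APIs.

```python
import random

FREE = None  # marker: the model must choose this token

# Hypothetical template-style constraint, like a Guidance program: fixed
# tokens are "forced" and never need a forward pass; FREE slots do.
TEMPLATE = [101, 7, 7, FREE, 42, 42, 42, FREE, 9]

def model_forward(prefix, queued_forced):
    """Stand-in for one call to the compute backend. A real backend would
    consume all queued forced tokens in a single batched forward pass
    (advancing the KV cache) before returning logits for the next choice."""
    random.seed(len(prefix))
    return random.choice([1, 2, 3])  # pretend-sampled token

def generate(template=TEMPLATE):
    out, forced_batch = [], []
    for slot in template:
        # This is the analogue of the "is the next token forced?" check.
        if slot is not FREE:
            # Forced token: the constraint fully determines it, so just
            # queue it; no model call is needed yet.
            forced_batch.append(slot)
            out.append(slot)
        else:
            # A real choice: flush the queued forced tokens to the backend
            # in one batch, then let the model pick the next token.
            out.append(model_forward(out, forced_batch))
            forced_batch.clear()
    return out

print(generate())  # e.g. [101, 7, 7, 2, 42, 42, 42, 1, 9]
```

The savings come from the forced branch: each forced token costs no model call on its own, and the backend sees all of them together in one batched forward pass instead of one pass per token.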