I just want to know how Guidance achieves its acceleration in transformers. Is it only caching? Could you point me to the relevant source code? I'd appreciate it.
Answered by slundberg, Dec 7, 2023:
It works by sending things in batch to the compute backend when we can. How this batching happens is specific to the backend, but the if statement that detects whether the next token is forced (and can therefore be batched) is at: guidance/guidance/models/_model.py, line 731 at commit 26554b0. Hope that helps!
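
For intuition, here is a minimal, self-contained sketch of the idea, not Guidance's actual implementation: tokens whose value is fully determined by the constraint ("forced") are queued and sent to the backend together, so a forward pass only happens when the model genuinely has to choose. The `TEMPLATE`, `FREE`, and `model_forward` names are illustrative stand-ins, not Guidance APIs.

```python
import random

FREE = None  # marker: the model must choose this token

# Hypothetical template-style constraint, like a Guidance program: fixed
# tokens are "forced" and never need a forward pass; FREE slots do.
TEMPLATE = [101, 7, 7, FREE, 42, 42, 42, FREE, 9]

def model_forward(prefix, queued_forced):
    """Stand-in for one call to the compute backend. A real backend would
    consume all queued forced tokens in a single batched forward pass
    (advancing the KV cache) before returning logits for the next choice."""
    random.seed(len(prefix))
    return random.choice([1, 2, 3])  # pretend-sampled token

def generate(template=TEMPLATE):
    out, forced_batch = [], []
    for slot in template:
        # This is the analogue of the "is the next token forced?" check.
        if slot is not FREE:
            # Forced token: the constraint fully determines it, so just
            # queue it; no model call is needed yet.
            forced_batch.append(slot)
            out.append(slot)
        else:
            # A real choice: flush the queued forced tokens to the backend
            # in one batch, then let the model pick the next token.
            out.append(model_forward(out, forced_batch))
            forced_batch.clear()
    return out

print(generate())  # e.g. [101, 7, 7, 2, 42, 42, 42, 1, 9]
```

The savings come from the forced branch: each forced token costs no model call on its own, and the backend sees all of them together in one batched forward pass instead of one pass per token.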