Interactions between Guidance and the tokenization #631

vivien000 · 2024-02-15T12:57:42Z


Hi! I have a technical question to better understand how Guidance works. Let's imagine that we are interested in generating a string satisfying a certain constraint. Let S be the set of strings satisfying this constraint.

Are the token sequences potentially generated by Guidance exactly:

1. those that can be decoded into a string of S (i.e. the preimage of S under the tokenizer's decoding function);
2. those that are the tokenization of a string of S (i.e. the image of S under the tokenizer's encoding function)?

For example, if the constraint is the regex "^[a-z]{10}$", can Guidance return:

1. any token sequence whose tokens decode to exactly 10 letters in total;
2. only the token sequences that are the actual tokenization of an a-z string of length 10?

I understand the answer is 1), but I would appreciate it if this could be confirmed.
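
To make the two options concrete, here is a minimal sketch with a toy tokenizer. Everything in it is an assumption for illustration: the three-token vocabulary, the greedy longest-match `encode`, and the scaled-down regex `[a-b]{2}` standing in for "^[a-z]{10}$". It does not use Guidance and says nothing about what Guidance actually does; it only shows that the preimage under decoding can be strictly larger than the image under encoding.

```python
import re
from itertools import product

# Hypothetical toy vocabulary; a real tokenizer's vocabulary is far larger.
VOCAB = ["a", "b", "ab"]

def decode(tokens):
    """Decoding is plain concatenation of the token strings."""
    return "".join(tokens)

def encode(text):
    """Greedy longest-match tokenization (a stand-in for a real encoder)."""
    tokens, i = [], 0
    while i < len(text):
        # Pick the longest vocabulary entry that matches at position i.
        match = max((t for t in VOCAB if text.startswith(t, i)), key=len)
        tokens.append(match)
        i += len(match)
    return tokens

# Scaled-down analogue of the regex in the question.
PATTERN = re.compile(r"[a-b]{2}")

# S: every string satisfying the constraint ("aa", "ab", "ba", "bb").
S = ["".join(p) for p in product("ab", repeat=2)]

# Option 1: all token sequences whose decoded text is in S.
# Sequences longer than 2 tokens cannot decode to a 2-character string.
preimage = {
    seq
    for n in (1, 2)
    for seq in product(VOCAB, repeat=n)
    if PATTERN.fullmatch(decode(seq))
}

# Option 2: only the canonical tokenization of each string in S.
image = {tuple(encode(s)) for s in S}

# Token sequences allowed under option 1 but not under option 2.
non_canonical = preimage - image

print(sorted(preimage))
print(sorted(image))
print(sorted(non_canonical))
```

In this toy setting the string "ab" has two token sequences, `("ab",)` and `("a", "b")`; only the first is what `encode` produces, so option 2 excludes the second while option 1 allows both.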

Replies: 0 comments
