Cache input_ids on CPU memory. #128

Open
wants to merge 1 commit into
base: main

pmfirestone

_get_partial_codes now caches input_ids greedily whenever it is given new ones. The caller is responsible for keeping the cache valid. In particular, when the last input_id turns out not to be valid, is_valid has to remove it from the cache.
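
To illustrate the contract described above, here is a minimal sketch of a greedy CPU-side cache with caller-driven rollback. It is not the code in this PR: the class InputIdCache and its methods update and drop_last are hypothetical names, and plain Python lists stand in for the tensors (with real tensors, one would presumably copy only the new slice to the CPU, for example via .tolist()).

```python
from typing import List, Sequence


class InputIdCache:
    """Hypothetical CPU-side cache of token ids, one list per batch row."""

    def __init__(self) -> None:
        self.rows: List[List[int]] = []

    def update(self, input_ids: Sequence[Sequence[int]]) -> None:
        """Greedily append only the ids that are not cached yet."""
        if len(self.rows) != len(input_ids):
            # Batch size changed: start over with empty rows.
            self.rows = [[] for _ in input_ids]
        for cached, row in zip(self.rows, input_ids):
            # Assumes the cached prefix is still valid (the caller's job).
            cached.extend(row[len(cached):])

    def drop_last(self, row: int) -> None:
        """Caller-side invalidation: discard the newest id of one row."""
        if self.rows[row]:
            self.rows[row].pop()


cache = InputIdCache()
cache.update([[5, 6, 7]])
cache.update([[5, 6, 7, 8]])  # only 8 is new, so only 8 is appended
cache.drop_last(0)            # suppose 8 was rejected as invalid
cache.update([[5, 6, 7, 9]])  # 9 replaces it on the next step
```

Without the drop_last call, the final update would see a cached row of the same length as the incoming one, append nothing, and leave the stale 8 in place. That is why the caller has to roll back rejected ids.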

Benchmarking with the following settings, we see an 80% reduction in the time spent in syncode:

  • model: "microsoft/phi-2"
  • Grammar: "python"
  • mode: "grammar_mask"
  • parser: "lr"
  • prompt: '''def print_prime(n):
    """
    Print all primes between 1 and n
    """
    '''
  • max_new_tokens: 50.
  • Using the huggingface LogitsProcessor API.
  • Computer is a Vultr A16, 1 GPU.
