Faster-Outlines

Supercharge your structured text generation with faster-outlines, a high-performance Rust backend for the Outlines library.

Overview

faster_outlines is designed to significantly boost the performance of regex-guided text generation, particularly for LLM inference servers. It's an ideal solution for scenarios where regex patterns for guiding LLM generation are not known in advance.

Key features:

🚀 Seamless one-line integration with existing Outlines projects
🚀 All the features you already love about outlines
⚡ Asynchronous FSM compilation for immediate start of LLM inference
🏎️ Substantial performance improvements, especially for complex regex patterns (like JSON)
🔄 Continuous updates to improve speed!

Upcoming (in no particular order):

🍴 vLLM fork using faster_outlines
🤝 Official integration with vLLM's main repo (hopefully)
Redis as a caching backend, for large inference setups
🦀 Rust API (started, but unfinished)

Why faster_outlines?

Optimized for LLM Inference Servers: Ideal for scenarios where regex patterns are dynamic and not known beforehand.
Asynchronous Processing: Unlike the standard Outlines library, faster_outlines allows you to start LLM inference immediately, without waiting for the entire FSM to compile.
Significant Performance Boost: Especially noticeable with complex regex patterns and large state spaces.
Seamless Integration: Works with your existing Outlines code with minimal changes (outlines v0.0.46, soon all versions).
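
The asynchronous idea can be illustrated with a small standard-library sketch. It is not the faster_outlines implementation (which lives in Rust). `LazyIndex`, `slow_compile`, and the made-up token ids are hypothetical and exist only to show how a consumer can read early states while later ones are still being built.

```python
import threading
import time


class LazyIndex:
    """Toy state -> allowed-tokens index filled in by a background thread."""

    def __init__(self, num_states, compile_state):
        self._compile_state = compile_state
        self._states = {}
        self._cond = threading.Condition()
        self._worker = threading.Thread(
            target=self._build, args=(num_states,), daemon=True
        )
        self._worker.start()

    def _build(self, num_states):
        # Compile states in order, publishing each one as soon as it is ready.
        for state in range(num_states):
            allowed = self._compile_state(state)
            with self._cond:
                self._states[state] = allowed
                self._cond.notify_all()

    def allowed_tokens(self, state):
        # Block only until the requested state exists,
        # not until the whole index has been compiled.
        with self._cond:
            self._cond.wait_for(lambda: state in self._states)
            return self._states[state]

    def wait_until_complete(self):
        self._worker.join()


def slow_compile(state):
    # Stand-in for the expensive per-state work; the token ids are invented.
    time.sleep(0.01)
    return {state, state + 1}


index = LazyIndex(num_states=50, compile_state=slow_compile)
first = index.allowed_tokens(0)  # ready after roughly one state's worth of work
index.wait_until_complete()
last = index.allowed_tokens(49)
```

In this sketch the caller that asks for state 0 waits for a single state to be compiled, while a caller that needs the complete index can still join the worker explicitly.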

Installation

Warning

faster_outlines currently supports only Linux-based operating systems. You can try compiling on other systems such as Windows, but you are better off using WSL2. On a non-Linux system you will need to build from source, so make sure you have Rust installed.

pip install faster_outlines

Quick Start

One-line patching with outlines (v0.0.46)

Integrating faster_outlines into your project is as simple as adding one line of code:

import outlines
from faster_outlines import patch

patch(outlines)

# Now use outlines as you normally would
# Your code here...

You can also pass save_to_sys_modules=True to the patch function, in which case all normal outlines imports will use the patched module.

import outlines
from faster_outlines import patch

patch(outlines, save_to_sys_modules=True)

from outlines.fsm.fsm import RegexFSM  # Import as usual.
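
Why registering a patched module affects later imports can be shown with a standard-library sketch. It uses a stand-in module, and `toy_lib`, `toy_patch`, and `compile_index` are hypothetical names. How the real patch function works internally is not documented here, so treat the copy-and-register approach as an assumption.

```python
import importlib
import sys
import types

# Build a stand-in "library" module and register it so it is importable by name.
original = types.ModuleType("toy_lib")
original.compile_index = lambda pattern: f"slow:{pattern}"
sys.modules["toy_lib"] = original


def toy_patch(module, save_to_sys_modules=False):
    """Return a patched copy of `module`; optionally make it the importable one."""
    patched = types.ModuleType(module.__name__)
    patched.__dict__.update(module.__dict__)
    patched.compile_index = lambda pattern: f"fast:{pattern}"
    if save_to_sys_modules:
        sys.modules[module.__name__] = patched
    return patched


patched = toy_patch(original, save_to_sys_modules=True)

# The import machinery checks sys.modules first, so a later import
# of the same name now resolves to the patched module.
resolved = importlib.import_module("toy_lib")
result = resolved.compile_index("[0-9]+")
```

Without the `sys.modules` entry, only code holding a reference to the patched object would see the replacement. With it, any later `import` of the same name does.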

A longer, complete example:

import outlines
from faster_outlines import patch

patch(outlines)

schema = '''{
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {
            "title": "Name",
            "maxLength": 10,
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {
            "title": "Strength",
            "type": "integer"
        }
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string"
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string"
        }
    }
}'''

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2", device="cuda:0")
print("Model loaded.")
generator = outlines.generate.json(model, schema)
character = generator("Give me a character description")
print(character)

Using the guide directly with Hugging Face transformers, without outlines:

from faster_outlines.fsm import RegexGuide, TokenVocabulary
from faster_outlines.sampling import BaseLogitsProcessor
from transformers import AutoModelForCausalLM, AutoTokenizer

# Move the model to the same device as the inputs used below.
model = AutoModelForCausalLM.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B")

vocab = TokenVocabulary(
    tokenizer.get_vocab(),
    tokenizer.eos_token_id,
    set(tokenizer.all_special_tokens)
)

# Regex for an email address
regex = r"""[a-z0-9!#$%&'*+/=?^_{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"""

guide = RegexGuide(regex, vocab)

m = """<|im_start|>user\nWrite me a funny email address.\n<|im_end|>\n<|im_start|>assistant\n"""

inputs = tokenizer.encode(m, return_tensors="pt")

logits_processor = BaseLogitsProcessor(guide)

print(
    model.generate(
        inputs.to("cuda"),
        max_new_tokens=100,
        logits_processor=[logits_processor],
        do_sample=True
    )
)
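
What a regex guide and logits processor do in the example above can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not the faster_outlines code: the toy vocabulary, the hand-written DFA for `[0-9]+`, and the helper names `build_index` and `mask_logits` are all invented for the example.

```python
import math

# Toy vocabulary: token id -> string. The ids and strings are invented.
VOCAB = {0: "1", 1: "23", 2: "a", 3: "4b", 4: "<eos>"}
EOS_ID = 4

# Hand-written DFA for the regex [0-9]+ : state 0 = start, state 1 = accepting.
ACCEPTING = {1}


def step(state, char):
    """Return the next DFA state, or None if the character is rejected."""
    return 1 if char.isdigit() else None


def build_index(states):
    """Map each DFA state to {token_id: next_state} for tokens that keep the match alive."""
    index = {}
    for state in states:
        allowed = {}
        for token_id, text in VOCAB.items():
            if token_id == EOS_ID:
                # EOS is only allowed once the pattern has fully matched.
                if state in ACCEPTING:
                    allowed[token_id] = state
                continue
            current = state
            for char in text:
                current = step(current, char)
                if current is None:
                    break
            if current is not None:
                allowed[token_id] = current
        index[state] = allowed
    return index


def mask_logits(logits, allowed_ids):
    """Set every disallowed token's logit to -inf so it can never be sampled."""
    return [x if i in allowed_ids else -math.inf for i, x in enumerate(logits)]


index = build_index(states=[0, 1])
logits = [0.1, 0.2, 5.0, 3.0, 1.0]
masked_start = mask_logits(logits, index[0])
masked_after_digit = mask_logits(logits, index[1])
```

Building the index means walking every vocabulary token through every state. For a real tokenizer and a large pattern that is the expensive step a faster backend targets.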

Performance Comparison

The regex index compilation time reported for faster-outlines is the time taken to fully compile the index, not the time until the index is usable for sampling. The index is normally usable for sampling no more than 1 ms after the regex has been compiled to an FSM using interegular.

The raw benchmark results are stored as JSON in bench/benchmark_results.json, and the graph is generated with bench/makePrettyGraph.js.

Caching and Env vars

faster-outlines caches all generated FSMs in a Rust-based LRU Cache. The cache can be controlled using the following environment variables:

Variable	Default	Description
`FASTER_OUTLINES_CACHE_SIZE`	50	Maximum number of FSMs to cache
`FASTER_OUTLINES_DISABLE_CACHE`	false	Disable caching ("true"/"1"/"yes")
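
The effect of these two variables can be sketched with a small Python LRU cache. The actual cache is implemented in Rust. `ToyLRUCache`, `get_or_compile`, and `fake_compile` are hypothetical names, and reading the flag case-insensitively is an assumption. The values set below are examples, not recommendations.

```python
import os
from collections import OrderedDict

# Set the variables before the cache reads them.
os.environ["FASTER_OUTLINES_CACHE_SIZE"] = "2"
os.environ["FASTER_OUTLINES_DISABLE_CACHE"] = "false"


class ToyLRUCache:
    """Illustrative LRU cache keyed by regex pattern (not the Rust implementation)."""

    def __init__(self):
        self.max_size = int(os.environ.get("FASTER_OUTLINES_CACHE_SIZE", "50"))
        flag = os.environ.get("FASTER_OUTLINES_DISABLE_CACHE", "false").lower()
        self.disabled = flag in ("true", "1", "yes")
        self._entries = OrderedDict()

    def get_or_compile(self, pattern, compile_fn):
        if self.disabled:
            return compile_fn(pattern)
        if pattern in self._entries:
            self._entries.move_to_end(pattern)  # mark as most recently used
            return self._entries[pattern]
        value = compile_fn(pattern)
        self._entries[pattern] = value
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value


calls = []


def fake_compile(pattern):
    # Stand-in for FSM compilation; records every real compilation.
    calls.append(pattern)
    return f"fsm({pattern})"


cache = ToyLRUCache()
cache.get_or_compile("a+", fake_compile)
cache.get_or_compile("b+", fake_compile)
cache.get_or_compile("a+", fake_compile)  # cache hit, no recompilation
cache.get_or_compile("c+", fake_compile)  # evicts "b+", the least recently used
cache.get_or_compile("b+", fake_compile)  # compiled again
```

A larger cache size trades memory for fewer recompilations, which matters most when the same patterns recur across requests.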

Docs

Most of the Rust code is thoroughly documented in terms of data structures and methodology. The Rust docs and the Python binding code, as well as the .pyi file for the compiled portion of the library, should be sufficient for most users. If you have any questions that the comments and code don't answer, feel free to open an issue.

Contributing & Support

Contributions are welcome!

If you would like to support further development and more speed improvements for faster_outlines, please consider supporting us on GitHub Sponsors, or make a donation using the Buy-Me-A-Coffee link below!

Issues

If you have an issue with the library, please open a GitHub issue describing how to reproduce it, and we will work on fixing it.

Acknowledgments

This project builds upon the excellent work of the Outlines library.

Copyright

This work is dual-licensed under Apache-2.0 and MIT. Find more info in the LICENSE file.

Citations:

@article{willard2023efficient,
  title={Efficient Guided Generation for LLMs},
  author={Willard, Brandon T and Louf, R{\'e}mi},
  journal={arXiv preprint arXiv:2307.09702},
  year={2023}
}
