JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices.

About

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (GPU support may follow in the future; PRs welcome).

JetStream Engine Implementation

Two reference engine implementations are currently available: one for JAX models and one for PyTorch models.

JAX

Git: https://github.com/google/maxtext
README: https://github.com/google/JetStream/blob/main/docs/online-inference-with-maxtext-engine.md

PyTorch

Git: https://github.com/google/jetstream-pytorch
README: https://github.com/google/jetstream-pytorch/blob/main/README.md

Documentation

Online Inference with MaxText on v5e Cloud TPU VM
Online Inference with PyTorch on v5e Cloud TPU VM
Serve Gemma using TPUs on GKE with JetStream
Benchmark JetStream Server
Observability in JetStream Server
Profiling in JetStream Server
JetStream Standalone Local Setup

JetStream Standalone Local Setup

Getting Started

Setup

Install the dependencies from the repository root:

make install-deps
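As an optional sanity check after installing the dependencies, the Python sketch below reports which of the modules used by the commands in this README cannot be located. It is not part of JetStream: `missing_modules` and `JETSTREAM_MODULES` are hypothetical names, and it assumes you run it from the repository root or with the `jetstream` package installed. It relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util

# Module paths taken from the commands in this README.
JETSTREAM_MODULES = [
    "jetstream.core.implementations.mock.server",
    "jetstream.tools.requester",
    "jetstream.tools.load_tester",
]


def missing_modules(module_names):
    """Return the subset of module_names that cannot be located on sys.path.

    For dotted names, importlib.util.find_spec imports the parent packages
    first, so a missing (or broken) parent package raises ImportError; that
    case is also reported as missing.
    """
    missing = []
    for name in module_names:
        try:
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ImportError:
            missing.append(name)
    return missing


# Example usage (hypothetical check, run from the repository root):
# print(missing_modules(JETSTREAM_MODULES) or "all modules found")
```

Because `find_spec` executes the parent packages' `__init__` files, a dependency missing from the `jetstream` package itself also surfaces here.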

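Run a local server and test it

Use the following commands to run the mock server locally and send requests to it. The server keeps running in the foreground, so run the requester and the load tester from a second terminal.

# Start the mock server
python -m jetstream.core.implementations.mock.server

# Send test requests to the local mock server
python -m jetstream.tools.requester

# Load-test the local mock server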
python -m jetstream.tools.load_tester
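The three commands above can also be driven from a single Python script. This is a minimal sketch, not part of JetStream: `run_with_mock_server` and its `startup_delay` and `server_args` parameters are hypothetical. It assumes the mock server keeps running until terminated and that the requester and load tester reach it with their default settings. A fixed delay stands in for a real readiness check, since this README does not document which port the server listens on.

```python
import subprocess
import sys
import time


def run_with_mock_server(server_module, client_modules, startup_delay=5.0, server_args=()):
    """Start server_module in the background, run each client module to
    completion, then stop the server. Returns {client_module: exit_code}."""
    server = subprocess.Popen([sys.executable, "-m", server_module, *server_args])
    try:
        # Assumption: a fixed delay is long enough for the server to start listening.
        time.sleep(startup_delay)
        if server.poll() is not None:
            raise RuntimeError(f"{server_module} exited early with code {server.returncode}")
        results = {}
        for module in client_modules:
            # Each client runs in the foreground, like the shell commands above.
            results[module] = subprocess.run([sys.executable, "-m", module]).returncode
        return results
    finally:
        # Stop the server even if a client failed; kill it if it has not
        # exited within 10 seconds of the terminate request.
        server.terminate()
        try:
            server.wait(timeout=10)
        except subprocess.TimeoutExpired:
            server.kill()
            server.wait()


# Example usage (module paths from this README):
# print(run_with_mock_server(
#     "jetstream.core.implementations.mock.server",
#     ["jetstream.tools.requester", "jetstream.tools.load_tester"],
# ))
```

A fixed sleep keeps the sketch simple; polling the server's port would be more robust if you know which port it uses.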

Test core modules

# Test JetStream core orchestrator
python -m unittest -v jetstream.tests.core.test_orchestrator

# Test JetStream core server library
python -m unittest -v jetstream.tests.core.test_server

# Test mock JetStream engine implementation
python -m unittest -v jetstream.tests.engine.test_mock_engine

# Test JetStream token utils and engine utils
python -m unittest -v jetstream.tests.engine.test_token_utils
python -m unittest -v jetstream.tests.engine.test_utils
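To run all of the test modules above in one process instead of one command per module, you could load them into a single `unittest` suite. This sketch uses only the standard library's `unittest.defaultTestLoader.loadTestsFromNames` and `unittest.TextTestRunner`; the helper `run_test_modules` and the list `JETSTREAM_TEST_MODULES` are hypothetical names, and the script assumes it is run from the repository root.

```python
import unittest

# Test modules listed above.
JETSTREAM_TEST_MODULES = [
    "jetstream.tests.core.test_orchestrator",
    "jetstream.tests.core.test_server",
    "jetstream.tests.engine.test_mock_engine",
    "jetstream.tests.engine.test_token_utils",
    "jetstream.tests.engine.test_utils",
]


def run_test_modules(module_names, verbosity=2):
    """Load all named modules into one suite, run it, and return True if every test passed.

    A module that cannot be imported is reported as an error in the results
    rather than raising here.
    """
    suite = unittest.defaultTestLoader.loadTestsFromNames(module_names)
    result = unittest.TextTestRunner(verbosity=verbosity).run(suite)
    return result.wasSuccessful()


# Example usage (from the repository root):
# raise SystemExit(0 if run_test_modules(JETSTREAM_TEST_MODULES) else 1)
```

Running everything in one suite gives a single pass/fail summary, at the cost of sharing one interpreter across modules; the per-module commands above keep each run isolated.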
