JetStream

JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).

JetStream Engine Implementation

Currently, there are two reference engine implementations available -- one for JAX models and another for PyTorch models; a schematic sketch of the shared engine pattern follows the list.

  • JAX
  • PyTorch
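
Both implementations plug into the same engine pattern: a one-shot prefill pass over the prompt, followed by a step-wise generate loop over batched decode slots. The sketch below only illustrates that pattern; MockEngine, DecodeState, and the method names are hypothetical stand-ins for illustration, not JetStream's actual API.

# Minimal sketch of the prefill/insert/generate pattern a JetStream-style
# engine follows. All names here are illustrative, not the real interface.
from dataclasses import dataclass, field
from typing import List


@dataclass
class DecodeState:
    # One list per active request slot, holding the tokens emitted so far.
    slots: List[List[int]] = field(default_factory=list)


class MockEngine:
    # Hypothetical engine: prefill passes tokens through, generate echoes them.

    def prefill(self, prompt_tokens: List[int]) -> List[int]:
        # A real engine runs the model over the whole prompt here and
        # returns a KV-cache prefix; the mock passes the tokens through.
        return list(prompt_tokens)

    def insert(self, prefix: List[int], state: DecodeState) -> int:
        # Place a prefilled request into a free decode slot.
        state.slots.append(list(prefix))
        return len(state.slots) - 1

    def generate(self, state: DecodeState) -> List[int]:
        # One decode step across all slots; a real engine samples from
        # the model, the mock repeats the last token of each slot.
        step = [slot[-1] for slot in state.slots]
        for slot, token in zip(state.slots, step):
            slot.append(token)
        return step


engine = MockEngine()
state = DecodeState()
engine.insert(engine.prefill([1, 2, 3]), state)
print(engine.generate(state))  # -> [3]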

Documentation

JetStream Standalone Local Setup

Getting Started

Setup

make install-deps

Run local server & testing

Use the following commands to run a server locally:

# Start a server
python -m jetstream.core.implementations.mock.server

# Test local mock server
python -m jetstream.tools.requester

# Load test local mock server
python -m jetstream.tools.load_tester
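
The requester sends a single request, while the load tester fans many concurrent requests at the server to measure throughput. Below is a rough sketch of that fan-out pattern, with a hypothetical send_request() standing in for the real client (the actual tools talk to the JetStream orchestrator over gRPC):

# Sketch of a fan-out load test. send_request() is a made-up stand-in
# for the real gRPC client used by jetstream.tools.load_tester.
import time
from concurrent.futures import ThreadPoolExecutor


def send_request(prompt: str) -> str:
    # Hypothetical stand-in: the real client opens a gRPC stream to the
    # orchestrator and reads decoded tokens back; this one just echoes.
    return prompt


def load_test(prompts, workers=16):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(send_request, prompts))
    elapsed = time.perf_counter() - start
    print(f"{len(results)} requests in {elapsed:.3f}s "
          f"({len(results) / elapsed:.1f} req/s)")


load_test(["hello world"] * 100)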

Test core modules

# Test JetStream core orchestrator
python -m unittest -v jetstream.tests.core.test_orchestrator

# Test JetStream core server library
python -m unittest -v jetstream.tests.core.test_server

# Test mock JetStream engine implementation
python -m unittest -v jetstream.tests.engine.test_mock_engine

# Test mock JetStream token utils
python -m unittest -v jetstream.tests.engine.test_token_utils

# Test JetStream engine utils
python -m unittest -v jetstream.tests.engine.test_utils
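
For orientation, here is a toy test in the same unittest style as the suites above; EchoEngine and its methods are made up for illustration and are not JetStream's real test fixtures.

# Toy unittest in the style of the engine test suites listed above.
import unittest


class EchoEngine:
    # Made-up engine for illustration: generation echoes the last prompt token.

    def prefill(self, tokens):
        return list(tokens)

    def generate(self, prefix):
        return prefix[-1]


class EchoEngineTest(unittest.TestCase):

    def test_generate_echoes_last_prompt_token(self):
        engine = EchoEngine()
        self.assertEqual(engine.generate(engine.prefill([7, 8, 9])), 9)


if __name__ == "__main__":
    unittest.main()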
