- Open-source model: Gemma.
- Experimental support for continuous batching.
- Open-source model: LLaMA 2.
- Open-source improvement: GPT-J tokenizer.
- Open-source models: LLaMA and GPT-J.
- Improved compatibility with new Cloud TPU systems.
- Fixed multi-host TPU models.
- Fixed single-host GPU models.
- Google Cloud GPU support.
- Model quantization.
- Streaming `lm.generate`.
- A new custom model server type.
- PyTorch model servers.
- ACL settings on models and cells.
- A Pax model server that supports Google Cloud TPU slice serving.
- An admin server that manages model servers.
- Go, C++, and Python clients to manage and use models.
- A command-line tool, `saxutil`.
- Example language and vision model serving parameters.