Skip to content

Latest commit

 

History

History
51 lines (33 loc) · 1.07 KB

RELEASE.md

File metadata and controls

51 lines (33 loc) · 1.07 KB

Release 1.2.0

Major Features and Improvements

  • Open-source model: Gemma.
  • Continuous Batching experimental support.

Release 1.1.0

Major Features and Improvements

  • Open-source model: LLaMA 2.
  • Open-source improvement: GPT-J tokenizer.

Release 1.0.0

Major Features and Improvements

  • Open-source models: LLaMA and GPT-J.

Release 0.2.2

Bug Fixes

  • Improved compatibility with new Cloud TPU systems.

Release 0.2.1

Bug Fixes

  • Fixed multi-host TPU models.
  • Fixed single-host GPU models.

Release 0.2.0

Major Features and Improvements

  • Google Cloud GPU support.
  • Model quantization.
  • Streaming lm.generate.
  • A new custom model server type.
  • PyTorch model servers.
  • ACL settings on models and cells.

Release 0.1.0

Major Features and Improvements

  • A Pax model server that supports Google Cloud TPU slice serving.
  • An admin server that manages model servers.
  • Go, C++, and Python clients to manage and use models.
  • A command-line tool saxutil.
  • Example language and vision model serving parameters.