Releases: triton-inference-server/pytriton

PyTriton 0.5.3

08 Mar 18:31
  • New: Relaxed wheel dependencies to avoid forced downgrades of protobuf and other packages in the NVIDIA 24.02 Docker containers for PyTorch and other frameworks.

  • Version of Triton Inference Server embedded in wheel: 2.43.0

PyTriton 0.5.2

29 Feb 10:32
  • New: Added a TritonLifecyclePolicy parameter to the Triton class to control the lifecycle of the Triton Inference Server.
    The first flag controls when the server starts: at the beginning of the context (the default behavior) or when the run or serve method is called.
    The second flag indicates whether model configs are created in the local filesystem or passed to the Triton Inference Server and managed by it. See the sketch after this list.

  • Fix: ModelManager no longer raises tritonclient.grpc.InferenceServerException from the stop method when the HTTP endpoint is disabled in the Triton configuration.

  • Fix: Methods can now be used as the inference callable.

  • Version of Triton Inference Server embedded in wheel: 2.42.0
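
A minimal sketch of the new lifecycle control. The TritonLifecyclePolicy field names (launch_triton_on_startup, local_model_store) and the triton_lifecycle_policy keyword are assumptions inferred from the description above; verify them against the 0.5.2 API reference:

```python
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton, TritonLifecyclePolicy


@batch
def infer_fn(data):
    return [data * 2]


# Assumed field names: the first flag defers the server start to run()/serve(),
# the second keeps model configs out of the local filesystem.
policy = TritonLifecyclePolicy(launch_triton_on_startup=False, local_model_store=False)

with Triton(triton_lifecycle_policy=policy) as triton:  # assumed keyword name
    triton.bind(
        model_name="Doubler",  # hypothetical model name
        infer_func=infer_fn,
        inputs=[Tensor(name="data", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="result", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()  # with launch_triton_on_startup=False, the server starts here
```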

PyTriton 0.5.1

09 Feb 07:38
  • Fix: ModelClient no longer raises gevent.exceptions.InvalidThreadUseError when destroyed in a different thread.

  • Version of Triton Inference Server embedded in wheel: 2.42.0

PyTriton 0.5.0

10 Jan 09:35
  • New: Support for decoupled models
  • New: AsyncioDecoupledModelClient, which works in async frameworks with decoupled Triton models, such as some Large Language Models. See the sketch after this list.
  • Fix: Fixed a bug that prevented retrieving the log level when the HTTP endpoint was disabled. Thanks @catwell
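
A minimal sketch of consuming a decoupled model with the new client. The model name "streamer" and its "prompt" input are hypothetical, and it is assumed that the decoupled client requires the gRPC endpoint and that infer_sample yields partial results as an async iterator:

```python
import asyncio

import numpy as np

from pytriton.client import AsyncioDecoupledModelClient


async def main():
    # "streamer" is a hypothetical decoupled model already served by Triton;
    # assumption: the decoupled client connects over gRPC.
    async with AsyncioDecoupledModelClient("grpc://localhost:8001", "streamer") as client:
        # Assumption: infer_sample yields partial results as the model streams them.
        async for partial in client.infer_sample(prompt=np.array([b"hello"])):
            print(partial)


asyncio.run(main())
```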

PyTriton 0.4.2

06 Dec 19:04
  • New: A client can now be created from an existing client instance or a model configuration, avoiding an extra fetch of the model configuration from the server. See the sketch after this list.
  • New: Introduced warning system using the warnings module.
  • Fix: The experimental client for decoupled models now prevents sending another request while responses from the previous request have not been consumed, and blocks close until the stream is stopped.
  • Fix: Fixed a leak of ModelClient during Triton creation
  • Fix: Fixed undeclared project dependencies (removed from use in code or added to package dependencies)
  • Fix: Remote models are now unloaded from Triton when RemoteTriton is closed.
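
A minimal sketch of reusing an already-fetched model configuration. The from_existing_client constructor name is an assumption based on the note above, and "Doubler" is a hypothetical model name:

```python
import numpy as np

from pytriton.client import ModelClient

# The first client fetches the model configuration from the server once.
with ModelClient("localhost:8000", "Doubler") as client:
    # Assumption: from_existing_client reuses that configuration instead of
    # fetching it again from the server.
    with ModelClient.from_existing_client(client) as second_client:
        result = second_client.infer_batch(data=np.array([[1.0]], dtype=np.float32))
        print(result)
```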

PyTriton 0.4.1

13 Nov 20:41
  • New: The location of workspaces holding temporary Triton model repositories and communication sockets can now be configured with the $PYTRITON_HOME environment variable. See the sketch after this list.
  • Fix: Restored handling of KeyboardInterrupt in triton.serve()
  • Fix: Removed the limit on handling bytes dtype tensors
  • Build scripts update
    • Added support for arm64 platform builds
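
A minimal sketch of redirecting the workspace location, assuming $PYTRITON_HOME is read when the Triton instance is created and must therefore be set beforehand:

```python
import os

# Assumption: workspaces (temporary model repositories and socket files)
# are placed under this directory once the variable is set.
os.environ["PYTRITON_HOME"] = "/var/tmp/pytriton"

from pytriton.triton import Triton

with Triton() as triton:
    ...  # bind and serve models as usual
```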

PyTriton 0.4.0

24 Oct 19:40
  • New: Remote Mode - PyTriton can now be used to connect to a remote Triton Inference Server. See the sketch after this list.

    • Introduced the RemoteTriton class, which connects to a remote Triton Inference Server running on the same machine when given the Triton URL.
    • Changed the Triton lifecycle: the Triton Inference Server now starts when the context is entered. This allows models to be loaded dynamically into the running server when the bind method is called. It is still possible to create a Triton instance without entering the context and to bind models before starting the server; in that case the models are lazy-loaded when the run or serve method is called, as before.
    • In the RemoteTriton class, calling the enter or connect method connects to the Triton server, so models can safely be loaded while binding inference functions. If RemoteTriton is used without a context manager, models are lazy-loaded when the connect or serve method is called.
  • Change: "batch" decorator raises a ValueError if any of the outputs have a different batch size than expected.

  • Fix: gevent resources leak in FuturesModelClient

  • Version of Triton Inference Server embedded in wheel: 2.36.0
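
A minimal sketch of the remote mode, assuming a Triton Inference Server is already running on this machine on the default HTTP port; "Doubler" is a hypothetical model name:

```python
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import RemoteTriton


@batch
def infer_fn(data):
    return [data * 2]


# Assumption: entering the context connects to the already-running server,
# so bind() loads the model into it dynamically.
with RemoteTriton(url="localhost:8000") as triton:
    triton.bind(
        model_name="Doubler",
        infer_func=infer_fn,
        inputs=[Tensor(name="data", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="result", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()
```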

PyTriton 0.3.1

27 Sep 12:24
  • Fix: Addressed potential instability in shared memory management.
  • Change: KeyboardInterrupt is now handled in triton.serve(). PyTriton hosting scripts return an exit code of 0 instead of 130 when they receive a SIGINT signal.

PyTriton 0.3.0

05 Sep 12:41
  • New: Support for multiple Python versions starting from 3.8+
  • New: Added support for decoupled models, enabling results streaming from models (alpha state)
  • Change: Upgraded Triton Inference Server binaries to version 2.36.0. Note that this version of Triton Inference Server requires glibc 2.35 or newer.

PyTriton 0.2.5

24 Aug 16:22
  • New: Multiple PyTriton instances can now be executed in the same process and/or on the same host. See the sketch after this list.
  • Fix: Fixed invalid flags for the Proxy Backend configuration passed to Triton
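
A minimal sketch of running two instances side by side, assuming each instance is given its own endpoint ports through TritonConfig:

```python
from pytriton.triton import Triton, TritonConfig

# Each instance gets its own ports so the embedded servers do not clash.
first = Triton(config=TritonConfig(http_port=8000, grpc_port=8001, metrics_port=8002))
second = Triton(config=TritonConfig(http_port=9000, grpc_port=9001, metrics_port=9002))

# Bind models to each instance here, then start both without blocking.
first.run()
second.run()
# ... serve traffic ...
first.stop()
second.stop()
```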