Releases: triton-inference-server/pytriton
PyTriton 0.5.3
- New: Relaxed wheel dependencies to avoid forced downgrading of protobuf and other packages in the NVIDIA 24.02 Docker containers for PyTorch and other frameworks.
- Version of Triton Inference Server embedded in wheel: 2.43.0
PyTriton 0.5.2
- Add: Added the TritonLifecyclePolicy parameter to the Triton class to control the lifecycle of the Triton Inference Server. The server can be started when entering the context (default behavior) or when the run or serve method is called; a second flag in this parameter indicates whether model configs are created in the local filesystem or passed to the Triton Inference Server and managed by it.
- Fix: ModelManager no longer raises tritonclient.grpc.InferenceServerException from the stop method when the HTTP endpoint is disabled in the Triton configuration.
- Fix: Methods can be used as the inference callable (see the sketch below).
- Version of Triton Inference Server embedded in wheel: 2.42.0
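A minimal sketch of the fix above, binding a bound method as the inference callable. The wrapper class, model name, tensor names, and shapes are illustrative assumptions, not part of the release notes.

```python
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton


class Doubler:
    @batch
    def infer(self, data):
        # Inputs arrive as batched numpy arrays thanks to the @batch decorator.
        return {"result": data * 2}


with Triton() as triton:
    triton.bind(
        model_name="Doubler",
        infer_func=Doubler().infer,  # a bound method used as the inference callable
        inputs=[Tensor(name="data", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="result", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=16),
    )
    triton.serve()
```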
PyTriton 0.5.1
- Fix: ModelClient no longer raises gevent.exceptions.InvalidThreadUseError when destroyed in a different thread.
- Version of Triton Inference Server embedded in wheel: 2.42.0
PyTriton 0.5.0
- New: Decoupled models support
- New: AsyncioDecoupledModelClient, which works in async frameworks with decoupled Triton models such as some Large Language Models (see the sketch after this list).
- Fix: Fixed a bug that prevented getting the log level when the HTTP endpoint was disabled. Thanks @catwell
- Version of Triton Inference Server embedded in wheel: 2.41.0
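A hedged sketch of streaming partial results from a decoupled model with AsyncioDecoupledModelClient. The gRPC URL, model name, input layout, and the exact iteration pattern are assumptions for illustration, not taken from the release notes.

```python
import asyncio

import numpy as np

from pytriton.client import AsyncioDecoupledModelClient


async def main():
    # Decoupled models are served over gRPC; URL and model name are placeholders.
    async with AsyncioDecoupledModelClient("grpc://localhost:8001", "streaming_model") as client:
        # Each iteration is assumed to yield one partial response from the decoupled model.
        async for partial in client.infer_sample(np.array([b"a prompt"])):
            print(partial)


asyncio.run(main())
```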
PyTriton 0.4.2
- New: You can create a client from an existing client instance or model configuration to avoid loading model configuration from the server.
- New: Introduced a warning system using the warnings module.
- Fix: The experimental client for decoupled models prevents sending another request while responses from the previous request are not consumed, and blocks close until the stream is stopped.
- Fix: Leak of ModelClient during Triton creation.
- Fix: Fixed undeclared project dependencies (removed them from the code or added them to the package dependencies).
- Fix: Remote model is unloaded from Triton when RemoteTriton is closed.
- Version of Triton Inference Server embedded in wheel: 2.39.0
PyTriton 0.4.1
- New: The location of workspaces with temporary Triton model repositories and communication socket files can be configured via the $PYTRITON_HOME environment variable (see the sketch after this list).
- Fix: Restored handling of KeyboardInterrupt in triton.serve().
- Fix: Remove limit for handling bytes dtype tensors
- Build scripts update
- Added support for arm64 platform builds
- Version of Triton Inference Server embedded in wheel: 2.39.0
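A minimal sketch of the workspace override described above; the directory path is an arbitrary example.

```python
import os

# Point PyTriton's workspaces (temporary model repositories and socket files)
# at a custom directory; set the variable before the server is started.
os.environ["PYTRITON_HOME"] = "/var/tmp/pytriton_workspaces"

from pytriton.triton import Triton

with Triton() as triton:
    ...  # bind models and serve as usual
```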
PyTriton 0.4.0
- New: Remote Mode - PyTriton can be used to connect to a remote Triton Inference Server (see the sketch after this list).
  - Introduced the RemoteTriton class, which connects to a remote Triton Inference Server running on the same machine by passing the Triton URL.
  - Changed the Triton lifecycle: the Triton Inference Server is now started when entering the context. This allows models to be loaded dynamically into the running server when the bind method is called. It is still possible to create a Triton instance without entering the context and bind models before starting the server (in that case the models are lazy-loaded when the run or serve method is called, as before).
  - In the RemoteTriton class, calling the enter or connect method connects to the Triton server, so models can be safely loaded while binding inference functions (if RemoteTriton is used without a context manager, models are lazy-loaded when the connect or serve method is called).
- Change: The "batch" decorator raises a ValueError if any of the outputs have a different batch size than expected.
- Fix: gevent resource leak in FuturesModelClient.
- Version of Triton Inference Server embedded in wheel: 2.36.0
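A hedged sketch of Remote Mode: attaching to an already running Triton Inference Server and binding a @batch inference callable to it. The URL, model name, tensor names, and shapes are illustrative assumptions.

```python
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import RemoteTriton


@batch
def add_one(data):
    # Returning an output whose batch size differs from the input now raises ValueError.
    return {"result": data + 1}


# Entering the context connects to the running server, so the model is loaded
# dynamically as soon as bind() is called.
with RemoteTriton(url="localhost") as triton:
    triton.bind(
        model_name="AddOne",
        infer_func=add_one,
        inputs=[Tensor(name="data", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="result", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=32),
    )
    triton.serve()
```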
PyTriton 0.3.1
- Fix: Addressed potential instability in shared memory management.
- Change: KeyboardInterrupt is now handled in triton.serve(). PyTriton hosting scripts return an exit code of 0 instead of 130 when they receive a SIGINT signal.
- Version of Triton Inference Server embedded in wheel: 2.36.0
PyTriton 0.3.0
- New: Support for multiple Python versions starting from 3.8
- New: Added support for decoupled models, enabling results streaming from models (alpha state)
- Change: Upgraded Triton Inference Server binaries to version 2.36.0. Note that this version of Triton Inference Server requires glibc 2.35 or newer.
- Version of Triton Inference Server embedded in wheel: 2.36.0
PyTriton 0.2.5
- New: Allow executing multiple PyTriton instances in the same process and/or host
- Fix: Invalid flags for the Proxy Backend configuration passed to Triton
- Version of Triton Inference Server embedded in wheel: 2.33.0