Releases · triton-inference-server/pytriton
PyTriton 0.2.4
- new: Introduced a `strict` flag in `Triton.bind`, which enables validation of the data types and shapes of inference callable outputs against the model config
- new: Added `AsyncioModelClient`, which works in FastAPI and other async frameworks
- fix: `FuturesModelClient` no longer raises `gevent.exceptions.InvalidThreadUseError`
- fix: Do not throw `TimeoutError` if the connection to the server fails during model verification
- Version of Triton Inference Server embedded in wheel: 2.33.0
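The kind of check that `strict=True` turns on can be illustrated with a standalone sketch. The spec format and the `validate_outputs` helper below are hypothetical stand-ins for illustration, not pytriton internals:

```python
import numpy as np

# Hypothetical output spec: name -> (dtype, shape), where -1 means
# "any size along this axis" (a stand-in for the model config).
OUTPUT_SPEC = {"OUTPUT_1": (np.float32, (-1, 3))}

def validate_outputs(outputs, spec=OUTPUT_SPEC):
    """Check callable outputs against the spec, as strict mode would."""
    for name, (dtype, shape) in spec.items():
        arr = outputs[name]
        if arr.dtype != dtype:
            raise ValueError(f"{name}: expected dtype {dtype}, got {arr.dtype}")
        if len(arr.shape) != len(shape) or any(
            expected not in (-1, actual)
            for expected, actual in zip(shape, arr.shape)
        ):
            raise ValueError(f"{name}: expected shape {shape}, got {arr.shape}")
    return outputs

ok = validate_outputs({"OUTPUT_1": np.zeros((2, 3), dtype=np.float32)})
```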
PyTriton 0.2.3
- Improved verification of the Proxy Backend environment when running under the same Python interpreter
- Fixed `pytriton.version` to report the currently installed version
- Version of Triton Inference Server embedded in wheel: 2.33.0
PyTriton 0.2.2
- Added an `inference_timeout_s` parameter to the client classes
- Renamed `PyTritonClientUrlParseError` to `PyTritonClientInvalidUrlError`
- `ModelClient` and `FuturesModelClient` methods raise `PyTritonClientClosedError` when used after the client is closed
- Pinned the tritonclient dependency due to issues with tritonclient >= 2.34 on systems with a glibc version lower than 2.34
- Added a warning after Triton Server setup and teardown when an overly verbose logging level is in use, as it may cause a significant performance drop in model inference
- Version of Triton Inference Server embedded in wheel: 2.33.0
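The closed-client behavior can be sketched with a minimal stub; the class body below is illustrative only, not pytriton's actual implementation:

```python
class PyTritonClientClosedError(Exception):
    """Raised when a client method is called after close()."""

class ModelClient:
    """Minimal stub showing the closed-client guard; the real pytriton
    clients also manage connections, timeouts, and model metadata."""

    def __init__(self, url, model_name, inference_timeout_s=60.0):
        self._url = url
        self._model_name = model_name
        self._inference_timeout_s = inference_timeout_s
        self._closed = False

    def close(self):
        self._closed = True

    def infer_sample(self, **inputs):
        if self._closed:
            raise PyTritonClientClosedError(
                f"client for {self._model_name} is closed"
            )
        # ... an actual inference call would go here ...
        return inputs  # echo stand-in for a server response

client = ModelClient("localhost:8000", "my_model")
client.close()
```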
PyTriton 0.2.1
- Fixed handling of the `TritonConfig.cache_directory` option; the directory was always overwritten with the default value
- Fixed the tritonclient dependency; PyTriton needs a tritonclient version that supports HTTP headers and parameters
- Improved shared memory usage to fit the 64 MB limit (the default for Docker and Kubernetes) by reducing the initial size for the PyTriton Proxy Backend
- Version of Triton Inference Server embedded in wheel: 2.33.0
PyTriton 0.2.0
- Added support for using custom HTTP/gRPC request headers and parameters; see docs/custom_params.md for further information.
  This change breaks backward compatibility of the inference function signature. The undecorated inference function now accepts a list of `Request` instances instead of a list of dictionaries. The `Request` class contains input data and parameters for combined parameters and headers. For details, see docs/inference_callable.md.
- Added `FuturesModelClient`, which enables sending inference requests in parallel.
- Added displaying a documentation link after models are loaded.
- Version of Triton Inference Server embedded in wheel: 2.33.0
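The new-style callable can be sketched with a stub `Request` class; the stub's shape (a dict of input arrays plus a `parameters` attribute) is an assumption for illustration, not the library's exact class:

```python
import numpy as np

class Request(dict):
    """Stub for pytriton's Request: maps input names to numpy arrays
    and carries combined parameters/headers (illustrative only)."""

    def __init__(self, data, parameters=None):
        super().__init__(data)
        self.parameters = parameters or {}

def infer_fn(requests):
    # New signature: a list of Request instances in, a list of
    # per-request output dicts out.
    responses = []
    for request in requests:
        scale = float(request.parameters.get("scale", "1.0"))
        responses.append({"OUTPUT_1": request["INPUT_1"] * scale})
    return responses

requests = [Request({"INPUT_1": np.array([1.0, 2.0])}, parameters={"scale": "2.0"})]
responses = infer_fn(requests)
```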
PyTriton 0.1.5
- Improved the `pytriton.decorators.group_by_values` function:
  - Modified the function to avoid calling the inference callable on each individual sample when grouping by string/bytes input
  - Added a `pad_fn` argument for easy padding and combining of the inference results
- Fixed Triton binaries search
- Improved Workspace management (remove workspace on shutdown)
- Version of external components used during testing:
  - Triton Inference Server: 2.29.0
  - Other component versions depend on the framework and Triton Inference Server container versions used; refer to the Triton Inference Server support matrix for a detailed summary.
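The grouping idea behind the decorator can be illustrated with a standalone helper; `group_by` below is a hypothetical simplification, not the decorator's code:

```python
import numpy as np

def group_by(batch, key_name):
    """Yield sub-batches that share the same value of one input, so a
    callable can run once per group instead of once per sample."""
    order = {}
    for idx, value in enumerate(batch[key_name]):
        order.setdefault(value, []).append(idx)
    for indices in order.values():
        yield {name: values[indices] for name, values in batch.items()}

batch = {
    "lang": np.array([b"en", b"de", b"en"]),
    "x": np.array([1, 2, 3]),
}
groups = list(group_by(batch, "lang"))  # two groups: b"en" and b"de"
```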
PyTriton v0.1.4
0.1.4 (2023-03-16)
- Add validation of the model name passed to the Triton `bind` method.
- Add monkey patching of the `InferenceServerClient.__del__` method to prevent unhandled exceptions.
- Version of external components used during testing:
  - Triton Inference Server: 2.29.0
  - Other component versions depend on the framework and Triton Inference Server container versions used; refer to the Triton Inference Server support matrix for a detailed summary.
PyTriton v0.1.3
0.1.3 (2023-02-20)
- Fixed getting the model config in the `fill_optionals` decorator.
- Version of external components used during testing:
  - Triton Inference Server: 2.29.0
  - Other component versions depend on the framework and Triton Inference Server container versions used; refer to the Triton Inference Server support matrix for a detailed summary.
PyTriton v0.1.2
- Fixed wheel build to support installations on operating systems with glibc version 2.31 or higher.
- Updated the documentation on custom builds of the package.
- Change: The `TritonContext` instance is shared across bound models and contains a `model_configs` dictionary.
- Fixed support for binding multiple models that use methods of the same class.
- Version of external components used during testing:
  - Triton Inference Server: 2.29.0
  - Other component versions depend on the framework and Triton Inference Server container versions used; refer to the Triton Inference Server support matrix for a detailed summary.
PyTriton v0.1.1
- Change: The `@first_value` decorator has been updated with new features:
  - Renamed from `@first_values` to `@first_value`
  - Added a `strict` flag to toggle checking that values are equal on a single selected input of the request; default is True
  - Added a `squeeze_single_values` flag to toggle squeezing of single-value ND arrays to scalars; default is True
- Fix: `@fill_optionals` now supports non-batching models
- Fix: `@first_value` fixed to work with optional inputs
- Fix: `@group_by_values` fixed to work with string inputs
- Fix: `@group_by_values` fixed to work sample-wise
- Version of external components used during testing:
  - Triton Inference Server: 2.29.0
  - Other component versions depend on the framework and Triton Inference Server container versions used; refer to the Triton Inference Server support matrix for a detailed summary.
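The `@first_value` semantics described above can be sketched in a standalone re-implementation; the code below is an illustrative approximation, not the pytriton source:

```python
import numpy as np

def first_value(*input_names, strict=True, squeeze_single_values=True):
    """Replace each named batched input with its first value before
    calling the wrapped function (illustrative approximation)."""
    def decorator(fn):
        def wrapper(**inputs):
            for name in input_names:
                values = inputs[name]
                if strict and not all(np.array_equal(v, values[0]) for v in values):
                    raise ValueError(f"values of {name} differ within the batch")
                first = values[0]
                if squeeze_single_values and first.size == 1:
                    first = first.flatten()[0]  # squeeze 1-element array to scalar
                inputs[name] = first
            return fn(**inputs)
        return wrapper
    return decorator

@first_value("temperature")
def model(data, temperature):
    # temperature arrives as a scalar, not a batched array
    return {"out": data * temperature}

result = model(
    data=np.array([[1.0], [2.0]]),
    temperature=np.array([[0.5], [0.5]]),
)
```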