Changelog, known issues updated
piotrm-nvidia committed Jun 14, 2024
1 parent 440d631 commit 8f4b033
Showing 3 changed files with 11 additions and 1 deletion.
7 changes: 6 additions & 1 deletion CHANGELOG.md
@@ -16,14 +16,19 @@ limitations under the License.

# Changelog

## unreleased
## 0.5.6 (2024-06-17)
- New: Add PyTriton Check Tool to perform preliminary checks on the environment where PyTriton is deployed.
- Change: Limited the `tritonclient` package extras to `http` and `grpc` only
- Fix: Pin grpc-tools version to handle a gRPC issue in `tritonclient`
- Build scripts update
  - upgrade cmake version during build
  - automatically configure wheel name based on `glibc` version

[//]: <> (on an external component update, put a short summary of the change or a link to its changelog here)

- Version of [Triton Inference Server](https://github.com/triton-inference-server/) embedded in wheel: [2.44.0](https://github.com/triton-inference-server/server/releases/tag/v2.44.0)


## 0.5.5 (2024-04-15)

- Fix: Performance improvements
1 change: 1 addition & 0 deletions docs/known_issues.md
@@ -22,3 +22,4 @@ limitations under the License.
- Enabling verbose logging may cause a significant performance drop in model inference.
- gRPC ModelClient doesn't support timeouts for model configuration and model metadata requests due to a limitation in the underlying tritonclient library.
- HTTP ModelClient may not respect the specified timeouts for model initialization and inference requests, especially when they are shorter than 1 second, which results in longer waiting times. This behavior comes from the underlying implementation of the HTTP protocol (a sketch of configuring these timeouts follows this list).
- HuggingFace BERT JAX Model works only with containers 24.04 and newer because it requires a newer version of CUDA.
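A minimal sketch of where the timeouts mentioned above are configured, assuming the `init_timeout_s` and `inference_timeout_s` parameters of `pytriton.client.ModelClient`; the model name `BERT` and the input name `text` are illustrative, not taken from this commit:

```python
import numpy as np

from pytriton.client import ModelClient  # PyTriton client for a running Triton endpoint

# A plain host selects HTTP; "grpc://localhost" would select gRPC instead.
# Note the known issues above: sub-second timeouts may not be honored over HTTP,
# and the gRPC client ignores timeouts for config/metadata requests.
with ModelClient(
    "localhost",
    "BERT",                   # hypothetical model name
    init_timeout_s=60.0,      # wait for the model to become ready
    inference_timeout_s=5.0,  # per-request inference timeout
) as client:
    # One sample per batch entry; text inputs are passed as UTF-8 encoded bytes.
    text = np.array([["Hello, PyTriton!".encode("utf-8")]], dtype=object)
    result = client.infer_batch(text=text)
    print(list(result.keys()))
```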
4 changes: 4 additions & 0 deletions examples/huggingface_bert_jax/README.md
@@ -16,6 +16,10 @@ limitations under the License.

# HuggingFace BERT JAX Model

## Known issue

HuggingFace BERT JAX Model works only with containers 24.04 and newer because it requires a newer version of CUDA. Please make sure that you are using the correct container version.

## Overview

This example presents inference with a HuggingFace BERT JAX model.
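Below is a minimal, hedged sketch of how such a model is typically bound to PyTriton; it is not the example's exact `server.py`, and names such as `BERT`, `text`, and `last_hidden_state` are illustrative:

```python
import numpy as np
from transformers import BertTokenizer, FlaxBertModel  # Flax/JAX BERT from HuggingFace

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton

# Assumed checkpoint; the repository example may use a different one.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = FlaxBertModel.from_pretrained("bert-base-uncased")


@batch
def _infer_fn(text: np.ndarray):
    # "text" arrives as a (batch, 1) array of UTF-8 encoded bytes.
    sequences = [item[0].decode("utf-8") for item in text]
    tokens = tokenizer(sequences, return_tensors="jax", padding=True)
    outputs = model(**tokens)
    # Convert the JAX array back to numpy for Triton.
    return {"last_hidden_state": np.asarray(outputs.last_hidden_state, dtype=np.float32)}


with Triton() as triton:
    triton.bind(
        model_name="BERT",
        infer_func=_infer_fn,
        inputs=[Tensor(name="text", dtype=bytes, shape=(1,))],
        outputs=[Tensor(name="last_hidden_state", dtype=np.float32, shape=(-1, -1))],
        config=ModelConfig(max_batch_size=4),
    )
    triton.serve()
```

With the server running inside a 24.04 or newer container, a client such as the `ModelClient` sketch shown earlier can query it over HTTP or gRPC.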
