diff --git a/docs/source/models/supported_models.rst b/docs/source/models/supported_models.rst
index 7a6932d65e653..3f012284bfbff 100644
--- a/docs/source/models/supported_models.rst
+++ b/docs/source/models/supported_models.rst
@@ -365,7 +365,7 @@ Text Embedding
 
 .. note::
    Unlike base Qwen2, :code:`Alibaba-NLP/gte-Qwen2-7B-instruct` uses bi-directional attention.
-   You can set `--hf-overrides '{"is_causal": false}'` to change the attention mask accordingly.
+   You can set :code:`--hf-overrides '{"is_causal": false}'` to change the attention mask accordingly.
 
    On the other hand, its 1.5B variant (:code:`Alibaba-NLP/gte-Qwen2-1.5B-instruct`) uses causal attention
    despite being described otherwise on its model card.
diff --git a/docs/source/serving/compatibility_matrix.rst b/docs/source/serving/compatibility_matrix.rst
index a4300761d2635..fa03d2cde1486 100644
--- a/docs/source/serving/compatibility_matrix.rst
+++ b/docs/source/serving/compatibility_matrix.rst
@@ -393,7 +393,7 @@ Feature x Hardware
      - ✅
      - ✅
      - ✅
-     - ✗
+     - ?
    * - :abbr:`enc-dec (Encoder-Decoder Models)`
      - ✅
      - ✅
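
For context, a minimal sketch of how the override documented in the first hunk is applied from Python, assuming vLLM's :code:`LLM` entry point with its :code:`hf_overrides` keyword (the in-process counterpart of the :code:`--hf-overrides` CLI flag). The exact task name and embedding API have shifted between vLLM releases, so treat this as illustrative rather than definitive::

    # Sketch only: the model name comes from the docs above, but the task
    # label and encode() output shape may differ across vLLM versions.
    from vllm import LLM

    llm = LLM(
        model="Alibaba-NLP/gte-Qwen2-7B-instruct",
        task="embedding",                   # run in embedding (pooling) mode
        hf_overrides={"is_causal": False},  # same effect as the CLI flag above
    )

    # encode() returns one embedding output per prompt.
    outputs = llm.encode(["What is the capital of France?"])
    print(len(outputs[0].outputs.embedding))  # embedding dimensionality

Passing :code:`{"is_causal": False}` rewrites the model's Hugging Face config before loading, which is why the 7B GTE variant then builds a bi-directional attention mask while the 1.5B variant, per its actual behavior noted above, should be left causal.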