update github readme too

rubra-ai · Jul 5, 2024 · 5790f61
1 parent 44c0130
commit 5790f61
Showing 2 changed files with 6 additions and 4 deletions.
diff --git a/README.md b/README.md
@@ -29,10 +29,13 @@ Try out the models immediately without downloading anything in Our [Huggingface
 
 ## Run Rubra Models Locally
 
+Check out our [documentation](https://docs.rubra.ai/category/serving--inferencing) to learn how to run Rubra models locally.
 We extend the following inferencing tools to run Rubra models in an OpenAI-compatible tool-calling format for local use:
 
-- [llama.cpp](https://github.com/ggerganov/llama.cpp)
-- [vllm](https://github.com/vllm-project/vllm)
+- [llama.cpp](https://github.com/rubra-ai/tools.cpp)
+- [vLLM](https://github.com/rubra-ai/vllm)
+
+**Note**: It is a known issue that Llama3 models (including 8B and 70B) are more prone to damage from quantization. We recommend serving them with either vLLM or using the fp16 quantization.
 
 ## Benchmark
 

diff --git a/docs/docs/README.md b/docs/docs/README.md
@@ -36,13 +36,12 @@ Try out the models immediately without downloading anything in [Huggingface Spac
 
 ## Run Rubra Models Locally
 
-Check out our [documentation](https://docs.rubra.ai/category/serving--inferencing) to learn how to run Rubra models locally.
 We extend the following inferencing tools to run Rubra models in an OpenAI-compatible tool-calling format for local use:
 
 - [llama.cpp](https://github.com/rubra-ai/tools.cpp)
 - [vLLM](https://github.com/rubra-ai/vllm)
 
-Note: It is a known issue that Llama3 models (including 8B and 70B) are more prone to damage from quantization. We recommend serving them with either vLLM or using the fp16 quantization.
+**Note**: It is a known issue that Llama3 models (including 8B and 70B) are more prone to damage from quantization. We recommend serving them with either vLLM or using the fp16 quantization.
 
 ## Contributing