Merge pull request #69 from rhymes-ai/doc

docs: add a note on potential errors when enabling tensor parallelism for vLLM
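
The note added here sets the variable in the shell. For scripts that cannot rely on the shell environment, the same setting can probably be applied from Python. The following is a minimal sketch, not code from this PR: `use_spawn_workers` is a hypothetical helper name, and it assumes vLLM reads `VLLM_WORKER_MULTIPROC_METHOD` when it creates its worker processes, so the variable must be set before the engine is constructed.

```python
import os


def use_spawn_workers() -> str:
    """Request the 'spawn' start method for vLLM worker processes.

    Hypothetical helper: it only sets the environment variable named in
    the docs note and returns the value that is now in effect.
    """
    os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
    return os.environ["VLLM_WORKER_MULTIPROC_METHOD"]


if __name__ == "__main__":
    # With 'spawn', child processes re-import the main module, so the
    # entry point should stay behind this guard.
    method = use_spawn_workers()
    print(f"VLLM_WORKER_MULTIPROC_METHOD={method}")
    # Assumption: import vllm and build the engine only after this point,
    # so that the setting is visible when workers are started.
```

Setting the variable in code keeps the workaround next to the script that needs it. The shell `export` from the note remains the simpler option when the launch environment is under your control.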
rhymes-ai · Nov 13, 2024 · ac64fac
2 parents 318ecd3 + 89941ed
Showing 1 changed file with 6 additions and 0 deletions.
diff --git a/docs/inference.md b/docs/inference.md
@@ -83,6 +83,12 @@ pip install -e .[vllm]
 ```
 
 ### How to Use:
+
+> **NOTE:** If enabling tensor parallelism fails with `RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method`, try setting the following environment variable before launching, so that vLLM starts its worker processes with `spawn`:
+> ```bash
+> export VLLM_WORKER_MULTIPROC_METHOD="spawn"
+> ```
+
 ```python
 from PIL import Image
 from transformers import AutoTokenizer