Commit: Tweaks

pierre.delaunay authored and Delaunay committed Oct 4, 2023
1 parent 4d3c06a commit 7d16b5b
Showing 3 changed files with 30 additions and 20 deletions.
docs/examples/llm/client.py (8 changes: 5 additions & 3 deletions)
@@ -1,3 +1,5 @@
+import subprocess
+
 import openai
 
 
@@ -33,8 +35,8 @@ def get_job_comment(name="inference_server.sh"):
 
 # profit
 completion = openai.Completion.create(
-    model=server['model'],
-    prompt=args.prompt
+    model=server['model'],
+    prompt="What is the square root of 25 ?"
 )
 
-print(completion)
+print(completion)
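For context, the client relies on a get_job_comment helper and a server dict that this hunk only references. Below is a minimal sketch of how the pieces plausibly fit together, assuming squeue's %k format string prints the job comment and that the comment follows the model=...|host=...|port=...|shared=y layout written by the scontrol call in inference_server.sh further down; the flags and wiring here are assumptions, not the file's verbatim contents.

import subprocess

import openai


def get_job_comment(name="inference_server.sh"):
    # Ask Slurm for the comment field of the named job; the server script
    # stores its connection info there (see the scontrol call below).
    command = ["squeue", "-h", f"--name={name}", "--format=%k"]
    comment = subprocess.check_output(command, text=True).strip()
    # "model=...|host=...|port=...|shared=y" -> {"model": ..., "host": ...}
    return dict(field.split("=", 1) for field in comment.split("|"))


server = get_job_comment()

# vLLM does not validate the API key; the base URL points at the Slurm job.
openai.api_key = "EMPTY"
openai.api_base = f"http://{server['host']}:{server['port']}/v1"

completion = openai.Completion.create(
    model=server['model'],
    prompt="What is the square root of 25 ?"
)

print(completion)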
docs/examples/llm/inference_server.sh (22 changes: 13 additions & 9 deletions)
@@ -16,21 +16,22 @@
 #SBATCH --ntasks-per-node=1
 #SBATCH --mem=32G
 
-usage() {
-    echo "Usage: $0 [-m] [-p]
+function usage() {
+    echo "Usage: $0 [-m] [-p]"
     echo "  -h          Display this help message."
     echo "  -m MODEL    Specify the model name to serve."
     echo "  -p PATH     Specify the path to the model weights."
     echo "  -e ENV      Specify the conda environment to use."
     echo "  ARGUMENT    Any additional argument you want to process."
     exit 1
 }
 
 MODEL=""
-PATH=""
+MODEL_PATH=""
 ENV="./env"
 
 
-while getopts ":hf:d:" opt; do
+while getopts ":hm:p:e:" opt; do
     case $opt in
         h)
             usage
@@ -39,7 +40,7 @@ while getopts ":hf:d:" opt; do
             MODEL="$OPTARG"
             ;;
         p)
-            PATH="$OPTARG"
+            MODEL_PATH="$OPTARG"
             ;;
         e)
             ENV="$OPTARG"
@@ -55,22 +56,25 @@ while getopts ":hf:d:" opt; do
     esac
 done
 
+echo "model: $MODEL"
+echo " path: $MODEL_PATH"
+echo "  env: $ENV"
 
 export MILA_WEIGHTS="/network/weights/"
 cd $SLURM_TMPDIR
 
 #
 # Fix problem with conda saying it is not "init properly"
 #
 CONDA_EXEC="$(which conda)"
 CONDA_BASE=$(dirname $CONDA_EXEC)
+CONDA_ENVS="$CONDA_BASE/../envs"
 source $CONDA_BASE/../etc/profile.d/conda.sh
 
 #
 # Create a new environment
 #
-if [ ! -d "$ENV" ]; then
+if [ ! -d "$ENV" ] && [ "$ENV" != "base" ] && [ ! -d "$CONDA_ENVS/$ENV" ]; then
     conda create --prefix $ENV python=3.9 -y
 fi
 conda activate $ENV
@@ -85,12 +89,12 @@ NAME="$WEIGHTS/$MODEL"
 #
 scontrol update job $SLURM_JOB_ID comment="model=$MODEL|host=$HOST|port=$PORT|shared=y"
 
-#
 #
 # Launch Server
 #
 python -m vllm.entrypoints.openai.api_server \
     --host $HOST \
     --port $PORT \
-    --model "$MODEL" \
+    --model "$MODEL_PATH" \
     --tensor-parallel-size $SLURM_NTASKS_PER_NODE \
     --served-model-name "$MODEL"
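Note the split this hunk introduces: --model now receives the on-disk weight path, while --served-model-name keeps the short model name that clients send in their requests. As a hedged smoke test (the host and port below are placeholders; in practice read them from the job comment), you can poll the OpenAI-compatible /v1/models endpoint until the weights finish loading:

import time

import requests

# Placeholder endpoint; take the real host and port from the job comment.
BASE_URL = "http://cn-a001:8000/v1"

# vLLM can take several minutes to load weights, so retry until it answers.
while True:
    try:
        payload = requests.get(f"{BASE_URL}/models", timeout=5).json()
        # The served model names are what clients must pass as `model=`.
        print([model["id"] for model in payload.get("data", [])])
        break
    except requests.exceptions.RequestException:
        time.sleep(10)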
docs/examples/llm/vllm.rst (20 changes: 12 additions & 8 deletions)
@@ -9,7 +9,7 @@ Server
 It is very easy to set up and supports a wide range of models through HuggingFace.
 
 
-.. code-block:: 
+.. code-block::
 
    # sbatch inference_server.sh -m MODEL_NAME -p WEIGHT_PATH -e CONDA_ENV_NAME_TO_USE
    sbatch inference_server.sh -m Llama-2-7b-chat-hf -p /network/weights/llama.var/llama2/Llama-2-7b-chat-hf -e base
@@ -19,7 +19,7 @@ By default the script will launch the server on an rtx8000 for 15 minutes.
 You can override the defaults by specifying arguments to sbatch.
 
 
-.. code-block:: 
+.. code-block::
 
    sbatch --time=00:30:00 inference_server.sh -m Llama-2-7b-chat-hf -p /network/weights/llama.var/llama2/Llama-2-7b-chat-hf -e base
@@ -36,24 +36,28 @@ You can override the defaults by specifying arguments to sbatch.
 Client
 ------
 
-Becasue vLLM replicates OpenAI's API, the client side is quite straight forward.
-Own OpenAI's client can be reused.
+Because vLLM replicates OpenAI's API, the client side is quite straightforward,
+and OpenAI's own client can be reused.
 
 .. warning::
 
    The server takes a while to set up; you might have to wait a few minutes
    before the server is ready for inference.
 
-   You can check the job log of the server.
-   Look for
+   You can check the job log of the server using ``tail -f slurm-<JOB-ID>.out``
+   to see the log as it is written.
+
+   Look for ``Uvicorn running on http://... (Press CTRL+C to quit)``
+   to know when the server is ready to receive requests.
 
 
 .. note::
 
-   We use squeue to look for the inference server job to configure the
+   We use ``squeue`` to look for the inference server job to configure the
    URL endpoint automatically.
 
+   Make sure your job name is unique!
 
 
 .. literalinclude:: client.py
    :language: python

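As a final hedged sketch complementing client.py: vLLM's OpenAI-compatible server also supports streaming with the same legacy client, which suits a chat-tuned model like Llama-2-7b-chat-hf. The endpoint below is a placeholder; discover the real host and port from the job comment as client.py does.

import openai

# Placeholder endpoint; discover the real host and port from the job comment.
openai.api_key = "EMPTY"
openai.api_base = "http://cn-a001:8000/v1"

# stream=True yields partial completions as the tokens are generated.
for chunk in openai.Completion.create(
    model="Llama-2-7b-chat-hf",
    prompt="What is the square root of 25 ?",
    max_tokens=64,
    stream=True,
):
    print(chunk.choices[0].text, end="", flush=True)
print()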