runscript.help
This container provides a convenient way to run LLaVA. In addition to the LLaVA
module, it includes the following commands:
- `llava-run`, a command-line wrapper for LLaVA inference
- `hyak-llava-web`, a wrapper that launches the Gradio web interface and prints
  an SSH connection string you can copy to open a tunnel from your own computer
  (see the example below)
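
To launch the web interface, you can invoke `hyak-llava-web` the same way
`llava-run` is invoked below. This is a sketch; the exact SSH connection string
to use is printed by the tool itself when the interface starts:

apptainer run --nv --writable-tmpfs \
oras://ghcr.io/uw-psych/llava-container/llava-container:latest \
hyak-llava-web

Run the printed SSH command on your own computer to open the tunnel, then visit
the forwarded local address in a browser (Gradio serves on port 7860 by
default).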
To run LLaVA with the `llava-run` script, use the following command:
apptainer run --nv --writable-tmpfs \
oras://ghcr.io/uw-psych/llava-container/llava-container:latest \
llava-run [llava-run arguments]
You must pass the `--nv` flag to enable GPU support. Depending on your intended
use, you may also want to pass the `--bind` flag to mount a directory from the
host system into the container.
To specify a directory to use for the HuggingFace model cache and enable access
to /gscratch, use the following command:
apptainer run --nv --writable-tmpfs \
--env HUGGINGFACE_HUB_CACHE=/path/to/cache \
--bind /gscratch \
oras://ghcr.io/uw-psych/llava-container/llava-container:latest \
llava-run [llava-run arguments]
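
As an illustration, a complete invocation might look like the following (the
image path and query here are placeholders, not part of the container):

apptainer run --nv --writable-tmpfs \
oras://ghcr.io/uw-psych/llava-container/llava-container:latest \
llava-run --image-file /path/to/image.jpg \
--query "What is shown in this image?"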
The following describes the usage of this script:
llava-run [-h] [--model-path PATH] [--model-base PATH] --image-file
          IMAGE [IMAGE ...] (--query QUERY [QUERY ...] | --chat)
          [--json]
          [--conv-mode {v0,v1,vicuna_v1,llama_2,plain,v0_plain,llava_v0,v0_mmtag,llava_v1,v1_mmtag,llava_llama_2,mpt}]
          [--stack-sep SEP] [--temperature FLOAT] [--top_p FLOAT]
          [--num_beams N] [--max_new_tokens N]
          [--load-8bit | --load-4bit] [--device {cuda,cpu}]
          [--hf-cache-dir DIR]
options:
  -h, --help            show this help message and exit
  --model-path PATH     Model path
  --model-base PATH     Model base (required for 'lora' models)
  --image-file IMAGE [IMAGE ...]
                        Path or URL to image (provide multiple to process in
                        batch; use the --stack-sep delimiter within a path to
                        stack image inputs)
  --query QUERY [QUERY ...]
                        Query (can be specified multiple times, e.g.
                        --query a --query b)
  --chat                Use interactive chat instead of a one-shot query
  --json                Produce JSON output
  --conv-mode {v0,v1,vicuna_v1,llama_2,plain,v0_plain,llava_v0,v0_mmtag,llava_v1,v1_mmtag,llava_llama_2,mpt}
                        Conversation mode
  --stack-sep SEP       Internal separator for stacked image files (default:
                        ",")
  --temperature FLOAT   Temperature (default: 0.2)
  --top_p FLOAT         Top p (default: 1.0)
  --num_beams N         Number of beams (default: 1)
  --max_new_tokens N    Max new tokens (default: 512)
  --load-8bit           Load the model in 8-bit precision
  --load-4bit           Load the model in 4-bit precision
  --device {cuda,cpu}   Device to use
  --hf-cache-dir DIR    HuggingFace cache directory
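
For example, a hypothetical invocation (the file names and query are
placeholders) that stacks two images into a single input and requests JSON
output:

llava-run --image-file left.jpg,right.jpg \
--query "Compare the two stacked images" --json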
For details on the arguments, see the LLaVA documentation and the usage
information for llava.eval.run_llava and llava.serve.cli.