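The commands required to run several baselines are listed below. Some baselines are labeled (language-only) because the model receives only the EQA question, with no visual input from the episode. The (vision + language) baselines also receive episode frames, and the number of frames is set with the --num-frames flag.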
GPT-4 (language-only)
```bash
# requires setting the OPENAI_API_KEY environment variable
python openeqa/baselines/gpt4.py --dry-run  # remove --dry-run to process the full benchmark
```
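Every API-based baseline depends on an environment variable and on a --dry-run flag that must be removed for a full run. The following is a minimal sketch of two hypothetical helpers, not part of OpenEQA: require_env stops early when a key is missing, and dry_run_flag adds --dry-run unless a made-up FULL_RUN variable is set to 1.

```bash
# Hypothetical helpers (not part of OpenEQA); source them or paste them into a shell.

# require_env NAME: return 1, with a message on stderr, if NAME is unset, empty,
# or not exported. printenv only sees exported variables, and those are the
# only ones the Python baseline script can read.
require_env() {
  if [ -z "$(printenv "$1")" ]; then
    echo "error: environment variable $1 is not set (or not exported)" >&2
    return 1
  fi
}

# dry_run_flag: print --dry-run unless FULL_RUN=1 (FULL_RUN is a hypothetical name).
dry_run_flag() {
  if [ "${FULL_RUN:-0}" != "1" ]; then
    echo "--dry-run"
  fi
}
```

Usage: `require_env OPENAI_API_KEY && python openeqa/baselines/gpt4.py $(dry_run_flag)`. The command substitution is deliberately left unquoted, so when FULL_RUN=1 it expands to nothing and the script receives no flag.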
LLaMA (language-only)
First, download LLaMA weights in the Hugging Face format from here. Then, run:
```bash
python openeqa/baselines/llama.py -m <path/to/hf/weights>
```
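A common mistake is pointing -m at the wrong directory or at weights that have not been converted to Hugging Face format. The following is a minimal sketch of a hypothetical pre-flight check, not part of OpenEQA. It relies on the assumption that a Hugging Face-format checkpoint directory contains a config.json file, which transformers needs in order to load a model from a local directory.

```bash
# Hypothetical pre-flight check (not part of OpenEQA) for the -m argument.
# check_hf_weights DIR: return 1 if DIR is missing or lacks config.json.
check_hf_weights() {
  if [ ! -d "$1" ]; then
    echo "error: $1 is not a directory" >&2
    return 1
  fi
  if [ ! -f "$1/config.json" ]; then
    echo "error: $1 has no config.json; are these Hugging Face-format weights?" >&2
    return 1
  fi
}
```

Usage: `check_hf_weights /path/to/hf/weights && python openeqa/baselines/llama.py -m /path/to/hf/weights`. This check catches only the most obvious path errors. Tokenizer or shard files may still be missing.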
GPT-4V (vision + language)
```bash
# requires setting the OPENAI_API_KEY environment variable
python openeqa/baselines/gpt4v.py --num-frames 50 --dry-run  # remove --dry-run to process the full benchmark
```
Gemini Pro (language-only)
```bash
# requires setting the GOOGLE_API_KEY environment variable
python openeqa/baselines/gemini-pro.py --dry-run  # remove --dry-run to process the full benchmark
```
Gemini Pro Vision (vision + language)
```bash
# requires setting the GOOGLE_API_KEY environment variable
python openeqa/baselines/gemini-pro-vision.py --num-frames 15 --dry-run  # remove --dry-run to process the full benchmark
```
Claude 3 (vision + language)
```bash
# requires setting the ANTHROPIC_API_KEY environment variable
python openeqa/baselines/claude-vision.py --num-frames 20 --dry-run  # remove --dry-run to process the full benchmark
```
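The API-based baselines differ only in script name, required key, and --num-frames value. The following is a minimal sketch of a hypothetical lookup, not part of OpenEQA, that collects the values from the example commands above. It prints each dry-run command instead of running it. The frame counts are simply the ones used in these examples, not recommended settings.

```bash
# Hypothetical lookup (not part of OpenEQA) built from the example commands above.

# baseline_env NAME: print the API key variable the baseline needs.
baseline_env() {
  case "$1" in
    gpt4|gpt4v) echo OPENAI_API_KEY ;;
    gemini-pro|gemini-pro-vision) echo GOOGLE_API_KEY ;;
    claude-vision) echo ANTHROPIC_API_KEY ;;
    *) return 1 ;;
  esac
}

# baseline_frames NAME: print the --num-frames value (empty for language-only).
baseline_frames() {
  case "$1" in
    gpt4v) echo 50 ;;
    gemini-pro-vision) echo 15 ;;
    claude-vision) echo 20 ;;
    gpt4|gemini-pro) echo "" ;;
    *) return 1 ;;
  esac
}

# baseline_cmd NAME: print (not run) the dry-run command; fail on unknown names.
baseline_cmd() {
  baseline_env "$1" >/dev/null || return 1
  if [ -n "$(baseline_frames "$1")" ]; then
    echo "python openeqa/baselines/$1.py --num-frames $(baseline_frames "$1") --dry-run"
  else
    echo "python openeqa/baselines/$1.py --dry-run"
  fi
}
```

Usage: `for b in gpt4 gpt4v gemini-pro gemini-pro-vision claude-vision; do baseline_cmd "$b"; done` lists every command so you can check it before running. The LLaMA baseline is left out because it uses local weights, not an API key.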