
# Model Settings Guide

To evaluate a given MLLM, StreamingBench requires a wrapper class subclassing `model.modelclass.Model` that defines how the benchmark interfaces with your model. This guide walks through writing this subclass and adding it to the library!

## Setup

To get started, go ahead and clone the main repo.

```bash
git clone https://github.com/Fzacker/StreamingBench.git
cd StreamingBench/src
```

Now, you need to create a new file where you'll be adding your model:

```bash
touch model/<your_model_filename>.py
```

As a rule of thumb, we recommend using `model/Qwen2VL.py` and `model/LLaVAOneVision.py` as reference implementations for your model. You can copy the contents of one of these files into your new file to get started.

## Interface

### `model.modelclass`

All models must subclass the `Model` class defined in `model/modelclass.py`.

The class enforces a common interface via which we can extract responses from a model:

```python
class Model:
    def __init__(self):
        """
        Initialize the model
        """
        pass

    def Run(self, file, inp):
        """
        Given the file and input prompt, run the model and return the response
        file: Video file path
        inp: Input prompt
        """
        return ""

    def name(self):
        """
        Return the name of the model
        """
        return ""
```
- `__init__(self)`

  - This is where you should load your model weights, processor/tokenizer, and any other necessary components, storing them on `self` so `Run` can use them. For example:

    ```python
    from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

    def __init__(self):
        self.model = Qwen2VLForConditionalGeneration.from_pretrained(
            "Qwen/Qwen2-VL-7B-Instruct",
            torch_dtype="auto",
            device_map="auto",
            attn_implementation="flash_attention_2"
        )
        self.processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
    ```
- `Run(self, file, inp)`

  - `file` (type: `str`) is the path to the video file.
  - `inp` (type: `str`) is the input prompt.
  - This is where you should implement the logic for running your model on the given input. The function should return the model's response as a string (see the sketch after this list).

- `name(self)`

  - This is where you should return the name of your model. It is used to identify your model when running the benchmark.
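As a rough illustration, here is a minimal sketch of what `Run` and `name` could look like for the Qwen2-VL example above. It mirrors the Qwen2-VL quickstart; `process_vision_info` comes from the model-specific `qwen_vl_utils` package, and the prompt format and `max_new_tokens` value are assumptions, not part of the StreamingBench interface:

```python
from qwen_vl_utils import process_vision_info  # Qwen2-VL helper (assumption)

def Run(self, file, inp):
    # Package the video path and the text prompt as a single chat turn
    messages = [{
        "role": "user",
        "content": [
            {"type": "video", "video": file},
            {"type": "text", "text": inp},
        ],
    }]
    text = self.processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = self.processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    ).to(self.model.device)
    output_ids = self.model.generate(**inputs, max_new_tokens=128)
    # Drop the prompt tokens so only the generated answer is decoded
    trimmed = [
        out[len(ids):] for ids, out in zip(inputs.input_ids, output_ids)
    ]
    return self.processor.batch_decode(trimmed, skip_special_tokens=True)[0]

def name(self):
    return "Qwen2-VL-7B-Instruct"
```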

## eval.py

You need to add your model to the `eval.py` file like this:

```python
####### MODEL ############

model = Model()

if args.model_name == "GPT4o":
    from model.GPT4o import GPT4o
    model = GPT4o()
elif args.model_name == "<your_model_name>":
    from model.<your_model_filename> import <your_model_classname>
    model = <your_model_classname>()
######################
```
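Filled in for a hypothetical model file `model/MyModel.py` that defines a class `MyModel` (names chosen purely for illustration), the new branch would read:

```python
elif args.model_name == "MyModel":
    from model.MyModel import MyModel
    model = MyModel()
```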

## eval.sh

You need to add your model to the `eval.sh` file like this:

```bash
# Your model
if [ "$EVAL_MODEL" = "<your_model_name>" ]; then
    conda activate your_conda_env
    CUDA_VISIBLE_DEVICES=$Devices python eval.py --model_name $EVAL_MODEL --benchmark_name $BENCHMARK --data_file $DATA_FILE --output_file $OUTPUT_FILE
fi
```
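Using the same hypothetical `MyModel` and a conda environment named `mymodel_env` (again, illustrative names), the block would be:

```bash
if [ "$EVAL_MODEL" = "MyModel" ]; then
    conda activate mymodel_env
    CUDA_VISIBLE_DEVICES=$Devices python eval.py --model_name $EVAL_MODEL --benchmark_name $BENCHMARK --data_file $DATA_FILE --output_file $OUTPUT_FILE
fi
```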