This guide covers how to add support for any chat-trained model to clembench. Models can be run locally, meaning directly on the machine that runs the clembench benchmark (as currently implemented for HuggingFace models), or accessed via a remote API (as currently implemented for OpenAI and other proprietary models).
Overview
Adding a new backend
Adding a model to the model registry
Test the added model
To add support for a new model, go through the following steps:
- Check if there is an already implemented backend that can handle the model. Every model needs a backend that holds the required inference or remote API code. One backend can handle any number of similar models that use the same remote API request or inference code, requiring only a small amount of additional data. Already implemented backends can be found in the backends directory, with file names ending in _api.py. Supported models and their corresponding backends are listed in the model registry, backends/model_registry.json. See the model registry readme for more information on the model registry.
- If there is no implemented backend that supports your model, you have to implement one. See Adding a new backend.
- If there is an implemented backend that supports your model, you need to add a new model entry to the model registry. See Adding a model to the model registry.
- Test your model (and backend, if you implemented a new one) by running hellogame. See Testing the added model.
Adding a new backend
The backend is responsible for calling local or remote models (via an API).
- Add a file that ends in _api.py to the backends directory, e.g. mybackend_api.py.
- In that file, implement your backend class, which needs to extend backends.Backend, e.g. class MyBackend(backends.Backend).
- (Optional) Add an entry for your backend in key.json.
The framework will automatically look into the backends folder for all files that end in _api.py and at the model registry to make models available for benchmarking.
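The file naming convention described above can be illustrated with a small discovery helper. This is a minimal sketch, not clembench's actual loader: the function name find_backend_modules is hypothetical, and the only assumption it encodes is that backend modules are files in the backends directory whose names end in _api.py.

```python
from pathlib import Path
from typing import List


def find_backend_modules(backends_dir: str) -> List[str]:
    """Return the module names of all files in backends_dir that end in '_api.py'.

    Hypothetical helper illustrating the naming convention; clembench's own
    discovery code may differ in its details.
    """
    return sorted(
        path.stem  # e.g. 'mybackend_api' for 'mybackend_api.py'
        for path in Path(backends_dir).glob("*_api.py")
        if path.is_file()
    )


if __name__ == "__main__":
    # Prints the backend module names found in ./backends (an empty list if there are none)
    print(find_backend_modules("backends"))
```

Files such as utils.py or model_registry.json in the same directory are ignored, which is why the _api.py suffix is required for a new backend to be picked up.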
Important: All backends must return a (prompt, response, response_text) tuple whose elements must be exactly:
- prompt: the exact object that was passed to the LLM (if the object has more structure, keep it as is; do not return only the message string)
- response: the exact object that was returned by the LLM (again, do not change this object in any way)
- response_text: only the message generated by the LLM, as a string
The first two get logged into the requests.json file generated by the game master and should be used to check that the actual inputs and outputs are correct.
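The tuple contract can be sketched with a self-contained backend. To keep the example runnable on its own, it defines a stand-in Backend base class and a fake remote call; the method name generate_response, its signature, and the response layout of fake_remote_call are assumptions for illustration, so check backends.Backend in your clembench version for the actual abstract methods.

```python
import abc
from typing import Any, Dict, List, Tuple


class Backend(abc.ABC):
    """Hypothetical stand-in for backends.Backend so this sketch runs on its own."""

    @abc.abstractmethod
    def generate_response(self, messages: List[Dict[str, str]], model: str) -> Tuple[Any, Any, str]:
        ...


def fake_remote_call(messages: List[Dict[str, str]], model: str) -> Dict[str, Any]:
    """Hypothetical API call returning a chat-completion-like response object."""
    last_user_text = messages[-1]["content"]
    return {
        "model": model,
        "choices": [{"message": {"role": "assistant", "content": f"Echo: {last_user_text}"}}],
    }


class MyBackend(Backend):
    def generate_response(self, messages: List[Dict[str, str]], model: str) -> Tuple[Any, Any, str]:
        prompt = messages  # the exact object passed to the LLM, not reduced to a string
        response = fake_remote_call(prompt, model)  # the exact object returned, left untouched
        response_text = response["choices"][0]["message"]["content"]  # only the generated message
        return prompt, response, response_text
```

Note that the backend extracts response_text from the response but returns the full response object alongside it, so the logged requests still show everything the model returned.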
Adding a model to the model registry
Adding a model to the registry can be as simple as adding an entry with the model's name and the backend that handles it, but the model entry can hold more data to be used by a backend.
For example, to add support for a new OpenAI model available via the OpenAI API, a simple entry like this is enough:
{
"model_name": "GPT-5-Einstein",
"model_id": "GPT-5-Einstein",
"backend": "openai"
}
This assumes that the hypothetical new model is named GPT-5-Einstein and is referred to by that string in API requests.
Add the entry to backends/model_registry.json, making sure that it is inside the JSON list and properly separated from its neighboring entries by commas.
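If you prefer not to edit the JSON by hand, a short script can append the entry and keep the list valid. This is a minimal sketch using only the standard json module; the helper name add_registry_entry is hypothetical and not part of clembench.

```python
import json
from pathlib import Path
from typing import Any, Dict


def add_registry_entry(registry_path: str, entry: Dict[str, Any]) -> None:
    """Append entry to the JSON list stored in registry_path and write the file back.

    Serializing with the json module guarantees a valid JSON list, so the
    commas between entries are handled automatically.
    """
    path = Path(registry_path)
    registry = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(registry, list):
        raise ValueError(f"{registry_path} must contain a JSON list")
    registry.append(entry)  # appended last, so all existing entries keep their position
    path.write_text(json.dumps(registry, indent=2) + "\n", encoding="utf-8")


# Usage (run from the clembench root directory):
# add_registry_entry(
#     "backends/model_registry.json",
#     {"model_name": "GPT-5-Einstein", "model_id": "GPT-5-Einstein", "backend": "openai"},
# )
```

Rewriting the file this way normalizes its indentation, which may show up as extra whitespace changes in a diff.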
Important: The order of the entries in the model registry matters! Models can be accessed by incomplete specifications (matching the data contained in the model entries), and if there are multiple available implementations, the first model entry that matches the partial specification will be used to load/access the model.
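Why order matters can be seen in a simple first-match lookup. This sketch assumes, for illustration only, that a partial specification matches an entry when every key it contains has the same value in the entry; clembench's actual matching logic may be more elaborate. The function name and the example entries are hypothetical.

```python
from typing import Any, Dict, List, Optional


def first_matching_entry(registry: List[Dict[str, Any]], spec: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the first registry entry whose values agree with every key in spec."""
    for entry in registry:  # registry order decides which entry wins
        if all(key in entry and entry[key] == value for key, value in spec.items()):
            return entry
    return None


# Hypothetical registry with two implementations of the same model
REGISTRY = [
    {"model_name": "example-model", "backend": "huggingface_local"},
    {"model_name": "example-model", "backend": "hypothetical_backend"},
]
```

Asking only for model_name "example-model" returns the huggingface_local entry because it comes first; adding "backend": "hypothetical_backend" to the specification selects the second entry.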
This section explains how to add an LLM hosted on the HuggingFace (HF) model repository to the model registry to make it available for the local HuggingFace backend of clembench. Due to the variety of models available via HuggingFace, model registry entries for these models can hold an extensive amount of additional data used by the backend for inference.
Each model hosted on HuggingFace is identified by its model ID, which is the combination of the model uploader's username and the individual model name.
For example: For the OpenChat 3.5 model, the model ID is openchat/openchat_3.5, as openchat is the uploader's username and openchat_3.5 is the model name.
This model ID is all that is needed to access ungated models hosted on HuggingFace.
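The two parts of a model ID can be separated with a short helper. This is a hypothetical sketch that only follows the <user>/<model> form described above; it is not part of clembench.

```python
from typing import Tuple


def split_hf_model_id(model_id: str) -> Tuple[str, str]:
    """Split a HuggingFace model ID of the form '<user>/<model>' into its two parts."""
    uploader, sep, model_name = model_id.partition("/")
    if not sep or not uploader or not model_name or "/" in model_name:
        raise ValueError(f"expected '<user>/<model>', got {model_id!r}")
    return uploader, model_name


# split_hf_model_id("openchat/openchat_3.5") returns ("openchat", "openchat_3.5")
```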
Accessing gated models, like Meta's Llama2, requires an HF API access key/token. HF API tokens are acquired via your
user profile on the HF website. Make sure that the HF account used to acquire the access key has been granted access to
the gated model you want to add. This API key needs to be added to key.json in the clembench root directory to be available for loading gated model data.
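Reading the token back from key.json can be sketched as follows. The layout {"huggingface": {"api_key": "..."}} is an assumption made for this example; check the key.json template of your clembench version for the actual key names. The helper name read_hf_token is hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def read_hf_token(key_file: str = "key.json") -> Optional[str]:
    """Return the HF access token stored in key_file, or None if it is unavailable.

    Assumes (hypothetically) a layout like {"huggingface": {"api_key": "hf_..."}}.
    """
    path = Path(key_file)
    if not path.is_file():
        return None
    keys = json.loads(path.read_text(encoding="utf-8"))
    token = keys.get("huggingface", {}).get("api_key")
    return token or None  # treat an empty string as "no token"
```

A token obtained this way can be passed to the transformers from_pretrained methods through their token argument (older transformers releases used use_auth_token instead).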
Read the model card of the model to be added thoroughly to learn about its individual requirements. It's also a good idea to look at the community tab of the model repository to see if there are common issues with the model.
The clembench HuggingFace local backend relies on the transformers and, indirectly, the tokenizers libraries for model-dependent input tokenization. It also relies on the chat template utility of the libraries' tokenizer classes.
The first step is to make sure that a candidate model hosted on HuggingFace has the required configuration to be used with the clembench backend.
To perform a preliminary check for compatibility, run python3 backends/initial_hf_check.py -m <MODEL ID>.
For example, python3 backends/initial_hf_check.py -m openchat/openchat_3.5 checks the OpenChat 3.5 model.
The initial_hf_check.py script shows the applied template and warns about common issues, but does not cover all edge cases. It also takes the flag -i to show the tokenizer's information and the flag -t to show the configured chat template in jinja string format, which can be a useful starting point for a custom template for the model.
The initial check script applies the same preprocessing as the backend.
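When several candidate models need checking, the command can be assembled programmatically. This sketch only uses the -m, -i and -t flags described above; the helper name initial_check_command is hypothetical.

```python
import shlex
from typing import List


def initial_check_command(model_id: str, show_info: bool = False, show_template: bool = False) -> List[str]:
    """Build the argv for backends/initial_hf_check.py for one model ID."""
    argv = ["python3", "backends/initial_hf_check.py", "-m", model_id]
    if show_info:
        argv.append("-i")  # also show the tokenizer's information
    if show_template:
        argv.append("-t")  # also show the chat template as a jinja string
    return argv


if __name__ == "__main__":
    for model_id in ["openchat/openchat_3.5", "mistralai/Mistral-7B-Instruct-v0.1"]:
        print(shlex.join(initial_check_command(model_id, show_template=True)))
        # To run the check from the clembench root instead of printing it:
        # subprocess.run(initial_check_command(model_id), check=True)
```

Returning an argument list rather than a single string avoids shell quoting issues when it is passed to subprocess.run.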
Open backends/hf_local_models.json in your editor of choice. This file contains entries for all models supported by the huggingface-local backend. To make a new model available, an entry for it needs to be added to this registry.
A minimal model entry contains the model name, the backend to handle it, its HF ID, a bool that determines if a premade chat template for it will be loaded from HF and the EOS string to be culled from its outputs:
{
"model_name": "Mistral-7B-Instruct-v0.1",
"backend": "huggingface_local",
"huggingface_id": "mistralai/Mistral-7B-Instruct-v0.1",
"premade_chat_template": true,
"eos_to_cull": "</s>"
}
If the model to be added passed the initial check without any issue, use "premade_chat_template": true in its registry entry. This indicates that the model's tokenizer properly applies a chat template that works without any further editing.
If it does not pass the check or otherwise requires chat template changes, the entry must contain "premade_chat_template": false and include the custom chat template to be used, in jinja2 string format.
For example:
{
"model_name": "sheep-duck-llama-2-70b-v1.1",
"backend": "huggingface_local",
"huggingface_id": "Riiid/sheep-duck-llama-2-70b-v1.1",
"premade_chat_template": false,
"custom_chat_template": "{% for message in messages %}{% if message['role'] == 'user' %}{{ '### User:\\n' + message['content'] + '\\n\\n' }}{% elif message['role'] == 'system' %}{{ '### System:\\n' + message['content'] + '\\n\\n' }}{% elif message['role'] == 'assistant' %}{{ '### Assistant:\\n' + message['content'] + '\\n\\n' }}{% endif %}{% if loop.last %}{{ '### Assistant:\\n' }}{% endif %}{% endfor %}",
"eos_to_cull": "</s>"
}
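To see which prompt string the custom template above produces, its jinja2 logic can be translated by hand into plain Python. This is only an illustration of the template's output; the backend itself renders the jinja2 string through the tokenizer's chat template utility, and the function name is hypothetical.

```python
from typing import Dict, List


def render_sheep_duck_prompt(messages: List[Dict[str, str]]) -> str:
    """Plain-Python mirror of the sheep-duck-llama-2-70b-v1.1 custom_chat_template."""
    headers = {"user": "### User:\n", "system": "### System:\n", "assistant": "### Assistant:\n"}
    parts = []
    for i, message in enumerate(messages):
        header = headers.get(message["role"])
        if header is not None:  # like the template, skip any other role
            parts.append(header + message["content"] + "\n\n")
        if i == len(messages) - 1:  # loop.last: open the assistant turn for generation
            parts.append("### Assistant:\n")
    return "".join(parts)
```

For a system message "Be brief." followed by a user message "Hi", this yields "### System:\nBe brief.\n\n### User:\nHi\n\n### Assistant:\n", so the model continues right after the final assistant header.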
If the model requires the use of the 'slow' tokenizer class, which should be noted on the model card, the model entry must contain "slow_tokenizer": true.
For example:
{
"model_name": "SUS-Chat-34B",
"backend": "huggingface_local",
"huggingface_id": "SUSTech/SUS-Chat-34B",
"premade_chat_template": false,
"custom_chat_template": "{% for message in messages %}{% if message['role'] == 'user' %}{{ '### Human: ' + message['content'] + '\\n\\n' }}{% elif message['role'] == 'assistant' %}{{ '### Assistant: ' + message['content'] }}{% endif %}{% if loop.last %}{{ '### Assistant: ' }}{% endif %}{% endfor %}",
"slow_tokenizer": true,
"eos_to_cull": "<|endoftext|>"
}
The model to be added might use an uncommon tokenizer, which can lead to discrepancies between the prompt and the decoded model output, so that the model output has to be split before clembench can handle it properly. In this case, the model entry needs to contain the string that precedes the actual model output as "output_split_prefix". (This string will likely be found while testing the model.)
For example:
{
"model_name": "Yi-34B-Chat",
"backend": "huggingface_local",
"huggingface_id": "01-ai/Yi-34B-Chat",
"premade_chat_template": false,
"custom_chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = true %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
"slow_tokenizer": true,
"output_split_prefix": "assistant\n",
"eos_to_cull": "<|im_end|>"
}
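How output_split_prefix and eos_to_cull relate to a decoded output can be sketched as follows. This is a hypothetical illustration, not the backend's actual code; it keeps the text after the last occurrence of the prefix and then removes a trailing EOS string.

```python
def extract_reply(decoded_output: str, output_split_prefix: str, eos_to_cull: str) -> str:
    """Keep the text after the last output_split_prefix and cull a trailing EOS string."""
    if output_split_prefix:  # str.rpartition would raise ValueError on an empty separator
        # If the prefix is missing, rpartition returns the whole string as its last element
        _, _, decoded_output = decoded_output.rpartition(output_split_prefix)
    reply = decoded_output.strip()
    if eos_to_cull and reply.endswith(eos_to_cull):
        reply = reply[: -len(eos_to_cull)].rstrip()
    return reply
```

Splitting on the last occurrence targets the newly generated turn in a multi-turn prompt, where the prefix appears once per assistant turn; the trade-off is that a reply which itself contains the prefix would be cut.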
If the model to be added is gated, the model entry must contain "requires_api_key": true. Make sure that key.json exists and contains a valid HF API access key when the model is to be used.
For example:
{
"model_name": "llama-2-7b-hf",
"backend": "huggingface_local",
"requires_api_key": true,
"huggingface_id": "meta-llama/llama-2-7b-hf",
"premade_chat_template": true,
"eos_to_cull": "</s>"
}
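Before testing, a new entry can be checked against the rules above with a small validator. This is a hypothetical sketch that covers only the keys discussed in this section; the model registry readme remains the reference for the full set of keys.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("model_name", "backend", "huggingface_id", "premade_chat_template", "eos_to_cull")
OPTIONAL_BOOL_KEYS = ("slow_tokenizer", "requires_api_key")


def check_hf_entry(entry: Dict[str, Any]) -> List[str]:
    """Return a list of problems with a huggingface_local registry entry (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]
    # A false premade_chat_template requires a custom template to fall back on
    if entry.get("premade_chat_template") is False and not entry.get("custom_chat_template"):
        problems.append("premade_chat_template is false but custom_chat_template is missing")
    for key in ("premade_chat_template",) + OPTIONAL_BOOL_KEYS:
        if key in entry and not isinstance(entry[key], bool):
            problems.append(f"{key} must be true or false")
    return problems
```

Applied to the llama-2-7b-hf entry above, the validator returns an empty list.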
See the model registry readme for more information on the model registry.
Testing the added model
Run clembench with the hellogame clemgame. See the corresponding documentation for instructions.
This produces interactions and requests files in JSON format in the results directory. Specific files can be found in the episode subdirectories of results/<MODEL NAME>/hellogame/0_greet_en/.
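Collecting these files for inspection can be sketched with a glob over the episode subdirectories. The helper name is hypothetical, and so are the per-episode file names: requests.json follows the name used earlier in this guide, while the interactions file name should be adjusted to whatever your run actually produces.

```python
from pathlib import Path
from typing import List


def find_episode_files(results_dir: str, model_name: str, file_name: str = "requests.json") -> List[Path]:
    """List file_name in every episode subdirectory of the hellogame 0_greet_en experiment."""
    experiment_dir = Path(results_dir) / model_name / "hellogame" / "0_greet_en"
    return sorted(experiment_dir.glob(f"*/{file_name}"))  # one match per episode directory


# Usage: for path in find_episode_files("results", "<MODEL NAME>"): print(path)
```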
The requests file of each episode contains the prompts given to the model and its outputs.
Check the modified_prompt_object values for proper application of the chat template.
Then check if there is generated text and if the model outputs match the modified_prompt_object before the generated text.
Finally, check if the model output ends with an EOS string. This string needs to be culled, as noted above; proper culling is checked in the next step.
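The three checks above can be expressed for a single prompt/output pair once both are available as strings. This is a hypothetical sketch: how you obtain the templated prompt text and the raw output from the requests file depends on its actual structure, which is not assumed here.

```python
from typing import Dict, Union


def inspect_output(prompt_text: str, raw_output: str, eos: str) -> Dict[str, Union[bool, str]]:
    """Apply the prompt-echo, generated-text and EOS checks to one prompt/output pair."""
    echoes_prompt = raw_output.startswith(prompt_text)  # output repeats the templated prompt first
    generated = raw_output[len(prompt_text):] if echoes_prompt else raw_output
    return {
        "echoes_prompt": echoes_prompt,
        "has_generated_text": bool(generated.strip()),
        "ends_with_eos": bool(eos) and generated.rstrip().endswith(eos),  # if True, eos must be culled
        "generated": generated,
    }
```

For the prompt "### User:\nHi\n\n### Assistant:\n" and a raw output consisting of that prompt followed by "Hello!</s>", all three checks are True and the generated part is "Hello!</s>", so "</s>" is the string to cull.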
The interactions files contain processed outputs in the form that is relevant to clembench.
Model replies in the interactions files should not contain any model-specific EOS token strings.
Check if the model replies end in an EOS string. If they do, add this exact string to the EOS culling (the eos_to_cull value of the model entry), as shown above.
If you made any changes to the code after the first test, run the test again and check the files to make sure that they now have proper contents.
If you have successfully run the tests above, open a pull request for the clembench repository.
You can also run the benchmark with your added model if you have the necessary hardware available - if you do, please
share the results by contributing them to the clembench-runs repository.