# Advanced Usage Examples
LLMeBench supports advanced benchmarking use cases. In this tutorial, we provide example commands for such cases, all building on the following general command:

```bash
python -m llmebench --filter '*benchmarking_asset*' <benchmark-dir> <results-dir>
```

As can be seen in the previous command, the framework performs a wildcard search over the [benchmarking assets directory](https://github.com/qcri/LLMeBench/tree/main/assets) to identify the asset(s) to run, as specified by `'*benchmarking_asset*'`. This is possible because we roughly maintain the following structure and file naming scheme in the benchmarking assets directory:
> language_code/task_category/task/Dataset_Model_LearningSetup.py
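
For example, under this scheme an Arabic GPT-4 zero-shot sentiment asset would sit at a path like the following (an illustrative path, not necessarily an actual file in the repository):
> ar/sentiment_emotion_others/sentiment/ArSAS_GPT4_ZeroShot.py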

### Running all assets for a specific language
The framework currently uses two-letter language codes. It is possible to run all assets implemented for a single language using the command:

```bash
python -m llmebench --filter '*language_code/*' <benchmark-dir> <results-dir>
```
- `language_code`: Example values: "ar" (Arabic), "en" (English), "fr" (French), etc.
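
For instance, to run every Arabic asset (a concrete instance of the pattern above; `<benchmark-dir>` and `<results-dir>` are the same placeholders as before):

```bash
python -m llmebench --filter '*ar/*' <benchmark-dir> <results-dir>
```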

### Running all assets for a category of tasks
We currently release assets under **eight** task categories, as listed [here](https://github.com/qcri/LLMeBench/tree/main/assets/ar). Running assets for one category can be done as follows:
```bash
python -m llmebench --filter '*task_category/*' <benchmark-dir> <results-dir>
```
- `task_category`: Example values: "MT" (for Machine Translation), "semantics", "sentiment_emotion_others", etc.

Running the above command will run assets from all _models_, _languages_, _subtasks_, and _learning setups_ for a `task_category`.
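
As a concrete instance (a sketch using one of the category names above), the following would run every asset filed under the `sentiment_emotion_others` category:

```bash
python -m llmebench --filter '*sentiment_emotion_others/*' <benchmark-dir> <results-dir>
```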

### Running all assets for a specific task
As with task categories, we also maintain consistent task names across languages, learning setups, etc. To run assets for a single task:

```bash
python -m llmebench --filter '*task/*' <benchmark-dir> <results-dir>
```
- `task`: Example values: "sentiment", "SNS", "NLI", "news_categorization", etc.
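
For instance, to run all sentiment assets across all languages, models, and learning setups (assuming the naming scheme described earlier):

```bash
python -m llmebench --filter '*sentiment/*' <benchmark-dir> <results-dir>
```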

### Running all assets for a specific model
It is possible to benchmark a single model using the following command:
```bash
python -m llmebench --filter '*model*' <benchmark-dir> <results-dir>
```
- `model`: Example values: "GPT35", "GPT4", "BLOOMZ", etc.
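
For example, to run all GPT-4-based assets (substituting one of the model names above into the pattern):

```bash
python -m llmebench --filter '*GPT4*' <benchmark-dir> <results-dir>
```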

### Running all assets for a specific learning setup
The framework currently supports both zero-shot and few-shot learning setups. To run all zero-shot assets:
```bash
python -m llmebench --filter '*ZeroShot*' <benchmark-dir> <results-dir>
```
To run all few-shot assets:

```bash
python -m llmebench --filter '*FewShot*' --n_shots <n> <benchmark-dir> <results-dir>
```
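
Since `--filter` performs a plain wildcard match over asset paths, the patterns above can also be combined into more specific ones. For example, the following sketch (assuming the file naming scheme described earlier holds) would run only Arabic zero-shot sentiment assets:

```bash
python -m llmebench --filter '*ar/*sentiment/*ZeroShot*' <benchmark-dir> <results-dir>
```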