# Advanced Usage Examples
LLMeBench supports a number of advanced benchmarking use cases. This tutorial provides example commands for such cases, starting from the following general command:

```bash
python -m llmebench --filter '*benchmarking_asset*' <benchmark-dir> <results-dir>
```

As can be seen in the previous command, the framework performs a wildcard search over the [benchmarking assets directory](https://github.com/qcri/LLMeBench/tree/main/assets) to identify the asset(s) to run, as specified by `'*benchmarking_asset*'`. This is possible because we roughly maintain the following structure and file naming scheme in the benchmarking assets directory.
> language_code/task_category/task/Dataset_Model_LearningSetup.py
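For example, under this scheme, an Arabic zero-shot sentiment asset benchmarking GPT-4 on a dataset named ArSAS would live at a path like `ar/sentiment_emotion_others/sentiment/ArSAS_GPT4_ZeroShot.py` (the dataset and file names here are illustrative; check the assets directory for the actual files). Such an asset could then be run on its own with a filter matching its file name:

```bash
# Run a single asset by (an unambiguous part of) its file name
python -m llmebench --filter '*ArSAS_GPT4_ZeroShot*' <benchmark-dir> <results-dir>
```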

### Running all assets for a specific language
The framework currently uses two-letter language codes. It is possible to run all assets implemented for a single language using the following command:

```bash
python -m llmebench --filter '*language_code/*' <benchmark-dir> <results-dir>
```
- `language_code`: Example values: "ar" (Arabic), "en" (English), "fr" (French), etc.
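
For instance, using the "ar" language code from the list above, the following runs all Arabic assets:

```bash
# Match any asset path under the Arabic ("ar") subdirectory
python -m llmebench --filter '*ar/*' <benchmark-dir> <results-dir>
```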


### Running all assets for a category of tasks
We currently release assets under **eight** task categories as listed [here](https://github.com/qcri/LLMeBench/tree/main/assets/ar). Running assets for one category can be done as follows:
```bash
python -m llmebench --filter '*task_category/*' <benchmark-dir> <results-dir>
```
- `task_category`: Example values: "MT" (Machine Translation), "semantics", "sentiment_emotion_others", etc.

The above command runs assets across all _models_, _languages_, _subtasks_, and _learning setups_ for the given `task_category`.
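
For instance, to run all machine translation assets across models, languages, and learning setups:

```bash
# Match any asset path under the "MT" task category subdirectory
python -m llmebench --filter '*MT/*' <benchmark-dir> <results-dir>
```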

### Running all assets for a specific task
As with task categories, we also maintain consistent task names across languages, learning setups, etc. To run assets for a single task:

```bash
python -m llmebench --filter '*task/*' <benchmark-dir> <results-dir>
```
- `task`: Example values: "sentiment", "SNS", "NLI", "news_categorization", etc.
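
For example, to run all sentiment analysis assets regardless of language or model:

```bash
# Match any asset path under a "sentiment" task subdirectory
python -m llmebench --filter '*sentiment/*' <benchmark-dir> <results-dir>
```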


### Running all assets for a specific model
It is possible to benchmark a single model using the following command:
```bash
python -m llmebench --filter '*model*' <benchmark-dir> <results-dir>
```
- `model`: Example values: "GPT35", "GPT4", "BLOOMZ", etc.
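
For instance, since model names are part of asset file names, the following runs every GPT-4 asset across languages and tasks:

```bash
# Match any asset whose file name contains "GPT4"
python -m llmebench --filter '*GPT4*' <benchmark-dir> <results-dir>
```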


### Running all assets for a specific learning setup
The framework currently supports both zero-shot and few-shot learning setups. To run all zero-shot assets:
```bash
python -m llmebench --filter '*ZeroShot*' <benchmark-dir> <results-dir>
```
To run all few-shot assets:

```bash
python -m llmebench --filter '*FewShot*' --n_shots <n> <benchmark-dir> <results-dir>
```
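
For example, to run all few-shot assets with 3 shots per test sample (3 here is just an illustrative value):

```bash
# "--n_shots 3" requests 3 few-shot examples; any supported value can be used
python -m llmebench --filter '*FewShot*' --n_shots 3 <benchmark-dir> <results-dir>
```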
