From 9c66a9fc984b627bf7121cf7c5a8cb4362e27dee Mon Sep 17 00:00:00 2001
From: Maram Hasanain
Date: Thu, 14 Sep 2023 14:41:28 +0300
Subject: [PATCH] Create advanced_usage_examples.md

Created new structure for docs folder, and added advanced usage examples
tutorial
---
 docs/tutorials/advanced_usage_examples.md | 56 +++++++++++++++++++++++
 1 file changed, 56 insertions(+)
 create mode 100644 docs/tutorials/advanced_usage_examples.md

diff --git a/docs/tutorials/advanced_usage_examples.md b/docs/tutorials/advanced_usage_examples.md
new file mode 100644
index 00000000..69f2a912
--- /dev/null
+++ b/docs/tutorials/advanced_usage_examples.md
@@ -0,0 +1,56 @@
# Advanced Usage Examples