Update adding_dataset.md
Removed youtube link, added note on further details location.
MaramHasanain authored Sep 17, 2023
1 parent e567001 commit 760d94d
Showing 1 changed file (`docs/tutorials/adding_dataset.md`) with 6 additions and 2 deletions.
<!----# Adding Dataset ([See Demo](https://youtu.be/_sO2PhKhKGA?feature=shared)) --->
# Adding Dataset

Check whether the dataset used by your task already has an implementation in `llmebench/datasets`. If not, create a new dataset module (e.g., `llmebench/datasets/SemEval23.py`) that defines a class (e.g., `SemEval23Dataset`) subclassing `DatasetBase`. See the [existing dataset modules](llmebench/datasets) for inspiration. Each new dataset class must implement four functions:

```python
class NewDataset(DatasetBase):
    ...
# "input_id": this optional key will be used for deduplication
```
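As a rough illustration of the kind of list `load_data` returns, the sketch below parses a hypothetical JSON Lines file into per-sample dictionaries. Everything except the optional `"input_id"` key is an assumption made for this example: the file format, the source field names (`text`, `label`, `id`), the output keys `"input"` and `"label"`, and the helper names `parse_jsonl_samples` and `load_samples`. In a real dataset class this logic would live in the `load_data` method of your `DatasetBase` subclass; check an existing dataset module for the exact format the framework expects.

```python
import json


def parse_jsonl_samples(lines):
    """Convert JSON Lines records into per-sample dicts (hypothetical format)."""
    samples = []
    for line in lines:
        if not line.strip():
            # Skip blank or whitespace-only lines, e.g. a trailing newline
            continue
        record = json.loads(line)
        # "input"/"label" output keys are assumed for illustration
        sample = {"input": record["text"], "label": record["label"]}
        # Optional ID; this tutorial notes it is used for deduplication
        if "id" in record:
            sample["input_id"] = record["id"]
        samples.append(sample)
    return samples


def load_samples(data_path):
    """File-reading wrapper; in a dataset class this would be `load_data`."""
    with open(data_path, encoding="utf-8") as f:
        return parse_jsonl_samples(f)
```

Keeping the parsing separate from the file I/O lets you test the conversion with plain strings before wiring it into the dataset class.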

**Notes:**
- For few-shot assets, the framework can deduplicate the training examples (from which the few-shot examples are drawn) against the evaluation dataset, based on sample IDs. To enable this, `load_data` should also define `"input_id"` for each input sample.
- For further details on implementing each function, see [`llmebench/datasets/dataset_base.py`](https://github.com/qcri/LLMeBench/blob/main/llmebench/datasets/dataset_base.py).
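The deduplication described in the first note amounts to filtering the few-shot pool by sample ID. The sketch below only illustrates that idea and is not LLMeBench's implementation: the function name `deduplicate_few_shot_pool` is hypothetical, and it assumes each sample is a dict that may carry an `"input_id"` key. Training samples without an ID are kept, since there is nothing to compare them against.

```python
def deduplicate_few_shot_pool(train_samples, eval_samples):
    """Return training samples whose "input_id" does not occur in the evaluation set."""
    # Collect evaluation IDs in a set for constant-time membership checks
    eval_ids = {
        s["input_id"] for s in eval_samples if s.get("input_id") is not None
    }
    kept = []
    for sample in train_samples:
        sample_id = sample.get("input_id")
        # Keep samples that lack an ID or whose ID is not used for evaluation
        if sample_id is None or sample_id not in eval_ids:
            kept.append(sample)
    return kept


train = [
    {"input": "a", "input_id": 1},
    {"input": "b", "input_id": 2},
    {"input": "c"},
]
eval_set = [{"input": "b", "input_id": 2}]
print([s["input"] for s in deduplicate_few_shot_pool(train, eval_set)])  # ['a', 'c']
```

Without an `"input_id"` on both sides, no overlap can be detected, which is why the note asks `load_data` to define it.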

**Once the new dataset class is implemented, export it in `llmebench/datasets/__init__.py`.**
