From a54734728d3e888a4347aaa1a3494340f3c3d98a Mon Sep 17 00:00:00 2001
From: hanhainebula <2512674094@qq.com>
Date: Tue, 19 Nov 2024 20:13:42 +0800
Subject: [PATCH] release training data for bge-multilingual-gemma2

---
 dataset/README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/dataset/README.md b/dataset/README.md
index 3fa838a1..22970d4b 100644
--- a/dataset/README.md
+++ b/dataset/README.md
@@ -8,6 +8,7 @@ This will point to the training data we use for training various models.
 | [bge-m3-data](https://huggingface.co/datasets/Shitao/bge-m3-data) | Fine-tuning data used by [bge-m3](https://huggingface.co/BAAI/bge-m3) |
 | [public-data](https://huggingface.co/datasets/cfli/bge-e5data) | Public data identical to [e5-mistral](https://huggingface.co/intfloat/e5-mistral-7b-instruct) |
 | [full-data](https://huggingface.co/datasets/cfli/bge-full-data) | The full dataset we used for training [bge-en-icl](https://huggingface.co/BAAI/bge-en-icl) |
+| [bge-multilingual-gemma2-data](https://huggingface.co/datasets/hanhainebula/bge-multilingual-gemma2-data) | The full multilingual dataset we used for training [bge-multilingual-gemma2](https://huggingface.co/BAAI/bge-multilingual-gemma2) |
 | [reranker-data](Shitao/bge-reranker-data) | a mixture of multilingual datasets |