Questions about datas #3

lbourdois · 2022-11-10T09:49:49Z

Hi 😀

First of all, thank you for your very interesting work 🚀

I have two questions that I could not answer on my own (maybe I did not search well enough), and I would appreciate your help.

First, for a given task and language, I would like to know which prompt was used for finetuning. Take French summarization as an example: I searched for the prompts used for it, but I could not find a list that gathers this information. PromptSource provides 2085 prompts in English, but nothing about translations into other languages. Does such a list exist? 🤔
Second, as a workaround for the previous point, I thought I could download the xP3mt dataset and read directly which prompts were used. The problem is that you can download all the data for a selected language, but you cannot additionally filter by task or (sub)dataset. Could this be added?
Even better would be to publish the translations you have made as individual multilingual datasets. For example, an "mSamSum" would be the multilingual version of "SamSum", which is originally English-only. Such datasets could then be reused in other work, especially monolingual work. Taking French summarization again as an example, little data is currently available: Orangesum, XLSum and Wiki-lingua. Having easy access to translations of CNN Daily Mail, Gigaword, MultiNews, SamSum and XSum would make very interesting things possible 🤯
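In the meantime, the task/(sub)dataset filter described above could probably be approximated on the client side. Here is a minimal sketch, assuming the repository stores one file per (sub)dataset inside a per-language folder and that the dataset name appears in the file name. Both that layout and the file names below are hypothetical and are not taken from the actual xP3mt repository; `filter_files` is an illustrative helper, not part of any library.

```python
from pathlib import PurePosixPath


def filter_files(paths, lang, dataset_keyword):
    """Keep paths under the `lang` folder whose file name mentions `dataset_keyword`."""
    keyword = dataset_keyword.lower()
    selected = []
    for path in paths:
        p = PurePosixPath(path)
        # Assumption: the first path component is the language code.
        if p.parts and p.parts[0] == lang and keyword in p.name.lower():
            selected.append(path)
    return selected


# Hypothetical file names, for illustration only.
example_paths = [
    "fr/xp3_xlsum_fr_train.jsonl",
    "fr/xp3_xnli_fr_train.jsonl",
    "en/xp3_xlsum_en_train.jsonl",
]

print(filter_files(example_paths, "fr", "xlsum"))
```

With a real file listing in place of `example_paths`, only the matching files would need to be downloaded and inspected for their prompts.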

lbourdois · 2022-11-30T09:34:41Z

After opening all the datasets and referring to bigscience-workshop/promptsource#838, it turns out that you did not translate all the datasets from English to French, as I had understood. Instead, you added the French part of 8 multilingual datasets (available at https://huggingface.co/datasets/bigscience/xP3/viewer/fr/train) and translated the prompts into French for 3 of these 8 datasets.
So my questions are not relevant, my bad. I am closing this issue.

lbourdois closed this as completed Nov 30, 2022
