diff --git a/assets/brev-hf-law-qa-dataset.zip b/assets/brev-hf-law-qa-dataset.zip new file mode 100644 index 0000000..7f215e4 Binary files /dev/null and b/assets/brev-hf-law-qa-dataset.zip differ diff --git a/llama31_law.ipynb b/llama31_law.ipynb index 4b2d278..18ca809 100644 --- a/llama31_law.ipynb +++ b/llama31_law.ipynb @@ -102,7 +102,9 @@ "source": [ "### Step 1: Prepare the dataset\n", "\n", - "The dataset we used is a subset of the [Law-StackExchange dataset](https://huggingface.co/datasets/ymoslem/Law-StackExchange). We've already filtered and processed this dataset and it can be used to train the model for various different tasks - question title generation (summarization), law domain question answering, and question tag generation (multi-label classification). To run your own data cleaning and prepreocessing, please refer to the [data generation notebook](https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/peft-curation-with-sdg). That tutorial also allows you to generate synthetic data and increase the size of the dataset." + "The dataset we used is a subset of the [Law-StackExchange dataset](https://huggingface.co/datasets/ymoslem/Law-StackExchange). We've already filtered and processed this dataset and it can be used to train the model for various different tasks - question title generation (summarization), law domain question answering, and question tag generation (multi-label classification). To run your own data cleaning and prepreocessing, please refer to the [data generation notebook](https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/peft-curation-with-sdg). That tutorial also allows you to generate synthetic data and increase the size of the dataset.\n", + "\n", + "This dataset is licensed under the [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) license. You can use it for any purpose, including commercial use, without attribution. However, if you use the dataset in a publication, please cite the original authors and the [Law-StackExchange dataset](https://huggingface.co/datasets/ymoslem/Law-StackExchange) repository." ] }, {