GitHub - Danish-summarisation/DanSum: This repository is for the development of Danish abstractive summarisation models.

DanSumT5: Automatic Abstractive Summarisation in Danish

Sara Kolding, Katrine Nymann, Ida Bang Hansen, Kenneth C. Enevoldsen & Ross Deans Kristensen-McLachlan
Access our models through Hugging Face

About The Project

This repository contains the code for developing an automatic abstractive summarisation tool in Danish.

The models can be used to summarise individual news articles through the Hugging Face API.
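A minimal sketch of what such a call might look like with the `transformers` summarisation pipeline. The model identifier `Danish-summarisation/DanSumT5-base` is an assumption, as are the helper names `load_summariser` and `summarise` and the `max_length` value; none of them come from this README, so check the Hugging Face organisation page for the exact model names.

```python
from typing import Callable, Dict, List

# Assumption: this Hub identifier is a guess, not taken from the README.
MODEL_ID = "Danish-summarisation/DanSumT5-base"


def load_summariser(model_id: str = MODEL_ID):
    """Build a summarisation pipeline (downloads the model on first use)."""
    # Imported lazily so that `summarise` below has no hard dependency.
    from transformers import pipeline

    return pipeline("summarization", model=model_id)


def summarise(
    article: str,
    summariser: Callable[..., List[Dict[str, str]]],
    max_length: int = 128,  # hypothetical default, not the authors' setting
) -> str:
    """Summarise a single news article and return the summary text."""
    outputs = summariser(article.strip(), max_length=max_length, truncation=True)
    return outputs[0]["summary_text"].strip()


if __name__ == "__main__":
    article = "Indsæt en dansk nyhedsartikel her."
    print(summarise(article, load_summariser()))
```

Passing the pipeline in as an argument keeps model loading separate from the per-article call, so the same loaded model can be reused across many articles.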

Abstract

Automatic abstractive text summarization is a challenging task in the field of natural language processing. This paper presents a model for domain-specific summarization for Danish news articles, DanSumT5: an mT5 model fine-tuned on a cleaned subset of the DaNewsroom dataset consisting of abstractive summary-article pairs. The resulting state-of-the-art model is evaluated both quantitatively and qualitatively, using ROUGE and BERTScore metrics and human rankings of the summaries. We find that although model refinements increase quantitative and qualitative performance, the model is still prone to factual errors. We discuss the limitations of current evaluation methods for automatic abstractive summarization and underline the need for improved metrics and transparency within the field. We suggest that future work should employ methods for detecting and reducing errors in model output and methods for referenceless evaluation of summaries.

Key words: automatic summarisation, transformers, Danish, natural language processing

Model performance

The models were fine-tuned with hyperparameters selected through a hyperparameter search. These are the quantitative results for the model-generated summaries:

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | BERTScore |
|---|---|---|---|---|
| DanSum-mT5-small | 21.42 [21.26, 21.55] | 6.21 [6.11, 6.30] | 16.10 [15.98, 16.22] | 88.28 [88.26, 88.31] |
| DanSum-mT5-base | 23.21 [23.06, 23.36] | 7.12 [7.00, 7.22] | 17.64 [17.50, 17.79] | 88.77 [88.74, 88.80] |
| DanSum-mT5-large | 23.76 [23.60, 23.91] | 7.46 [7.35, 7.59] | 18.25 [18.12, 18.39] | 88.97 [88.95, 89.00] |
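This README does not say how the bracketed ranges were computed. Assuming they are confidence intervals around a mean per-article score, the sketch below shows one common way to obtain such an interval: a simplified ROUGE-1 F1 (lowercased whitespace tokens, no stemming) combined with a percentile bootstrap. The function names and defaults are hypothetical, and the reported figures were presumably produced with a standard ROUGE implementation, so this will not reproduce them exactly.

```python
import random
from collections import Counter
from typing import Sequence, Tuple


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    # Clipped overlap: each token counts at most as often as in the reference.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def bootstrap_ci(
    scores: Sequence[float],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) from a percentile bootstrap over scores."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample the per-article scores with replacement and record each mean.
    means = sorted(sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(scores) / n, lower, upper
```

To compare with the table, score each article's reference and generated summary with `rouge1_f1`, pass the list of scores to `bootstrap_ci`, and multiply the results by 100.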

To get a better understanding of the models' performance, we also had two of the authors blindly rank the model-generated summaries for 100 articles, that is, without knowing which model generated which summary.

Get started

The DaNewsroom dataset can be accessed upon request (https://github.com/danielvarab/da-newsroom).

Clone the repo

git clone https://github.com/Danish-summarisation/DanSum

Install required modules
```
pip install -r requirements.txt
```

Acknowledgments

DAT5 icon created with OpenAI's DALL-E 2
