
This repository is for the development of Danish abstractive summarisation models.


About The Project

This repository contains the code for developing an automatic abstractive summarisation tool in Danish.

The models can be used to summarise individual news articles through the Hugging Face API, as sketched below.
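
For example, a released checkpoint can be loaded with the transformers library. The snippet below is a minimal sketch: the Hub model ID (here assumed to be Danish-summarisation/DanSumT5-base) and the generation settings are illustrative, so check the model card for the exact names and recommended parameters.

```python
# Minimal sketch: summarising a single Danish news article with transformers.
# The model ID below is an assumption; check the Hugging Face Hub / model card
# for the exact checkpoint name and recommended generation settings.
from transformers import pipeline

summariser = pipeline(
    "summarization",
    model="Danish-summarisation/DanSumT5-base",  # assumed Hub ID
)

article = "Her indsættes en dansk nyhedsartikel ..."  # placeholder article text

summary = summariser(
    article,
    max_length=128,    # cap on the length of the generated summary
    truncation=True,   # truncate articles longer than the model's max input
)
print(summary[0]["summary_text"])
```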

Abstract

Automatic abstractive text summarization is a challenging task in the field of natural language processing. This paper presents a model for domain-specific summarization for Danish news articles, DanSumT5: an mT5 model fine-tuned on a cleaned subset of the DaNewsroom dataset consisting of abstractive summary-article pairs. The resulting state-of-the-art model is evaluated both quantitatively and qualitatively, using ROUGE and BERTScore metrics and human rankings of the summaries. We find that although model refinements increase quantitative and qualitative performance, the model is still prone to factual errors. We discuss the limitations of current evaluation methods for automatic abstractive summarization and underline the need for improved metrics and transparency within the field. We suggest that future work should employ methods for detecting and reducing errors in model output and methods for referenceless evaluation of summaries.

Key words: automatic summarisation, transformers, Danish, natural language processing

Model performance

The models were fine-tuned using hyperparameter search. The quantitative results for the model-generated summaries are shown in the table below:

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | BERTScore |
| --- | --- | --- | --- | --- |
| DanSum-mT5-small | 21.42 [21.26, 21.55] | 6.21 [6.11, 6.30] | 16.10 [15.98, 16.22] | 88.28 [88.26, 88.31] |
| DanSum-mT5-base | 23.21 [23.06, 23.36] | 7.12 [7.00, 7.22] | 17.64 [17.50, 17.79] | 88.77 [88.74, 88.80] |
| DanSum-mT5-large | 23.76 [23.60, 23.91] | 7.46 [7.35, 7.59] | 18.25 [18.12, 18.39] | 88.97 [88.95, 89.00] |

To get a better understanding of the models' performance, we also had two of the authors blindly rank (without knowledge of which model generated which summary) the model-generated summaries for 100 articles.
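
For reference, the sketch below shows one way to compute ROUGE and BERTScore for generated summaries using the Hugging Face evaluate library; it illustrates the metrics reported above and is not necessarily the exact evaluation setup used for the table.

```python
# Illustrative sketch: scoring generated summaries against reference summaries
# with ROUGE and BERTScore via the `evaluate` library
# (pip install evaluate rouge_score bert_score).
import evaluate

predictions = ["Modelgenereret resumé af artiklen ..."]  # model outputs (placeholder)
references = ["Journalistens originale resumé ..."]      # gold summaries (placeholder)

rouge = evaluate.load("rouge")
rouge_scores = rouge.compute(predictions=predictions, references=references)
print(rouge_scores)  # contains rouge1, rouge2, rougeL, rougeLsum

bertscore = evaluate.load("bertscore")
bert_scores = bertscore.compute(
    predictions=predictions,
    references=references,
    lang="da",  # language code selects a suitable multilingual backbone
)
print(sum(bert_scores["f1"]) / len(bert_scores["f1"]))  # mean BERTScore F1
```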

Get started

  • The DaNewsroom dataset can be accessed upon request (https://github.com/danielvarab/da-newsroom)
  • Clone the repo:
    git clone https://github.com/Danish-summarisation/DanSum
  • Install the required packages:
    pip install -r requirements.txt

Acknowledgments
