Diachronic-Embeddings

GitHub link: https://github.com/ezosa/Diachronic-Embeddings

This repository contains the code used in our recent work on clustering diachronic word embeddings:

Pivovarova, L., Marjanen, J., & Zosa, E. (2019). Word Clustering for Historical Newspapers Analysis. In RANLP Workshop on Language Technology for Digital Humanities.

Marjanen, J., Pivovarova, L., Zosa, E., & Kurunmäki, J. (2019). Clustering ideological terms in historical newspaper data with diachronic word embeddings. In 5th International Workshop on Computational History, HistoInformatics 2019. CEUR-WS.

Models

Diachronic embeddings built on the National Library of Finland newspaper collection can be downloaded from here.

We used an incremental training method, closely following Kim et al. (2014), which was previously applied by Hengchen et al. (2019). Further explanations, code, and several checks of the embedding models can be found here.
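Because each period's model is initialized from the previous one, a word's vectors in consecutive periods are largely comparable without a separate alignment step, so semantic change can be measured directly as cosine distance. The following is a minimal sketch of such a comparison, assuming each period's embeddings have already been loaded into a plain dict that maps words to NumPy vectors. The names `period_vectors`, `cosine` and `semantic_drift` are hypothetical and not part of this repository.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def semantic_drift(word, period_vectors):
    """Return (period, next_period, 1 - cosine) for consecutive periods.

    period_vectors: hypothetical dict {period: {word: vector}}, one entry
    per incrementally trained model. Periods lacking the word are skipped.
    """
    periods = sorted(p for p in period_vectors if word in period_vectors[p])
    drift = []
    for prev, curr in zip(periods, periods[1:]):
        sim = cosine(period_vectors[prev][word], period_vectors[curr][word])
        drift.append((prev, curr, 1.0 - sim))
    return drift
```

A large value for a pair of periods suggests the word's usage shifted between them. With separately trained (non-incremental) models, the vectors would first need to be aligned for this comparison to be meaningful.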

Clustering

Once you have obtained the embeddings, you can cluster them using clustering.py (clustering of selected words) or cluster_all.py (enriched clustering).
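The following sketch illustrates the two modes under stated assumptions. It treats "enriched" clustering as extending the selected words with their nearest neighbors before clustering. It also uses plain k-means on unit-normalized vectors, with centroids initialized from the first k words, purely for illustration. Neither choice is confirmed to match what clustering.py and cluster_all.py actually do. The input is a hypothetical dict that maps words to NumPy vectors for a single period, and all function names are hypothetical.

```python
import numpy as np


def nearest_neighbors(word, vectors, n=5):
    """Return the n words whose vectors have the highest cosine similarity to `word`."""
    target = vectors[word] / np.linalg.norm(vectors[word])
    scores = []
    for other, vec in vectors.items():
        if other != word:
            scores.append((float(np.dot(target, vec) / np.linalg.norm(vec)), other))
    scores.sort(reverse=True)
    return [other for _, other in scores[:n]]


def enrich(words, vectors, n=5):
    """Assumed 'enriched' mode: add each selected word's n nearest neighbors."""
    enriched = list(words)
    for w in words:
        for nb in nearest_neighbors(w, vectors, n):
            if nb not in enriched:
                enriched.append(nb)
    return enriched


def kmeans(words, vectors, k, iterations=100):
    """Cluster `words` into k groups with Lloyd's k-means (illustrative choice).

    Vectors are unit-normalized so Euclidean distance tracks cosine similarity.
    Centroids start at the first k words, which makes results reproducible.
    Returns a dict {word: cluster_id}.
    """
    X = np.array([vectors[w] for w in words], dtype=float)
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    centroids = X[:k].copy()
    labels = None
    for _ in range(iterations):
        # Assign each word to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Recompute centroids; keep the old one if a cluster becomes empty.
        for c in range(k):
            members = X[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return {w: int(c) for w, c in zip(words, labels)}
```

For example, `kmeans(enrich(selected, vectors), vectors, k=5)` would cluster an enriched word list for one period. Running it once per period model gives one set of cluster assignments per time slice.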

Currently, the code uses hard-coded links to the models and a hard-coded list of words.
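One way to avoid editing the scripts for every run is to read the model paths and the word list from the command line. The following is a minimal sketch using the standard `argparse` module. The flag names (`--models`, `--words`, `--output`) and the one-word-per-line file format are hypothetical. The current scripts do not accept these options.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical command-line options replacing the hard-coded values."""
    parser = argparse.ArgumentParser(
        description="Cluster diachronic word embeddings (hypothetical CLI)."
    )
    parser.add_argument(
        "--models", nargs="+", required=True,
        help="Paths to the per-period models, in chronological order.",
    )
    parser.add_argument(
        "--words", required=True,
        help="Text file with one target word per line.",
    )
    parser.add_argument(
        "--output", default="clusters.json",
        help="Where to write the clustering results.",
    )
    return parser.parse_args(argv)


def read_word_list(path):
    """Read one word per line from a UTF-8 file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```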

Visualization

The clustering scripts output JSON files that can be used to draw a Sankey chart with diachronic_shift_sankey.py.
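A Sankey chart of diachronic clusters needs nodes (one per cluster in a given period) and weighted links between them. The following is a minimal sketch of that conversion. It assumes a hypothetical JSON layout in which each period maps words to cluster ids. The files actually written by the clustering scripts may be structured differently, so check them before adapting this. The function names are hypothetical.

```python
import json
from collections import Counter


def sankey_links(assignments):
    """Build Sankey links from per-period cluster assignments.

    assignments: hypothetical dict {period: {word: cluster_id}}.
    Each link connects cluster A in one period to cluster B in the next,
    weighted by the number of words that moved from A to B. Words absent
    from either period are ignored.
    """
    periods = sorted(assignments)
    links = []
    for prev, curr in zip(periods, periods[1:]):
        flows = Counter()
        for word, src in assignments[prev].items():
            if word in assignments[curr]:
                flows[(src, assignments[curr][word])] += 1
        for (src, dst), count in sorted(flows.items()):
            links.append(
                {"source": f"{prev}:{src}", "target": f"{curr}:{dst}", "value": count}
            )
    return links


def load_links(path):
    """Read a clustering JSON file in the assumed layout and return its links."""
    with open(path, encoding="utf-8") as f:
        return sankey_links(json.load(f))
```

The source/target/value triples match the general shape that most Sankey plotting tools expect, but the exact input format depends on the library used.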

Selected embeddings can also be visualized with embeddings_drift_tsne.py.
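To plot how a word drifts, its vectors from several periods are typically projected into two dimensions together with some context words. The following is a minimal sketch of building that input. It assumes a hypothetical dict `{period: {word: vector}}` holding one entry per period model. The resulting matrix could then be passed to a projection such as t-SNE (not shown here). This is not necessarily what embeddings_drift_tsne.py does internally.

```python
import numpy as np


def drift_matrix(word, period_vectors, context_words=()):
    """Stack one word's vectors across periods plus optional context words.

    period_vectors: hypothetical dict {period: {word: vector}}.
    Returns (labels, matrix) with one row per label, e.g. "word_1900".
    Context words are taken from the most recent period and skipped if
    absent there.
    """
    labels, rows = [], []
    for period in sorted(period_vectors):
        vectors = period_vectors[period]
        if word in vectors:
            labels.append(f"{word}_{period}")
            rows.append(vectors[word])
    latest = period_vectors[max(period_vectors)]
    for ctx in context_words:
        if ctx in latest:
            labels.append(ctx)
            rows.append(latest[ctx])
    if not rows:
        raise ValueError(f"no vectors found for {word!r}")
    return labels, np.vstack(rows)
```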

The code currently uses hard-coded paths.
