WP6: This is code that was used in our recent work on clustering of diachronic word embeddings

NewsEye/Diachronic-Embeddings

Diachronic-Embeddings

GitHub link: https://github.com/ezosa/Diachronic-Embeddings

This is code that was used in our recent work on clustering of diachronic word embeddings:

Pivovarova, L., Marjanen, J., & Zosa, E. (2019). Word Clustering for Historical Newspapers Analysis. In RANLP Workshop on Language Technology for Digital Humanities.

Marjanen, J., Pivovarova, L., Zosa, E., & Kurunmäki, J. (2019). Clustering ideological terms in historical newspaper data with diachronic word embeddings. In 5th International Workshop on Computational History, HistoInformatics 2019. CEUR-WS.

Models

Diachronic embeddings built on the National Library of Finland newspaper collection can be downloaded from here.

We used an incremental training method, closely following Kim et al. (2014) and previously applied by Hengchen et al. (2019). Further explanation, code, and several embedding model checks can be found here.

Clustering

Once you have obtained the embeddings, you can cluster them using clustering.py (clustering of selected words) or cluster_all.py (enriched clustering).

Currently, the code uses hard-coded links to the models and a hard-coded list of words.
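As a rough illustration of clustering selected words separately in each time slice, here is a minimal standard-library sketch. It is not the algorithm implemented in clustering.py: the single-link threshold grouping, the `cluster_words` function, the `threshold` value, the slice names, and the toy 2-d vectors are all assumptions made for the example.

```python
import json
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(vectors, threshold=0.8):
    """Group words whose cosine similarity reaches the threshold (single-link).

    `vectors` maps word -> list of floats (hypothetical input format).
    Returns a sorted list of sorted word lists.
    """
    words = sorted(vectors)
    parent = {w: w for w in words}

    def find(w):
        # Follow parent pointers to the representative of w's group.
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return sorted(groups.values())


# Toy 2-d vectors for two hypothetical time slices (made-up numbers, not real embeddings).
slices = {
    "1860s": {"liberal": [1.0, 0.1], "generous": [0.9, 0.2], "party": [0.1, 1.0]},
    "1900s": {"liberal": [0.2, 1.0], "generous": [1.0, 0.1], "party": [0.1, 0.9]},
}

clusters_by_slice = {name: cluster_words(vecs) for name, vecs in slices.items()}
print(json.dumps(clusters_by_slice, indent=2))
```

In this toy data, "liberal" is grouped with "generous" in the first slice and with "party" in the second, which is the kind of shift the clustering is meant to surface. With real models, the vectors would be loaded from the trained embeddings instead of being typed in.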

Visualization

The clustering scripts output JSON files that can be used to make a Sankey chart with diachronic_shift_sankey.py.
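To show how per-slice clusters can be turned into the flows a Sankey chart needs, here is a small sketch that counts how many words move from each cluster in one time slice to each cluster in the next. The JSON layout, the `sankey_links` function, and the node naming scheme are assumptions for illustration; the files produced by the clustering scripts and the logic in diachronic_shift_sankey.py may differ.

```python
import json
from collections import Counter


def sankey_links(clusters_by_slice):
    """Count how many words move between clusters in consecutive time slices.

    `clusters_by_slice` maps slice name -> list of clusters (each a list of
    words); slices are taken in insertion order.
    """
    names = list(clusters_by_slice)
    links = Counter()
    for earlier, later in zip(names, names[1:]):
        # Map every word in the later slice to the index of its cluster.
        later_cluster = {
            word: idx
            for idx, cluster in enumerate(clusters_by_slice[later])
            for word in cluster
        }
        for idx, cluster in enumerate(clusters_by_slice[earlier]):
            for word in cluster:
                if word in later_cluster:  # skip words missing from the later slice
                    source = f"{earlier}/cluster{idx}"
                    target = f"{later}/cluster{later_cluster[word]}"
                    links[(source, target)] += 1
    return [
        {"source": s, "target": t, "value": n}
        for (s, t), n in sorted(links.items())
    ]


# Hypothetical clustering output; the real JSON files may be laid out differently.
raw = """
{"1860s": [["generous", "liberal"], ["party"]],
 "1900s": [["generous"], ["liberal", "party"]]}
"""
links = sankey_links(json.loads(raw))
for link in links:
    print(link)
```

Each resulting record is a source/target/value triple, which is the usual input shape for Sankey plotting tools.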

Selected embeddings can also be visualized using embeddings_drift_tsne.py.

The code currently uses hard-coded paths.
