This repo contains code and data associated with "Language Model Quality Correlates with Psychometric Predictive Power in Multiple Languages", by Ethan G. Wilcox, Clara Meister, Ryan Cotterell and Tiago Pimentel, presented at EMNLP 2023.
The scripts directory contains the analysis scripts used in the paper, as well as a sub-directory, images, with the figures that appear in the paper.
The MECO directory contains merged files for our reading time dataset, where each row is associated with a single word and contains its reading time, frequency, length, surprisal, and entropy.
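As a rough illustration of the one-word-per-row layout described above, the following Python sketch parses a small in-memory table and computes a simple correlation between surprisal and reading time. The column names (word, reading_time, frequency, length, surprisal, entropy), the CSV format, and the sample values are all assumptions made up for this example; check the header of the actual files before adapting it. This is not the analysis from the paper, only a minimal way to inspect the data.

```python
import csv
import io
import math

# Hypothetical column names and made-up values, for illustration only.
# The real merged files may use different names, separators, and units.
SAMPLE = """word,reading_time,frequency,length,surprisal,entropy
the,180.0,12.1,3,2.0,3.5
cat,220.0,8.4,3,6.0,4.1
slept,260.0,6.9,5,9.0,4.8
"""

NUMERIC_COLUMNS = ("reading_time", "frequency", "length", "surprisal", "entropy")


def load_rows(handle):
    """Read one word per row and convert the numeric fields to float."""
    rows = []
    for record in csv.DictReader(handle):
        row = {"word": record["word"]}
        for key in NUMERIC_COLUMNS:
            row[key] = float(record[key])
        rows.append(row)
    return rows


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rows = load_rows(io.StringIO(SAMPLE))
r = pearson([row["surprisal"] for row in rows],
            [row["reading_time"] for row in rows])
print(f"{len(rows)} rows, surprisal/reading-time correlation: {r:.3f}")
```

To use this on a real file, replace io.StringIO(SAMPLE) with an open file handle and adjust the column names to match the file's header.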
If you have any questions, please contact Ethan Wilcox.