donnekgit/autoglosser

Bangor autoglosser

This code was written to part-of-speech (POS) tag the conversational corpora assembled by the ESRC Centre for Research on Bilingualism in Theory & Practice at the University of Wales, Bangor.

The data is bilingual conversational running text, and the autoglosser tags it in one pass, using constraint grammar rules for each language.
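
As a rough illustration of the constraint grammar idea (not the autoglosser's own code or rule format), the Python sketch below gives each token every reading found in a toy two-language lexicon, then makes a single left-to-right pass in which context rules discard readings that do not fit. The lexicon entries, the `Reading` class, the two rules, and the `disambiguate` function are all hypothetical and exist only for this example.

```python
# Hypothetical illustration only: none of these names come from the autoglosser.
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    lang: str   # language of the dictionary entry, e.g. "cym" or "eng"
    lemma: str
    pos: str


# Toy lexicon: a surface form maps to every reading in either language.
LEXICON = {
    "y": [Reading("cym", "y", "DET")],
    "plant": [
        Reading("cym", "plentyn", "N"),
        Reading("eng", "plant", "N"),
        Reading("eng", "plant", "V"),
    ],
}


def verb_after_determiner(reading, cohorts, i):
    """Discard a verb reading when the previous token can only be a determiner."""
    return (reading.pos == "V" and i > 0
            and all(r.pos == "DET" for r in cohorts[i - 1]))


def other_language_than_previous(reading, cohorts, i):
    """Discard a reading whose language differs from an unambiguous left neighbour."""
    if i == 0:
        return False
    previous_langs = {r.lang for r in cohorts[i - 1]}
    return len(previous_langs) == 1 and reading.lang not in previous_langs


RULES = [verb_after_determiner, other_language_than_previous]


def disambiguate(tokens):
    """Look up all readings per token, then prune them in one left-to-right pass."""
    cohorts = [
        list(LEXICON.get(tok.lower(), [Reading("unk", tok, "UNK")]))
        for tok in tokens
    ]
    for i, cohort in enumerate(cohorts):
        for rule in RULES:
            kept = [r for r in cohort if not rule(r, cohorts, i)]
            if kept:  # as in constraint grammar, never remove the last reading
                cohort[:] = kept
    return cohorts


if __name__ == "__main__":
    for token, readings in zip(["y", "plant"], disambiguate(["y", "plant"])):
        print(token, readings)
```

Because the rules only remove readings and never delete the last one, a token with no matching context simply stays ambiguous instead of being mis-tagged.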

Note that this code is not properly packaged: much of the work was done ad hoc, so it is more like a compendium of things that worked for us. (For a smaller, cleaner implementation, try the Gáidhlig autoglosser.)

This was remedied to some extent in the second version, Autoglosser2, though that was aimed at written Welsh only, rather than the conversational, code-switched, multilingual text in the Bangor corpora.
