The full write-up (PDF) can be found in the writeup/ folder or in the Wellesley College repositories.
- Contains forced alignments and transcriptions for the text-dependent classifier
- The 7 subfolders correspond to each of the 7 transcribed accents
- Simple transcriptions of speech files, organized by accent. Each is a .csv file compiled from transcription tasks released on Amazon Mechanical Turk, then cleaned up by hand by the author. Note: some transcriptions still contain errors.
- Console print logs of results from the text-dependent "trans" (transcribed) classifier and the text-independent "untrans" (untranscribed) classifier
- Contains lists of filenames that were split into train and test data, via a randomized 75:25 split
- Script and data (in csv format) to test GMM classification based on 3 vowel formants for the AR, CZ, and IN accents
- Text-independent (untranscribed) Classifier
gmmClassifier.py
|| full script; loads data, trains models, classifies test data
gmmTrain.py
|| modularizes training only, stores models in directory
gmmTest.py
|| modularizes testing only
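
The flow of the text-independent scripts (load data, train one model per accent, classify test data) can be illustrated with a minimal, standard-library sketch. This is not the repository's code: the function names, the synthetic 2-dimensional frames, and the file names are all hypothetical, and each accent is modeled here by a single diagonal Gaussian (the one-component special case of a GMM) to keep the example short. The split helper mirrors the randomized 75:25 split used for the train and test file lists.

```python
import math
import random


def split_files(filenames, train_fraction=0.75, seed=0):
    """Shuffle the filenames and split them into (train, test) lists."""
    shuffled = list(filenames)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_diag_gaussian(frames, var_floor=1e-3):
    """Estimate a per-dimension mean and variance from feature frames."""
    n = len(frames)
    columns = list(zip(*frames))
    means = [sum(col) / n for col in columns]
    variances = [
        max(sum((x - m) ** 2 for x in col) / n, var_floor)
        for col, m in zip(columns, means)
    ]
    return means, variances


def log_likelihood(frames, model):
    """Total log-likelihood of all frames under one accent model."""
    means, variances = model
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, means, variances):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total


def classify(frames, models):
    """Return the accent whose model gives the frames the highest score."""
    return max(models, key=lambda accent: log_likelihood(frames, models[accent]))


# Synthetic stand-in data (an assumption): frames centred differently per accent.
rng = random.Random(1)


def fake_frames(centre, count):
    return [[rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)] for _ in range(count)]


train_files, test_files = split_files([f"file{i}.wav" for i in range(8)])
models = {
    "AR": fit_diag_gaussian(fake_frames(0.0, 50)),
    "CZ": fit_diag_gaussian(fake_frames(5.0, 50)),
}
predicted = classify(fake_frames(5.0, 20), models)
print(len(train_files), len(test_files), predicted)
```

Summing frame log-likelihoods per accent and taking the maximum is the usual decision rule for this kind of generative classifier; a real GMM would replace `fit_diag_gaussian` and `log_likelihood` with a mixture fit and mixture density.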
- Text-dependent (transcribed) Classifier
parseTextGrid.py
|| prepares data by converting forced alignments of speech into PLP features (sorted by accent and phoneme)
phoneClassifier.py
|| full script; GMM classification of transcribed phonemes
avgnpy_test.py
|| takes the average of each dimension of the PLP vector across all time windows from a given sound file
miniClassifier.py
|| does univariate GMM classification of AR, HI, MA
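
The last two scripts can be pictured with a small sketch: average each feature dimension over the time windows of a file, then fit a univariate GMM per accent and pick the accent with the highest likelihood. Everything below is an assumption made for illustration, not the author's implementation: the function names, the choice of the first averaged dimension as the single feature, the two-component mixtures, and the synthetic data. It uses only the standard library and a plain EM loop.

```python
import math
import random


def average_frames(frames):
    """Average each feature dimension over all time windows of one file."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(values, k=2, iters=50, var_floor=1e-3):
    """Fit a k-component univariate GMM with expectation-maximization."""
    xs = sorted(values)
    n = len(xs)
    # Deterministic initialisation: spread the means over the sorted data.
    means = [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    centre = sum(xs) / n
    spread = max(sum((x - centre) ** 2 for x in xs) / n, var_floor)
    variances = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                var_floor,
            )
    return weights, means, variances


def gmm_log_likelihood(x, model):
    weights, means, variances = model
    p = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
    return math.log(max(p, 1e-300))


def classify_value(x, models):
    """Return the accent whose GMM assigns x the highest likelihood."""
    return max(models, key=lambda accent: gmm_log_likelihood(x, models[accent]))


# Synthetic stand-in for PLP frames: 5 windows x 3 dimensions per file (hypothetical).
rng = random.Random(0)


def fake_file(centre):
    return [[rng.gauss(centre, 1.0), rng.gauss(0, 1.0), rng.gauss(0, 1.0)] for _ in range(5)]


centres = {"AR": -4.0, "HI": 0.0, "MA": 4.0}
models = {
    accent: fit_gmm_1d([average_frames(fake_file(c))[0] for _ in range(30)])
    for accent, c in centres.items()
}
print(classify_value(4.0, models))
```

Averaging collapses each file to one fixed-length vector, which is what makes a univariate model per dimension possible; the cost is that all temporal detail within the file is discarded.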