Minpair

Generate minimal pairs (and minimal sets) for US English words.

In phonology, minimal pairs are pairs of words or phrases in a particular language, spoken or signed, that differ in only one phonological element

-- https://en.wikipedia.org/wiki/Minimal_pair

>>> import minpair
>>> minpair.vowel_minpair(['AE', 'EH'])[:4]
[{'AE': 'al', 'EH': 'l'}, {'AE': 'axe', 'EH': 'x'}, {'AE': 'bad', 'EH': 'bed'}, {'AE': 'bag', 'EH': 'beg'}]

Installation

pip install -U minpair
>>> import minpair

Usage

Vowel minimal pairs

Words that differ in exactly one vowel phoneme, for example bad and bed.

>>> minpair.vowel_minpair(['AE', 'EH'])[:4]
[{'AE': 'al', 'EH': 'l'}, {'AE': 'axe', 'EH': 'x'}, {'AE': 'bad', 'EH': 'bed'}, {'AE': 'bag', 'EH': 'beg'}]
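To make the idea concrete, here is a minimal sketch of how vowel minimal pairs can be found from phoneme sequences. It is not this package's implementation: `toy_vowel_minpair` and the `PRONUNCIATIONS` table are hypothetical names with hand-written ARPAbet data, used only for illustration.

```python
from collections import defaultdict

# Hypothetical toy ARPAbet pronunciations (stress digits removed).
# Illustrative data only, not read from cmudict.
PRONUNCIATIONS = {
    "bad": ["B", "AE", "D"],
    "bed": ["B", "EH", "D"],
    "bag": ["B", "AE", "G"],
    "beg": ["B", "EH", "G"],
    "bid": ["B", "IH", "D"],
    "cat": ["K", "AE", "T"],
}


def toy_vowel_minpair(vowels, pronunciations=PRONUNCIATIONS):
    """Return one dict per minimal pair/set, mapping each vowel to a word."""
    frames = defaultdict(dict)
    for word, phones in sorted(pronunciations.items()):
        for i, phone in enumerate(phones):
            if phone in vowels:
                # Blank out the vowel: words sharing this "frame" differ
                # only in the phoneme at position i.
                frame = tuple(phones[:i]) + ("_",) + tuple(phones[i + 1:])
                frames[frame].setdefault(phone, word)
    # Keep only frames for which every requested vowel has a word.
    return [
        {v: words[v] for v in vowels}
        for frame, words in sorted(frames.items())
        if all(v in words for v in vowels)
    ]


print(toy_vowel_minpair(["AE", "EH"]))
# [{'AE': 'bad', 'EH': 'bed'}, {'AE': 'bag', 'EH': 'beg'}]
```

Passing three vowels, such as `["AE", "EH", "IH"]`, yields a minimal set (bad, bed, bid) by the same logic.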

Config

Corpus data

This package depends on several NLTK corpora: brown, cmudict, universal_tagset, and words. By default, it downloads any of these that are missing into the NLTK data directory.

To disable the auto download of corpus data:

>>> minpair.generator(download_corpus=False).vowel_minpair(['AE', 'EH'])[:4]
[{'AE': 'al', 'EH': 'l'}, {'AE': 'axe', 'EH': 'x'}, {'AE': 'bad', 'EH': 'bed'}, {'AE': 'bag', 'EH': 'beg'}]
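If auto download is disabled, the corpora have to be present already. One way to fetch them ahead of time is NLTK's own downloader; the sketch below assumes the four corpus names listed above are valid NLTK resource identifiers, and `download_corpora` is a hypothetical helper, not part of this package.

```python
# Corpus names as listed in this README.
CORPORA = ["brown", "cmudict", "universal_tagset", "words"]


def download_corpora(names=CORPORA):
    """Download each corpus with NLTK and report success per name."""
    import nltk  # imported lazily so the module loads without NLTK installed

    # nltk.download returns True on success and False on failure.
    return {name: nltk.download(name, quiet=True) for name in names}


if __name__ == "__main__":
    print(download_corpora())
```

After this has run once with network access, `minpair.generator(download_corpus=False)` should find the data locally.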

POS

This package uses a part-of-speech (POS) tagger to keep only words from meaningful lexical categories. The possible POS tags are those of the universal tagset. By default, only words tagged 'ADJ', 'NOUN', or 'VERB' are returned.

To use different POS tags:

>>> minpair.generator(pos=['VERB']).vowel_minpair(['AE', 'EH'])[:4]
[{'AE': 'bag', 'EH': 'beg'}, {'AE': 'bat', 'EH': 'bet'}, {'AE': 'blast', 'EH': 'blest'}, {'AE': 'kept', 'EH': 'kept'}]

Alternatively, using method chaining:

>>> minpair.generator().pos(['VERB']).vowel_minpair(['AE', 'EH'])[:4]
[{'AE': 'bag', 'EH': 'beg'}, {'AE': 'bat', 'EH': 'bet'}, {'AE': 'blast', 'EH': 'blest'}, {'AE': 'kept', 'EH': 'kept'}]
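Conceptually, the POS filter keeps a word only if it has been seen with one of the allowed tags. The sketch below illustrates that idea on hand-written data; `words_with_pos` and `TAGGED_WORDS` are hypothetical and do not reflect how this package tags or filters words internally.

```python
# Hypothetical (word, universal POS tag) pairs; illustrative data only.
TAGGED_WORDS = [
    ("bag", "NOUN"),
    ("bag", "VERB"),
    ("beg", "VERB"),
    ("bad", "ADJ"),
    ("bed", "NOUN"),
    ("the", "DET"),
]


def words_with_pos(tagged_words, pos=("ADJ", "NOUN", "VERB")):
    """Return the set of words that carry at least one allowed tag."""
    allowed = set(pos)
    return {word for word, tag in tagged_words if tag in allowed}


print(sorted(words_with_pos(TAGGED_WORDS, ["VERB"])))
# ['bag', 'beg']
```

With the default tags, "the" is dropped because 'DET' is not an allowed category, while "bag" is kept because at least one of its tags qualifies.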