The corpus currently used is very small and seems to have been thrown together quickly by the original author (who called it "quick and dirty"). A larger corpus would be much appreciated: the main problem with this library (which I've been using on and off for the past year, with mixed results) is the small number of words it can detect. For example, it couldn't even properly detect contractions until those were added to the corpus.
True, but I wanted to keep the default model small (<1M). (I actually think I pared down the original model somewhat, but my memory is fuzzy at this point.) I'm open to a better language model of about the same size, though.
I just implemented support for importing your own model file. Check out "Custom Language Models" in the readme.
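For anyone landing here later, a minimal sketch of what that might look like, assuming the library exposes a `LanguageModel` class that takes a gzipped, one-word-per-line file sorted by descending frequency (the import name, class name, and file format are assumptions here; the readme's "Custom Language Models" section is authoritative):

```python
# Hypothetical usage sketch: the module name, the LanguageModel class,
# and the gzipped frequency-sorted file format are assumptions; check
# the "Custom Language Models" section of the readme for the real API.
import wordninja

lm = wordninja.LanguageModel('my_lang.txt.gz')  # one word per line, most frequent first
print(lm.split('thequickbrownfox'))             # e.g. ['the', 'quick', 'brown', 'fox']
```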
As a suggested improvement, a larger corpus could be sourced or built from something like the following:
https://www.kdnuggets.com/2017/11/building-wikipedia-text-corpus-nlp.html
or
https://www.corpusdata.org/formats.asp
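Either way, turning raw text (e.g. extracted Wikipedia articles) into a model file is straightforward. Here is a minimal sketch that counts word frequencies and writes a gzipped, one-word-per-line list sorted by frequency; that output format is an assumption and should be matched to whatever the custom-model loader actually expects:

```python
import gzip
import re
from collections import Counter

def build_model(corpus_path: str, model_path: str, top_n: int = 300_000) -> None:
    """Count word frequencies in a plain-text corpus and write the top_n
    words, most frequent first, one per line, gzipped.
    (The output format is an assumption; check the library's readme.)"""
    counts = Counter()
    with open(corpus_path, encoding='utf-8') as f:
        for line in f:
            # Keep internal apostrophes so contractions like "don't" survive,
            # which was one of the gaps in the original corpus.
            counts.update(re.findall(r"[a-z]+(?:'[a-z]+)?", line.lower()))
    with gzip.open(model_path, 'wt', encoding='utf-8') as out:
        for word, _ in counts.most_common(top_n):
            out.write(word + '\n')

# Example: build_model('wikipedia_text.txt', 'my_lang.txt.gz')
```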