Larger corpus? #6

SpongebobSquamirez · 2018-10-26T06:39:10Z

(This is a suggested improvement)

The corpus currently used is very small and seems to have been thrown together by the original author (who called it "quick and dirty"). A larger corpus would be much appreciated: I've been using this library on and off for the past year with mixed results, and its main problem seems to be the small number of words it can detect. For example, it couldn't properly detect contractions until they were added to the corpus.

Something like the following might be good:
https://www.kdnuggets.com/2017/11/building-wikipedia-text-corpus-nlp.html
or
https://www.corpusdata.org/formats.asp

keredson · 2019-08-10T02:26:19Z

true. but i wanted to keep the default model small (<1M). (i actually think i pared down that original model somewhat, but my memory is fuzzy at this point). i'm open to better language models about the same size tho.

i just implemented importing your own model file. check out "Custom Language Models" in the readme.

SpongebobSquamirez · 2019-09-19T10:48:52Z

Thanks for adding custom language models. I haven't tried it out yet, but hopefully plugging in these new corpora is straightforward.
