Thanks for this great work. I tried it out and found that the splitting is sometimes too aggressive. For example, 'occupational' is split into 'occ', 'u', 'p', 'a', 't' and 'ional', and 'particulate' into 'part', 'icu', 'late'. Strangely, this does not always happen: sometimes 'occupational' and 'particulate' come out intact. Any thoughts on why this happens?
I don't remember the details anymore, but I ran into this issue a while back (and, if I recall, solved it). I think it had to do with one of three things: ligatures (like the combined fi character), trying to split very long strings (which triggered a bug for some reason), or a word with a typo or a letter missing at the beginning or end of the string. See if one of those is the cause of your problem. NLP is not fun.
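If ligatures or invisible characters turn out to be the culprit, a rough way to rule them out is to normalize the text before splitting. This is only a sketch using Python's standard `unicodedata` module; `normalize_for_split` is a made-up helper name, not part of this library, and I'm assuming the input is a plain Python string.

```python
import unicodedata


def normalize_for_split(text: str) -> str:
    """Hypothetical pre-processing step, not part of the library."""
    # NFKC folds compatibility characters such as the "ffi" ligature
    # (U+FB03) or the "fi" ligature (U+FB01) into plain ASCII letters.
    folded = unicodedata.normalize("NFKC", text)
    # Drop invisible format characters (Unicode category "Cf"), e.g.
    # soft hyphens and zero-width spaces left over from PDF extraction.
    return "".join(ch for ch in folded if unicodedata.category(ch) != "Cf")


if __name__ == "__main__":
    print(normalize_for_split("o\ufb03ce"))            # ligature folded
    print(normalize_for_split("occu\u00adpational"))   # soft hyphen removed
```

Comparing the splitter's output on the raw and the normalized string should show whether hidden characters explain the inconsistent results.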
Using something like hunspell, or some other correction based on Levenshtein distance, might help fix any rogue character issues.
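To give an idea of the Levenshtein approach without pulling in hunspell, here is a minimal pure-Python sketch. The function names, the `max_distance` threshold, and the tiny vocabulary are all assumptions for illustration; a real setup would use a proper dictionary and probably a faster implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_word(token, vocabulary, max_distance=2):
    """Return the nearest vocabulary word, or the token itself if none is close.

    Hypothetical helper: the threshold of 2 is an arbitrary assumption.
    """
    if not vocabulary:
        return token
    best = min(vocabulary, key=lambda word: levenshtein(token, word))
    return best if levenshtein(token, best) <= max_distance else token


if __name__ == "__main__":
    vocab = ["occupational", "particulate"]  # illustrative vocabulary only
    print(closest_word("ccupational", vocab))
    print(closest_word("particulat", vocab))
```

Running the correction on each token before splitting would repair a missing first or last letter, which is one of the cases mentioned above.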