I'm not sure this approach is feasible in the general case: the algorithm writes back to the string being tokenised, so it can't be "pre-tokenised".
Any speedup or reduction in memory consumption is likely to come from replacing the Trie.
The Trie only exists because we need to be able to do longest-prefix retrieval on a dictionary.
We could use a regular Python dictionary if incoming strings were tokenised by collation key.
What I'm basically thinking is: tokenise incoming strings by collation key, then replace the Trie with a plain dict keyed on those collation keys (rough sketch below).
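A minimal, purely illustrative sketch of the idea. The `collation_key` helper here is a stand-in (NFKD-normalised, casefolded text) rather than whatever collation the library actually uses, and the lookup interface is assumed:

```python
import unicodedata

def collation_key(s: str) -> str:
    """Stand-in collation key: NFKD-normalise and casefold.
    A real implementation might use ICU sort keys instead."""
    return unicodedata.normalize("NFKD", s).casefold()

class KeyedDict:
    """Exact-match lookup on collation keys, replacing longest-prefix
    retrieval on a Trie."""

    def __init__(self):
        self._entries = {}

    def add(self, word: str, value) -> None:
        self._entries[collation_key(word)] = value

    def get(self, token: str):
        # One hash lookup per token, instead of walking Trie nodes
        # to find the longest matching prefix.
        return self._entries.get(collation_key(token))

# Usage: tokens must already be split on whatever boundaries the
# Trie's longest-prefix search was previously responsible for finding.
d = KeyedDict()
d.add("Straße", "street")
print(d.get("STRASSE"))  # -> "street" (ß casefolds to "ss")
```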
It will be worth setting up a timed test to see whether this approach actually makes a positive difference, but I suspect it will.
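Something like the following could serve as that timed test; `trie_lookup` and `dict_lookup` are placeholders for whichever two implementations end up being compared:

```python
import timeit

def benchmark(lookup, tokens, repeats=5, number=1000):
    """Time `number` passes over `tokens`, returning the best of `repeats` runs."""
    timer = timeit.Timer(lambda: [lookup(t) for t in tokens])
    return min(timer.repeat(repeat=repeats, number=number))

# Hypothetical usage, comparing the existing Trie against the dict sketch:
#   tokens = load_sample_corpus()
#   print("trie:", benchmark(trie_lookup, tokens))
#   print("dict:", benchmark(dict_lookup, tokens))
```

Memory consumption could be compared the same way with `tracemalloc` snapshots taken around the construction of each structure.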