Accept Callables as Tokenizers for InMemoryDocumentStore #4695
Closed
farhanhubble started this conversation in Ideas
Replies: 1 comment
Thanks for the feedback. I'm going to create a feature request from this discussion. Contributions are welcome!
InMemoryDocumentStore currently only accepts a tokenizing pattern, through the argument bm25_tokenization_regex: str = r"(?u)\b\w\w+\b". The underlying BM25 supports a callable, though. Removing this restriction will enable correct tokenization of a larger variety of corpora. I ran into this limitation trying to index JSON documents that contain key-value pairs, like: