Issue 481: Use case insensitive filter and add case insensitive string type. #496
The suggested approach was a `solr.TextField` using `KeywordTokenizerFactory`. It was also suggested to separate index-time and query-time analysis into two analyzers.
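A minimal sketch of a field type with separate index-time and query-time analyzers might look like the following. The type name `example_ci` is hypothetical. The choice of `solr.LowerCaseFilterFactory` as the case-insensitive filter is an assumption, and so is the decision to use an identical chain in both analyzers.

```xml
<!-- Hypothetical type name; the filter chain is an assumption, not taken from the PR. -->
<fieldType name="example_ci" class="solr.TextField" omitNorms="true">
  <!-- Index time: keep the entire value as one token, then lowercase it. -->
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query time: apply the same chain so query terms match the indexed form. -->
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Splitting the analyzers changes nothing while the two chains are identical. It does allow the query side to diverge later without reindexing.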
This would simply change all fields of that type. The question is: which search behavior changes are undesirable if all such fields become searchable case-insensitively? Someone favoring a minimal-change approach might read "minimal change" as minimal change to the versioned schema, rather than minimal change to search behavior. Adding field types is clearly not a minimal change to the versioned schema. The resulting search behavior changes, however, may still be minimal in terms of anticipated or expected search terms.
Not sure we need the additional field types. What behavior changes are there without the additional field types?
…x date_created. `strings_ci` is close enough to `whole_strings`, so just use `whole_strings`. There is no `whole_string`, so rename `string_ci` to `whole_string`. To help prevent future problems, document these custom field types. `date_created` is not multi-valued, so it uses `whole_string`.
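A possible shape for the documented `whole_string` type and the `date_created` field is sketched below. The analyzer chain assumes `whole_string` is simply the renamed `string_ci` with a `solr.LowerCaseFilterFactory`. The `indexed` and `stored` attributes on `date_created` are assumptions, and the actual definition of `whole_strings` is not shown in this PR.

```xml
<!--
  whole_string: single-valued, case-insensitive, exact-match string.
  KeywordTokenizer keeps the entire value as one token (effectively no tokenization);
  the lowercase filter (assumed here) makes matching case insensitive.
  Use whole_strings for the multi-valued equivalent.
-->
<fieldType name="whole_string" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- date_created holds a single value, so it uses whole_string (attributes are assumptions). -->
<field name="date_created" type="whole_string" indexed="true" stored="true"/>
```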
Description
A filter cannot be added to a `solr.StrField`. According to the Solr documentation, a filter can only be added to a tokenized field, and a `solr.StrField` does not allow tokenization. This change uses a `solr.TextField` instead.

Several fields need case-insensitive searches. A new type that uses the `KeywordTokenizer` is added, called `string_ci`, along with its multi-valued counterpart `strings_ci`. The `KeywordTokenizer` is essentially a pretend tokenizer: it emits the whole string as a single token, which is effectively the same as not having a tokenizer. The documentation even refers to the `KeywordTokenizer` as the way to disable tokenization.

Fields that should be case insensitive are moved from `string` to `string_ci` and from `strings` to `strings_ci`, respectively.

There are potential performance concerns with using `solr.TextField` rather than `solr.StrField`, because `solr.TextField` loses the docValues optimization.

This change requires a change to the Solr core data structure.
I consider this a breaking change.
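A sketch of how the `string_ci` and `strings_ci` types described above could be declared follows, along with one field being moved over. `solr.LowerCaseFilterFactory` is assumed to be the case-insensitive filter. The field name `title` is hypothetical.

```xml
<!-- Single-valued case-insensitive string. -->
<fieldType name="string_ci" class="solr.TextField" omitNorms="true">
  <!-- One analyzer without a type attribute applies at both index and query time. -->
  <analyzer>
    <!-- Emits the whole input as a single token, effectively disabling tokenization. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lowercases that token, so "ABC" and "abc" match (assumed filter). -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Multi-valued variant, mirroring the string/strings pair. -->
<fieldType name="strings_ci" class="solr.TextField" omitNorms="true" multiValued="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- No docValues="true" on either type: solr.TextField does not support docValues,
     which is the performance trade-off versus solr.StrField. -->

<!-- Hypothetical field moved from type "string" to "string_ci". -->
<field name="title" type="string_ci" indexed="true" stored="true"/>
```

Existing documents must be reindexed after a field changes type, which is consistent with treating this as a breaking change.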
see: https://solr.apache.org/guide/7_7/field-types-included-with-solr.html#field-types-included-with-solr
see: https://solr.apache.org/guide/7_7/field-type-definitions-and-properties.html#field-type-definitions-and-properties
see: https://solr.apache.org/guide/7_7/field-properties-by-use-case.html#field-properties-by-use-case
see: https://solr.apache.org/guide/7_7/tokenizers.html#keyword-tokenizer
see: https://solr.apache.org/guide/7_7/docvalues.html
Fixes #481
Type of change
How Has This Been Tested?
Checklist: