Searches are likely not using solr properties correctly. #515

kaladay · 2023-01-20T19:30:22Z

Describe the bug
The search logic seems confusing and wrong.
Searching for a word by itself either does not work at all or works only depending on things like the AND and OR operators.
For example, searching for apple often does not work.
Some of the work-arounds would be to search * apple or * +apple.

Not only that but, seemingly at random, searches end up including results that are clearly not in the selected field.

Using q for searching with the field prepended, as in q=title:apple, appears to be a likely part of the problem. The df (default field) property is likely the cause of the seemingly random unrelated results.

The search may be improved by using df and q like this example: q=apple&df=title.
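To make the difference between the two request styles concrete, here is a minimal sketch that builds both query strings. The helper names are hypothetical and it only uses the q and df parameters mentioned above; it does not reflect how the service currently assembles its queries.

```python
from urllib.parse import urlencode


def field_prefixed_params(field, term):
    """Style suspected of causing trouble: the field name is embedded in q."""
    return {"q": f"{field}:{term}"}


def default_field_params(field, term):
    """Proposed style: q holds only the term and df names the field."""
    return {"q": term, "df": field}


# urlencode percent-encodes the colon in the field-prefixed form.
prefixed = urlencode(field_prefixed_params("title", "apple"))
with_df = urlencode(default_field_params("title", "apple"))

print(prefixed)  # q=title%3Aapple
print(with_df)   # q=apple&df=title
```

Keeping the field out of q means the term can be passed through unchanged, which may make the generated queries easier to read in the service logs.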

It may be possible to still use *:apple in q.

This needs to be investigated and a solution needs to be provided.

Solving this may solve #514 because that issue may be a symptom of the problem observed in this issue.

To Reproduce
Steps to reproduce the behavior:

1. Go to any discovery view.
2. Search for a single word using a field, such as 'title'.
3. Investigate the query created, looking at the service logs.

Expected behavior
Searching should make sense.
A search for apple should find matches for apple if they exist and should not find matches where apple does not exist.


kaladay · 2023-01-20T21:29:34Z

All the df does is prepend the specified field onto each word.
For example, with a search of "red apple" and a df of title, we get:

q=title:red apple

There are problems with this, and we might need to have sow (split on whitespace) enabled.
With sow=true, we instead get:

q=title:red title:apple
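As a rough sketch of a request that combines df with sow, the snippet below builds the parameters and also mimics the per-term form reported above. The helper names are hypothetical, and describe_split only illustrates the observed output; it is not Solr's parser.

```python
from urllib.parse import urlencode


def search_params(terms, field, split_on_whitespace=True):
    """Build q/df/sow parameters for a multi-word search (hypothetical helper)."""
    return {
        "q": terms,
        "df": field,
        "sow": "true" if split_on_whitespace else "false",
    }


def describe_split(terms, field):
    """Mimic the per-term form reported for sow=true (illustration only)."""
    return " ".join(f"{field}:{term}" for term in terms.split())


query_string = urlencode(search_params("red apple", "title"))
expanded = describe_split("red apple", "title")

print(query_string)  # q=red+apple&df=title&sow=true
print(expanded)      # title:red title:apple
```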

The wildcards also introduce a problem.
Wildcards are not expanded the way we expect.
The search of "red apple" actually searches for (when sow is false):

MatchAllDocsQuery(*:*) q=title:red apple MatchAllDocsQuery(*:*).
This looks to me like it pulls in other fields.

Using df is a step forward, but sow needs to be used.
When not using df, the default appears to be _text_, which is the field we copy everything into for the all_fields matches.

There is also this important documentation note:

NOTE: If you want to be able to sort on a field whose contents you want to tokenize to facilitate searching, use a copyField directive in the Schema to clone the field. Then search on the field and sort on its clone.
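One way to picture the copyField approach from that note is as a Schema API payload with an add-field and an add-copy-field command. This is only a sketch under assumptions: the field names title and title_sort are hypothetical, and it assumes a "string" field type exists in our schema. It builds the JSON body without sending it anywhere.

```python
import json


def sort_clone_commands(source, clone, clone_type="string"):
    """Build Schema API commands that clone a tokenized field for sorting.

    The field names and the clone type are assumptions for illustration.
    """
    return {
        # Untokenized clone intended for sorting.
        "add-field": {
            "name": clone,
            "type": clone_type,
            "indexed": True,
            "stored": False,
            "docValues": True,
        },
        # Copy the searchable field's content into the clone at index time.
        "add-copy-field": {"source": source, "dest": clone},
    }


payload = json.dumps(sort_clone_commands("title", "title_sort"), indent=2)
print(payload)
```

With something like this in place, searches would target title while sorting would use title_sort, matching the search-on-the-field, sort-on-its-clone advice.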

I strongly suspect that the rest of the problems are in how we structure the solr core and use the properties.

see: https://solr.apache.org/guide/7_7/the-standard-query-parser.html

jcreel · 2023-01-26T19:27:04Z

All the fields in the Metadata Application Profile (http://oaktrust.library.tamu.edu/handle/1969.1/175368) and the new ones that we have accumulated will need to have exact-match facets, tokenizations, and search fields - potentially achieved with copy-fields.

kaladay added the bug (Something isn't working), enhancement (New feature or request), and spike labels Jan 20, 2023

kaladay assigned kaladay and rmathew1011 Jan 20, 2023

kaladay unassigned rmathew1011 Jan 30, 2023

kaladay mentioned this issue Feb 6, 2023

Issue 515: Redesign Solr search process, particularly the Solr Core. #523

Merged

13 tasks

kaladay linked a pull request Feb 6, 2023 that will close this issue


kaladay mentioned this issue Feb 8, 2023

Issue 515: Improve text_ws and whole_string query searching. #527

Merged

6 tasks

kaladay linked a pull request Feb 8, 2023 that will close this issue

