Searches are likely not using solr properties correctly. #515

Open
kaladay opened this issue Jan 20, 2023 · 2 comments · Fixed by #523 or #527
Labels
bug (Something isn't working), enhancement (New feature or request), spike

Comments

@kaladay
Contributor

kaladay commented Jan 20, 2023

Describe the bug
The search logic seems confusing and wrong.
Searching for a single word either doesn't work at all or works only depending on operators such as AND and OR.
For example, a search for apple may fail.
Workarounds include searching for * apple or * +apple.

In addition, searches seemingly at random include results where the match is clearly not in the selected field.

Using q for searching with the field prepended, as in q=title:apple, has been found to be a likely part of the problem. The df (default field) parameter is likely the cause of the seemingly random, unrelated results.

The search may be improved by using df together with q, for example: q=apple&df=title.
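As a minimal sketch, the two request styles can be compared by building the query strings with Python's standard library. Only q and df are Solr parameters here; the base URL and core name are hypothetical placeholders, and the helper names are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical core URL; replace with the Solr core the service actually uses.
SOLR_SELECT = "http://localhost:8983/solr/discovery/select"

def field_prefixed_params(term: str, field: str) -> str:
    """Current style: the field is written into q itself (q=title:apple)."""
    return urlencode({"q": f"{field}:{term}"})

def default_field_params(term: str, field: str) -> str:
    """Proposed style: the plain term goes in q, the field is supplied through df."""
    return urlencode({"q": term, "df": field})

print(field_prefixed_params("apple", "title"))  # q=title%3Aapple
print(default_field_params("apple", "title"))   # q=apple&df=title
print(f"{SOLR_SELECT}?{default_field_params('apple', 'title')}")
```

Keeping the field out of q means user input never has to be spliced into query syntax, which is easier to inspect in the service logs.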

It may be possible to still use *:apple in q.

This needs to be investigated and a solution provided.

Solving this may also solve #514, since that issue may be a symptom of the problem observed here.

To Reproduce
Steps to reproduce the behavior:

  1. Go to any discovery view.
  2. Search for a single word using a field, such as 'title'.
  3. Inspect the generated query in the service logs.

Expected behavior
Searching should behave predictably.
A search for apple should find matches for apple if they exist and should not find matches where apple does not exist.

kaladay added the bug (Something isn't working), enhancement (New feature or request), and spike labels on Jan 20, 2023
@kaladay
Contributor Author

kaladay commented Jan 20, 2023

All df does is prepend the specified field to the query text.
For example, with a search for "red apple" and df=title, we get:

  • q=title:red apple

This is problematic, and we may need to enable sow (split on whitespace).
With sow=true, we instead get:

  • q=title:red title:apple
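A sketch of two ways to get the sow=true shape, under stated assumptions: either send sow=true alongside df and let Solr split the input, or qualify each whitespace-separated term on the client. The helper names are hypothetical; the escaped characters are the standard query parser's special characters, escaped one at a time.

```python
from urllib.parse import urlencode

# Characters the standard query parser treats specially (escaped one by one).
SPECIAL = set('+-&|!(){}[]^"~*?:\\/')

def escape_term(term: str) -> str:
    """Backslash-escape query-parser special characters in a single term."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in term)

def qualify_each_term(text: str, field: str) -> str:
    """Client-side equivalent of sow=true with df: field:term for every word."""
    return " ".join(f"{field}:{escape_term(word)}" for word in text.split())

def sow_params(text: str, field: str) -> str:
    """Let Solr do the splitting: plain text in q, plus df and sow=true."""
    return urlencode({"q": text, "df": field, "sow": "true"})

print(qualify_each_term("red apple", "title"))  # title:red title:apple
print(sow_params("red apple", "title"))         # q=red+apple&df=title&sow=true
```

The server-side option keeps escaping concerns inside Solr; the client-side option makes the final query explicit but requires the escaping step.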

The wildcards also introduce a problem: they are not expanded the way we expected.
With sow=false, the search for "red apple" actually becomes:

  • MatchAllDocsQuery(*:*) q=title:red apple MatchAllDocsQuery(*:*)
    This looks to me like it pulls in other fields; MatchAllDocsQuery(*:*) matches every document in the index.
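To illustrate why a match-all clause produces unrelated results, here is a toy model (not Solr or Lucene code) of a boolean query whose clauses are all optional, as with the default OR operator: a document matches if any clause matches, so a single match-all clause admits every document. The documents are made up for the example.

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]
Clause = Callable[[Doc], bool]

def match_all(doc: Doc) -> bool:
    """Toy stand-in for MatchAllDocsQuery(*:*): matches every document."""
    return True

def term(field: str, word: str) -> Clause:
    """Toy term clause: true if the word is one of the field's whitespace tokens."""
    return lambda doc: word in doc.get(field, "").lower().split()

def should_query(clauses: List[Clause], docs: List[Doc]) -> List[Doc]:
    """All clauses optional (OR): a document matches if any clause does."""
    return [d for d in docs if any(c(d) for c in clauses)]

docs = [
    {"id": "1", "title": "Red Apple Orchards"},
    {"id": "2", "title": "Green Pears", "subject": "apple"},
    {"id": "3", "title": "Ships of the Gulf"},
]

with_wildcards = [match_all, term("title", "red"), match_all]
per_term = [term("title", "red"), term("title", "apple")]

print([d["id"] for d in should_query(with_wildcards, docs)])  # ['1', '2', '3']
print([d["id"] for d in should_query(per_term, docs)])        # ['1']
```

Dropping the wildcard clauses and qualifying each term restricts matches to the selected field.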

Using df is a step forward, but sow must also be enabled.
When df is not set, the default field appears to be _text_, which is the field we copy everything into for all_fields matches.

There is also this important documentation note:

NOTE: If you want to be able to sort on a field whose contents you want to tokenize to facilitate searching, use a copyField directive in the Schema to clone the field. Then search on the field and sort on its clone.
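A minimal sketch of that note applied through Solr's Schema API, which accepts add-field and add-copy-field commands as JSON. It assumes a field type named "string" (an untokenized StrField, as in Solr's default configset) and a hypothetical "<field>_sort" naming convention; nothing is sent, the payload is only built.

```python
import json

def sort_clone_commands(field: str, field_type: str = "string") -> dict:
    """Build Schema API commands that clone a tokenized field for sorting.

    Assumes an untokenized "string" field type exists and uses a
    hypothetical "<field>_sort" name for the clone.
    """
    clone = f"{field}_sort"
    return {
        "add-field": {"name": clone, "type": field_type,
                      "indexed": True, "stored": False, "docValues": True},
        "add-copy-field": {"source": field, "dest": clone},
    }

payload = json.dumps(sort_clone_commands("title"), indent=2)
print(payload)
# POST this body to /solr/<core>/schema, then search on "title"
# and sort with sort=title_sort asc.
```

Sorting requires a single-valued clone, so this only fits fields that hold one value per document.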

I strongly suspect that the remaining problems lie in how we structure the Solr core and how we use these properties.

see: https://solr.apache.org/guide/7_7/the-standard-query-parser.html

@jcreel
Member

jcreel commented Jan 26, 2023

All the fields in the Metadata Application Profile (http://oaktrust.library.tamu.edu/handle/1969.1/175368), along with the new ones we have accumulated, will need exact-match facets, tokenized versions, and search fields, potentially achieved with copy-fields.
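On the query side, a minimal sketch of how such copies might be used together. It assumes a hypothetical naming scheme of <field>_search, <field>_facet and <field>_sort that the schema does not define today: search the tokenized copy through df with sow=true, and facet on the exact-match copy. facet and facet.field are standard Solr request parameters.

```python
from urllib.parse import urlencode

def derived_fields(field: str) -> dict:
    """Hypothetical naming scheme for the copies of one profile field."""
    return {
        "search": f"{field}_search",  # tokenized copy, used with df and sow
        "facet": f"{field}_facet",    # exact-match copy for facets
        "sort": f"{field}_sort",      # exact-match copy for sorting
    }

def search_params(text: str, field: str) -> str:
    """Search the tokenized copy and facet on the exact-match copy."""
    names = derived_fields(field)
    return urlencode({
        "q": text,
        "df": names["search"],
        "sow": "true",
        "facet": "true",
        "facet.field": names["facet"],
    })

print(search_params("red apple", "title"))
```

Keeping one tokenized and one exact copy per field separates "does this word appear" from "which exact values exist", which is what facets need.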

3 participants