In EZSearch, a Lucene query is built up for each word in the search term, across each search field, here:
    // Ensure page contains all search terms in some way
    foreach (var term in model.SearchTerms)
    {
        var groupedOr = new StringBuilder();
        foreach (var searchField in model.SearchFields)
        {
            groupedOr.AppendFormat("{0}:{1}* ", searchField, term);
        }
        query.Append("+(" + groupedOr.ToString() + ") ");
    }
But a required clause such as '+if' can't be satisfied: 'if' is a stopword, so it will never be in the index, and the result is that the entire phrase won't return a match. If you leave the stopwords out of the phrase, then it will match.
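To make the failure concrete, here is a minimal standalone sketch of the same loop, printing the query string it produces. The class name QueryDemo and the field names nodeName and bodyText are assumptions for illustration only; the real field list comes from model.SearchFields.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QueryDemo
{
    // Same shape as the EZSearch loop: one required (+) group per term,
    // with the term OR-ed across every search field as a prefix match.
    public static string BuildQuery(IEnumerable<string> terms, IEnumerable<string> fields)
    {
        var query = new StringBuilder();
        foreach (var term in terms)
        {
            var groupedOr = new StringBuilder();
            foreach (var field in fields)
            {
                groupedOr.AppendFormat("{0}:{1}* ", field, term);
            }
            query.Append("+(" + groupedOr.ToString() + ") ");
        }
        return query.ToString();
    }

    public static void Main()
    {
        // Hypothetical field names, not taken from the original site.
        var fields = new[] { "nodeName", "bodyText" };
        Console.WriteLine(BuildQuery(new[] { "bear", "if" }, fields));
        // prints: +(nodeName:bear* bodyText:bear* ) +(nodeName:if* bodyText:if* )
    }
}
```

Every group is prefixed with +, so a single group built from a stopword is enough to rule the page out, however well the other terms match.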
So I think it would be neat to ignore English stopwords in search terms, and also words of three characters or fewer, something like this:
    // Splits a string on spaces, except where enclosed in quotes,
    // then drops English stopwords and words of three characters or fewer
    public IEnumerable<string> Tokenize(string input)
    {
        return Regex.Matches(input, @"[\""].+?[\""]|[^ ]+")
            .Cast<Match>()
            .Select(m => m.Value.Trim('\"'))
            // Stopwords never reach the index, so requiring them can never match
            .Where(x => !StopAnalyzer.ENGLISH_STOP_WORDS_SET.Contains(x.ToLower()) && x.Length > 3)
            .ToList();
    }
Stick in a
    @using Lucene.Net.Analysis
and now, if you search for the phrase: Can I dance with a bear if I'm lying to my fitness instructor
it's the same as searching for the keywords 'dance bear lying fitness instructor', and the matching article is found, as EZSearch no longer insists that the search include words that can't be in the index.
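As a quick check of what the filter keeps, here is a self-contained sketch that runs the same tokenizing logic over the example phrase. To avoid a Lucene.Net dependency it uses a hard-coded set, which is an assumption standing in for StopAnalyzer.ENGLISH_STOP_WORDS_SET (the words are Lucene's default English stopwords as I understand them); the class name TokenizeDemo is also just for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenizeDemo
{
    // Assumption: a hand-copied stand-in for StopAnalyzer.ENGLISH_STOP_WORDS_SET.
    static readonly HashSet<string> StopWords = new HashSet<string>
    {
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
        "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
        "the", "their", "then", "there", "these", "they", "this", "to", "was",
        "will", "with"
    };

    // Splits on spaces (keeping quoted phrases whole), then drops stopwords
    // and any token of three characters or fewer.
    public static List<string> Tokenize(string input)
    {
        return Regex.Matches(input, @"[\""].+?[\""]|[^ ]+")
            .Cast<Match>()
            .Select(m => m.Value.Trim('\"'))
            .Where(x => !StopWords.Contains(x.ToLowerInvariant()) && x.Length > 3)
            .ToList();
    }

    public static void Main()
    {
        var phrase = "Can I dance with a bear if I'm lying to my fitness instructor";
        Console.WriteLine(string.Join(" ", Tokenize(phrase)));
        // prints: dance bear lying fitness instructor
    }
}
```

Note that the length check does more work than the stopword set here: 'Can', 'I', "I'm" and 'my' are not stopwords, and are dropped only because they are three characters or fewer.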
For context: I had a site using EZSearch, and the client reported a problem searching for the phrase:
Can I dance with a bear if I'm lying to my fitness instructor
There was a page in Umbraco with a nodeName exactly matching that text, but EZSearch did not return it as a match!
This is because the index was using the Standard Analyzer, and the search phrase included English stopwords that the analyzer excludes from the index, e.g. "with", "if", "a" (https://stackoverflow.com/questions/17527741/what-is-the-default-list-of-stopwords-used-in-lucenes-stopfilter)