RankFind does not use Levenshtein distance to build results #10

colelawrence · 2015-09-08T06:14:01Z

RankFind still uses plain Find to determine initial results, then measures Levenshtein distance afterwards. Maybe we can have another RankFindLevenshtein that uses Levenshtein distance to determine results.

The text was updated successfully, but these errors were encountered:

lithammer · 2015-09-09T11:05:28Z

I actually played around with something like that for RankFind at one point. What I found was that unless you set a high threshold, most longer targets would never match.

Say you search for "aaa" in "aaabbbbbbbbbbbbbbbbbbb" and your threshold is 15 it wouldn't match. Perhaps that's exactly what you wanted, but I think it might have to be a bit smarter and take length into consideration as well or something... This is basically when I went with the current solution of RankFind 😃

Another approach would be to create a RankFindFunc(s, t string, fn func(t, s string) bool) where you can define your own criteria.

lithammer · 2015-09-09T11:08:38Z

Then you'd have something like this

const threshold = 15

func predicate(s, t string) bool {
    distance := LevenshteinDistance(s, t)  // 19
    return distance < threshold
}

fuzzy.RankFindFunc("aaa", "aaabbbbbbbbbbbbbbbbbbb", predicate)  // false

colelawrence · 2015-09-09T14:12:31Z

Interesting, thank you for the explanations!

On Wed, Sep 9, 2015, 6:08 AM Peter Renström [email protected]
wrote:

Then you'd have something like this

const threshold = 15
func predicate(s, t string) bool {
distance := LevenshteinDistance(s, t)
return distance < threshold
}
RankFindFunc("aaa", "aaabbbbbbbbbbbbbbbbbbb", predicate)

—
Reply to this email directly or view it on GitHub
#10 (comment)
.

lithammer · 2015-09-09T15:03:41Z

Yeah, so I'm not sure the distance alone is enough to determine a match. Having "a" match "ab" but not "abc" is confusing, it should give at least as many hits as a plain old substring search.

Did you have a good use-case for this? Perhaps I'm just missing something here 😄

colelawrence · 2015-09-10T19:15:37Z

Yeah, so I use it to create a search engine for my GoCourseSort project, and the reason I use Levenshtein instead of simple match, is so a word in the search query matches a keyword from the database of strings.

I think it is vaguely similar to the way BigTables work (don't quote me).

Imagine book titles: "Gone with the Wind", "Gone Girl", and "The Girls", we create an index of key words with references:

gonegirl := &Book{
  title: "Gone Girl",
}
gonewiththewind := &Book{
  title: "Gone with the Wind",
}
thegirls := &Book{
  title: "The Girls",
}
indexByKeywords := map[string][]*Book {
  "girls": { thegirls },
  "girl": { gonegirl },
  "gone": { gonegirl, gonewiththewind },
  "with": { gonewiththewind },
  "the": { gonewiththewind, thegirls },
  "wind": { gonewiththewind },
}

Now you enter the search term "the girls"

Then I'm Levenshtein ranking "the" against the keys of indexByKeywords, and "girls" against the keys of indexByKeywords, then using a ranking formula based on Levenshtein distance, index of word in search, and index of word in title, for each list of references I get back per search term.

This is important because in my search engine, I don't want order of words to be important in any way, but I need "CS" to match "CSC" and "131" to match "121", because I'm using it for a college course catalog.

elazarl · 2016-01-07T18:55:23Z

You might want to look at my implementation and the fuzzy find

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RankFind does not use Levenshtein distance to build results #10

RankFind does not use Levenshtein distance to build results #10

colelawrence commented Sep 8, 2015

lithammer commented Sep 9, 2015

lithammer commented Sep 9, 2015

colelawrence commented Sep 9, 2015

lithammer commented Sep 9, 2015

colelawrence commented Sep 10, 2015

elazarl commented Jan 7, 2016

RankFind does not use Levenshtein distance to build results #10

RankFind does not use Levenshtein distance to build results #10

Comments

colelawrence commented Sep 8, 2015

lithammer commented Sep 9, 2015

lithammer commented Sep 9, 2015

colelawrence commented Sep 9, 2015

lithammer commented Sep 9, 2015

colelawrence commented Sep 10, 2015

elazarl commented Jan 7, 2016