A Golang driven text search engine
- Go 1.9
- Govendor 1.0.8 or superior
# Download external dependencies
$ govendor sync
# Build the project and install the binary
$ govendor install +local
To run the specs:
$ govendor test -v +local
To run the project:
# Run the binary
$ $GOPATH/bin/text-search-engine -directory [file's directory]
To check all the available command line parameters, run:
$ $GOPATH/bin/text-search-engine -h
Right after running the system, an interactive terminal will be present on the console.
This console will accept any character as input until the first line break.
For each input sentence, the rank is calculated against each read file and the top 10 ranks will be displayed
as result.
To quit the system, just submit the quit sentence: :quit
$ ./text-search-engine -directory test-utils/11-files
11 file(s) read in the directory test-utils/11-files
search> Lorem
file1.txt: 100.00% match
file10.txt: 100.00% match
file11.txt: 100.00% match
file3.txt: 100.00% match
file6.txt: 100.00% match
file7.txt: 100.00% match
file8.txt: 100.00% match
file9.txt: 100.00% match
file2.txt: 0.00% match
file4.txt: 0.00% match
search> dolor
file1.txt: 100.00% match
file10.txt: 100.00% match
file11.txt: 100.00% match
file2.txt: 100.00% match
file3.txt: 100.00% match
file4.txt: 100.00% match
file6.txt: 100.00% match
file7.txt: 100.00% match
file8.txt: 100.00% match
file9.txt: 100.00% match
search> :quit
$
The matching algorithm implemented is based on the Levenshtein distance algorithm.
The input sequence is broken into words. A word is defined by any sequence of characters, delimited by a space or a line break. Thus, for each word, the Levenshtein distance to the text is calculated. The score is based on the shortest distance calculated. Finally, the final score is the average value among all the calculated scores
Example:
- Input sentence: "Far Fo"
- Content: "The Far Fox"
Distances:
Sentence Word | Target Word | Distance |
---|---|---|
Far | The | 3 |
Far | Far | 0 (minimum) |
Far | Fox | 2 |
Fo | The | 3 |
Fo | Far | 2 |
Fo | Fox | 1 (minimum) |
-
Score for "Far":
Score = 1 - (MIN_DIST / len(TARGET_WORD)) = 1 - (0 / 3) = 1.0
-
Score for "Fo":
Score = 1 - (MIN_DIST / len(TARGET_WORD)) = 1 - (1 / 3) = 0.666
Rank = (1.0 + 0.666) / 2 = 0.833 or 83.3% Match
This project provides a Dockerfile
, which allows to create a Docker image with the application installed
and ready to use.
To create the docker image, run:
$ docker build -t text-search-engine .
A new Docker image called text-search-engine
will be created at the end of the process.
To run the system in a container, first we need to map to the container the directory with the files that will be processed.
As an example, let's consider we've a directory with files saved in /home/foobar/files
.
The Docker command will be:
$ docker run -v /home/foobar/files:/tmp/files -it text-search-engine -directory /tmp/files
The parameter -v
indicates to map the localhost directory /home/foobar/files
to the container directory
/tmp/files
.
The parameter -it
makes the container run in interactive mode, allowing to interact with container's STDIN.
Finally, the last parameter -directory /tmp/files
indicate the file's directory to the search engine app.