All notable changes to this project will be documented in this file.
- added `shingle` command to `lsh` CLI;
- added poor man's suggestion for number of hash functions based on average size of shingle sets.
- bumped `github.com/zoomio/inout` to `0.6.0`.
- changes to CLI: use raw text lines of provided sources instead of tags from given sources.
- use K-shingling instead of stop-word based shingling for similarity comparison in CLI.
- added subcommands to CLI: `lsh` for candidate pairs and `sim` for similarity of a candidate pair.
- added `#Jaccard` for finding Jaccard similarity between two sets.
- first release, provides candidate pairs via pipeline: `#Shingle` -> `#Minhash` -> `#LSH`.
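
For readers unfamiliar with the terms used in the entries above, here is a minimal sketch of k-shingling and Jaccard similarity in Go. It is not this library's implementation: the function names `kShingles` and `jaccard` are hypothetical, and shingling on characters (rather than words or tokens) is an assumption made only to keep the example short.

```go
package main

import "fmt"

// kShingles is a hypothetical helper (not the library's API): it returns the
// set of all substrings made of k consecutive runes in text.
func kShingles(text string, k int) map[string]struct{} {
	set := make(map[string]struct{})
	runes := []rune(text)
	if k <= 0 || len(runes) < k {
		return set // no shingle of length k fits
	}
	for i := 0; i+k <= len(runes); i++ {
		set[string(runes[i:i+k])] = struct{}{}
	}
	return set
}

// jaccard is a hypothetical helper (not the library's API): it returns
// |a ∩ b| / |a ∪ b|. Two empty sets yield 0 here by convention (an assumption).
func jaccard(a, b map[string]struct{}) float64 {
	if len(a) == 0 && len(b) == 0 {
		return 0
	}
	inter := 0
	for s := range a {
		if _, ok := b[s]; ok {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	return float64(inter) / float64(union)
}

func main() {
	a := kShingles("abcd", 2) // {ab, bc, cd}
	b := kShingles("bcde", 2) // {bc, cd, de}
	// Intersection {bc, cd} has 2 elements, union has 4, so this prints 0.5.
	fmt.Println(jaccard(a, b))
}
```

Computing this exact similarity for every pair of documents is quadratic in the number of documents, which is the usual motivation for approximating it with MinHash signatures and using LSH to shortlist candidate pairs before comparing them.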