Commit
Merging (Identically Specified) MinHashLSH objects
Fixes #205
rupeshkumaar committed Mar 12, 2024
1 parent ce29b01 commit 39e60f3
Showing 3 changed files with 9 additions and 5 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -8,7 +8,7 @@ jobs:
     runs-on: "ubuntu-latest"
     strategy:
       matrix:
-        python-version: ["3.7", "3.8", "3.9", "3.10", "3.11", "3.12"]
+        python-version: ["3.7", "3.8", "3.9", "3.10", "3.11"]
     steps:
       - uses: actions/checkout@v3
       - name: Set up Python ${{ matrix.python-version }}
8 changes: 6 additions & 2 deletions datasketch/lsh.py
@@ -232,12 +232,16 @@ def merge(
         check_overlap: bool = False
     ):
         """Merge the other MinHashLSH with this one, making this one the union
-        of both the MinHashLSH.
+        of both.

+        Note:
+            Only num_perm, number of bands and sizes of each band is checked for equivalency of two MinHashLSH indexes.
+            Other initialization parameters threshold, weights, storage_config, prepickle and hash_func are not checked.
+
         Args:
             other (MinHashLSH): The other MinHashLSH.
             check_overlap (bool): Check if there are any overlapping keys before merging and raise if there are any.
-                (`default=True`)
+                (`default=False`)

         Raises:
             ValueError: If the two MinHashLSH have different initialization
4 changes: 2 additions & 2 deletions docs/lsh.rst
@@ -77,12 +77,12 @@ plotting code.
 .. figure:: /_static/lsh_benchmark.png
    :alt: MinHashLSH Benchmark

-You can merge two MinHashLSH object using the ``merge`` function. This
+You can merge two MinHashLSH indexes to create a union index using the ``merge`` method. This
 makes MinHashLSH useful in parallel processing.

 .. code:: python

-    # The merges the lsh1 with lsh2.
+    # This merges the lsh1 with lsh2.
     lsh1.merge(lsh2)

 There are other optional parameters that can be used to tune the index.
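
Note on the merge semantics described in this commit: the docstring says that merging makes one index the union of both, that only `num_perm`, the number of bands, and the band sizes are compared, and that `check_overlap` optionally rejects shared keys. The sketch below illustrates that behavior with a toy, standard-library-only index. `ToyBandedIndex` and all of its members are hypothetical names invented for illustration; this is not datasketch's `MinHashLSH` implementation, and it assumes equally sized bands and in-memory dictionaries.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple


class ToyBandedIndex:
    """Hypothetical banded LSH index; a stand-in, not datasketch's MinHashLSH."""

    def __init__(self, num_perm: int, bands: int, rows: int) -> None:
        if bands * rows > num_perm:
            raise ValueError("bands * rows must not exceed num_perm")
        self.num_perm = num_perm
        self.bands = bands
        self.rows = rows
        # One bucket table per band: band slice of the signature -> keys.
        self.tables: List[Dict[Tuple[int, ...], Set[Hashable]]] = [
            defaultdict(set) for _ in range(bands)
        ]
        self.keys: Set[Hashable] = set()

    def _band(self, signature: Sequence[int], i: int) -> Tuple[int, ...]:
        return tuple(signature[i * self.rows:(i + 1) * self.rows])

    def insert(self, key: Hashable, signature: Sequence[int]) -> None:
        if len(signature) != self.num_perm:
            raise ValueError("signature length must equal num_perm")
        if key in self.keys:
            raise ValueError("key already exists")
        self.keys.add(key)
        for i, table in enumerate(self.tables):
            table[self._band(signature, i)].add(key)

    def query(self, signature: Sequence[int]) -> Set[Hashable]:
        # A key is a candidate if it shares at least one band with the query.
        result: Set[Hashable] = set()
        for i, table in enumerate(self.tables):
            result |= table.get(self._band(signature, i), set())
        return result

    def merge(self, other: "ToyBandedIndex", check_overlap: bool = False) -> None:
        # Only the structural parameters are compared, as in the docstring's Note.
        mine = (self.num_perm, self.bands, self.rows)
        theirs = (other.num_perm, other.bands, other.rows)
        if mine != theirs:
            raise ValueError("indexes have different num_perm or band layout")
        if check_overlap and self.keys & other.keys:
            raise ValueError("indexes have overlapping keys")
        self.keys |= other.keys
        # Union the buckets band by band; `other` is left unchanged.
        for my_table, their_table in zip(self.tables, other.tables):
            for band, keys in their_table.items():
                my_table[band] |= keys


# Two workers build partial indexes in parallel, then one absorbs the other.
a = ToyBandedIndex(num_perm=4, bands=2, rows=2)
b = ToyBandedIndex(num_perm=4, bands=2, rows=2)
a.insert("doc1", [1, 2, 3, 4])
b.insert("doc2", [1, 2, 9, 9])
a.merge(b)
merged_hits = a.query([1, 2, 0, 0])  # both documents share the first band
```

In this sketch, leaving `check_overlap` at its default of `False` means shared keys are merged silently, which is why the corrected default in the docstring matters to callers.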
