Use internal data object for MLBF to accommodate soft/hard block versions #22775

Merged
merged 2 commits into from
Oct 18, 2024

Conversation

KevinMind
Contributor

@KevinMind KevinMind commented Oct 16, 2024

Relates to: mozilla/addons#15014

Description

Adds the concept of a data property to MLBF. This allows storing both soft and hard blocks without modifying how stashes and filters are generated.

Context

This PR doesn't change the underlying behaviour of the bloom filter; it modifies the data structure the MLBF class uses to store data. After this PR, it will be much easier to add data for soft-blocked versions and then create filters or stashes from that data.

Specifically, this PR ensures that data is loaded consistently, whether from storage or from the database, and that all data is loaded on instantiation. This ensures that the filters produced contain valid data.
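To make the idea concrete, here is a minimal sketch of a data object that can be built from either source and is fully populated at construction time. The class name BlockData, its methods, and the callables standing in for database queries are all hypothetical; this is not the code added by the PR, only an illustration of the loading pattern described above.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class BlockData:
    """Hypothetical data object; not the class introduced by the PR."""

    blocked: Tuple[str, ...]
    not_blocked: Tuple[str, ...]

    @classmethod
    def from_storage(cls, cache_path: Path) -> "BlockData":
        # Read every data type from the cache file in one go.
        with open(cache_path) as f:
            raw = json.load(f)
        return cls(tuple(raw["blocked"]), tuple(raw["not_blocked"]))

    @classmethod
    def from_database(
        cls,
        fetch_blocked: Callable[[], Iterable[str]],
        fetch_not_blocked: Callable[[], Iterable[str]],
    ) -> "BlockData":
        # The callables stand in for database queries (an assumption).
        # Both run eagerly, so nothing is loaded lazily after construction.
        return cls(tuple(fetch_blocked()), tuple(fetch_not_blocked()))

    def to_cache(self, cache_path: Path) -> None:
        # Persist the data so a later from_storage() call sees the same values.
        payload = {
            "blocked": list(self.blocked),
            "not_blocked": list(self.not_blocked),
        }
        with open(cache_path, "w") as f:
            json.dump(payload, f)
```

Because both constructors return the same frozen object, code that builds filters or stashes would not need to know where the data came from.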

Testing

You can create a blockfilter locally and ensure it has correct data.

./manage.py export_blocklist

Expect a directory to be created in ./storage/mlbf matching either the parameter you specified for id or a timestamp.

Expect two files: cache.json, containing the blocked/not_blocked versions for that data type (in this case always hard blocked), and a filter file whose suffix matches the data type (in the future it could be blocked or soft_blocked).

Expect the strings in the blocked property of cache.json to match the versions in the query above if you search for hard blocks.
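A quick way to inspect the result is to count the entries under each key of cache.json. The helper below is a hypothetical convenience, not part of the PR; it assumes only that cache.json is a JSON object mapping keys such as blocked and not_blocked to lists of strings, as described above.

```python
import json
from pathlib import Path


def summarize_cache(mlbf_dir):
    """Return the number of entries under each key of <mlbf_dir>/cache.json."""
    with open(Path(mlbf_dir) / "cache.json") as f:
        cache = json.load(f)
    return {key: len(values) for key, values in cache.items()}
```

For example, summarize_cache("./storage/mlbf/<id>") with the id of the directory you just created would report how many blocked and not_blocked versions were written.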

Scenario 2: create a stash

Now add a new block using the Django shell, then create another filter with the same command as above.

Next, open a Django shell to create a stash. Make sure mlbf is the latest filter and previous is the first filter you created.

from olympia.blocklist.mlbf import MLBF
mlbf = MLBF.load_from_storage(<second_id>)
previous = MLBF.load_from_storage(<first_id>)

Verify you can diff the filters and see what has changed

mlbf.blocks_changed_since_previous(previous_mlbf=previous)

The call should return 1 if you added only one additional block.
You can also pass None as previous_mlbf to get back the total number of current blocks (2).
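The expected numbers can be reasoned about with plain sets. The function below is a hypothetical stand-in, not the MLBF method itself: it assumes the change count is the number of versions present in exactly one of the two blocked lists, and that a missing previous filter is treated as empty.

```python
def count_changes(current_blocked, previous_blocked=None):
    """Count versions that appear in exactly one of the two lists."""
    current = set(current_blocked)
    # With no previous filter, every current block counts as a change.
    previous = set(previous_blocked or [])
    return len(current ^ previous)
```

Under these assumptions, two current blocks compared against a previous filter holding one of them give 1, and the same two blocks compared against None give 2, matching the values described above.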

Now you can generate a stash

mlbf.generate_and_write_stash(previous_mlbf=previous)

This should create a new directory matching the id of the mlbf, containing two files: cache.json and, now, stash.json.

The cache.json should contain the new block you created.

{"unblocked": [], "blocked": ["{0f1fbd83-d0fe-42ef-b6ce-365860d6ca10}:23609.517.8654.43211"]}
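The JSON above has the shape of a diff: versions that became blocked and versions that became unblocked. A minimal sketch of how such a structure could be assembled from two blocked lists follows; build_stash is a hypothetical helper written for illustration, and the real stash generation in MLBF may differ.

```python
import json


def build_stash(current_blocked, previous_blocked):
    """Diff two blocked lists into a dict with blocked/unblocked keys."""
    current, previous = set(current_blocked), set(previous_blocked)
    return {
        # Present now but not before: newly blocked.
        "blocked": sorted(current - previous),
        # Present before but not now: no longer blocked.
        "unblocked": sorted(previous - current),
    }


if __name__ == "__main__":
    first = ["{guid-a}:1.0"]
    second = ["{guid-a}:1.0", "{guid-b}:2.0"]
    print(json.dumps(build_stash(second, first)))
```

The guid and version strings in the usage lines are made-up placeholders that follow the guid:version pattern seen in the example above.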

Checklist

  • Add #ISSUENUM at the top of your PR to an existing open issue in the mozilla/addons repository.
  • Successfully verified the change locally.
  • The change is covered by automated tests, or otherwise indicated why doing so is unnecessary/impossible.
  • Add before and after screenshots (Only for changes that impact the UI).
  • Add or update relevant docs reflecting the changes made.

@KevinMind KevinMind changed the base branch from soft-block-configurable to master October 16, 2024 19:18
@KevinMind KevinMind force-pushed the soft-block-block-type branch 2 times, most recently from 029eea3 to f4bb5a9 Compare October 17, 2024 08:44
@KevinMind KevinMind marked this pull request as ready for review October 17, 2024 10:24
@KevinMind KevinMind requested review from a team and eviljeff and removed request for a team October 17, 2024 10:24
@KevinMind KevinMind changed the title Refactor MLBF in preparation for soft blocking Add data_type to mlbf to control filter/stash generation per data type Oct 17, 2024
Member

@eviljeff eviljeff left a comment

🤔

src/olympia/blocklist/cron.py (outdated, resolved)
src/olympia/blocklist/cron.py (outdated, resolved)
src/olympia/blocklist/tests/test_tasks.py (outdated, resolved)
src/olympia/blocklist/tests/test_cron.py (outdated, resolved)
src/olympia/blocklist/mlbf.py (outdated, resolved)
src/olympia/blocklist/mlbf.py (resolved)
src/olympia/blocklist/mlbf.py (outdated, resolved)
src/olympia/blocklist/mlbf.py (outdated, resolved)
src/olympia/blocklist/mlbf.py (resolved)
@KevinMind KevinMind force-pushed the soft-block-block-type branch 3 times, most recently from be0a284 to 9fbdaaa Compare October 17, 2024 20:09
@KevinMind KevinMind force-pushed the soft-block-block-type branch from 9fbdaaa to b571b2e Compare October 17, 2024 20:14
@KevinMind KevinMind changed the title Add data_type to mlbf to control filter/stash generation per data type Use internal data object for MLBF to accommodate soft/hard block versions Oct 17, 2024
@KevinMind KevinMind requested a review from eviljeff October 17, 2024 20:32
Member

@eviljeff eviljeff left a comment

looks good for now - I'm assuming there'll be a fair bit of reworking needed once we're building an additional soft-blocked bloom filter, and including details of both hard and soft in the stashes.

@KevinMind KevinMind merged commit 97f3dbd into master Oct 18, 2024
31 checks passed
@KevinMind KevinMind deleted the soft-block-block-type branch October 18, 2024 13:02