
Commit

add explanation
alexdunnjpl committed Nov 26, 2024
1 parent 1b00788 commit 059a5df
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions src/pds/registrysweepers/reindexer/main.py
@@ -30,6 +30,12 @@ def get_docs_query(filter_to_harvested_before: datetime):
"""
Return a query to get all docs which haven't been reindexed by this sweeper and which haven't been harvested
since this sweeper process instance started running
i.e.
- Query all documents
- Exclude anything which has already been processed, to avoid redundant reprocessing
- Exclude anything which was harvested in the middle of this sweeper running, since this can cause erroneous results
due to inconsistency in the document set across query calls which are expected to be identical.
"""
# TODO: Remove this once query_registry_db_with_search_after is modified to remove mutation side-effects
return {
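
The hunk is cut off at `return {`, so the query body itself is not shown here. As a rough illustration of the two exclusions the docstring describes, the following is a minimal sketch and not the repository's actual query. It assumes an OpenSearch-style bool query, and the field names `example:reindexed_at` and `example:harvest_date_time` as well as the function name `sketch_docs_query` are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical field names, chosen for illustration only; the real keys
# used by the sweeper are not visible in this diff.
REINDEXED_FLAG_FIELD = "example:reindexed_at"
HARVEST_TIME_FIELD = "example:harvest_date_time"


def sketch_docs_query(filter_to_harvested_before: datetime) -> dict:
    """Build a query body matching documents that still need processing."""
    return {
        "query": {
            "bool": {
                # Exclude anything already processed: a processed document
                # is assumed to carry the flag field.
                "must_not": [{"exists": {"field": REINDEXED_FLAG_FIELD}}],
                # Exclude anything harvested at or after the cutoff, so the
                # matched document set stays stable across repeated calls.
                "filter": [
                    {
                        "range": {
                            HARVEST_TIME_FIELD: {
                                "lt": filter_to_harvested_before.isoformat()
                            }
                        }
                    }
                ],
            }
        }
    }


# Usage: capture the cutoff once, when the sweeper starts, and reuse it.
sweeper_start = datetime(2024, 11, 26, 12, 0, 0, tzinfo=timezone.utc)
query = sketch_docs_query(sweeper_start)
```

Fixing the cutoff once at startup, rather than calling `datetime.now()` on every query, is what would keep paginated calls consistent with one another under these assumptions.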
