Merge pull request #367 from openzim/sort_folder_mtime
Sort WARC directories passed to zimit by modification time
benoit74 authored Aug 9, 2024
2 parents 0d5a08c + eb32adf commit a0f8020
Showing 2 changed files with 10 additions and 2 deletions.
CHANGELOG.md: 4 changes (4 additions, 0 deletions)

```diff
@@ -21,6 +21,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Stop fetching and passing browsertrix crawler version as scraperSuffix to warc2zim (#354)
 - Do not log number of WARC files found (#357)
 
+### Fixed
+
+- Sort WARC directories found by modification time (#366)
+
 ## [2.0.6] - 2024-08-02
 
 ### Changed
```
src/zimit/zimit.py: 8 changes (6 additions, 2 deletions)

```diff
@@ -586,14 +586,18 @@ def cleanup():
         ]
 
     else:
-        warc_dirs = list(temp_root_dir.rglob("collections/crawl-*/archive/"))
+        warc_dirs = sorted(
+            temp_root_dir.rglob("collections/crawl-*/archive/"),
+            key=lambda path: path.lstat().st_mtime,
+        )
         if len(warc_dirs) == 0:
             raise RuntimeError(
                 "Failed to find directory where WARC files have been created"
             )
         elif len(warc_dirs) > 1:
             logger.info(
-                "Found many WARC files directories, only last one will be used"
+                "Found many WARC files directories, only most recently modified one"
+                " will be used"
             )
             for directory in warc_dirs:
                 logger.info(f"- {directory}")
```
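The core of the fix is the sort key: `rglob` yields matches in an order that depends on the filesystem, so the directories are now ordered by `st_mtime`, oldest first. Below is a minimal, self-contained sketch of that behaviour. The helper names `sorted_warc_dirs` and `most_recent_warc_dir` and the `crawl-a`/`crawl-b` directories are hypothetical, and the sketch assumes (as the new log message suggests) that the last entry of the sorted list is the one that gets used.

```python
import os
import tempfile
from pathlib import Path


def sorted_warc_dirs(root: Path) -> list[Path]:
    """Return crawl archive directories under ``root``, oldest first.

    Same glob pattern and sort key as the diff; ``lstat`` reads the
    directory's own metadata without following symlinks.
    """
    return sorted(
        root.rglob("collections/crawl-*/archive/"),
        key=lambda path: path.lstat().st_mtime,
    )


def most_recent_warc_dir(root: Path) -> Path:
    """Hypothetical helper: pick the most recently modified archive directory."""
    warc_dirs = sorted_warc_dirs(root)
    if not warc_dirs:
        raise RuntimeError(
            "Failed to find directory where WARC files have been created"
        )
    # Ascending sort by mtime, so the newest directory is last.
    return warc_dirs[-1]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical crawls: "crawl-a" sorts first by name but is given the
        # later mtime, so only a time-based sort selects it.
        for name, mtime in [("crawl-b", 1_000), ("crawl-a", 2_000)]:
            archive = root / "collections" / name / "archive"
            archive.mkdir(parents=True)
            os.utime(archive, (mtime, mtime))
        print(most_recent_warc_dir(root).parent.name)  # prints crawl-a
```

Sorting by name would pick `crawl-b` here; sorting by modification time picks `crawl-a`, the directory written most recently.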
