Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making it possible to recursively clean files in sub-directories #173

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

adybbroe
Copy link
Contributor

@adybbroe adybbroe commented May 30, 2023

Refactor the files cleaning a little, and making it possible to recursively clean sub-directories

At SMHI we are using the remove_it.py script to clean up after CSPP SDR processing for instance. However, CSPP leaves behind a deep tree of files, something like 3-4 GB, and these are not needed after all granules have been processed (they are however, probably/likely needed while SDR granule processing is going on). The current remove_it.py script only does a file-glob and does not walk down into the tree, thus it requires something like the list of templates below in order to clean properly (and this one was not enough in our case):

[cspp_viirs_work]
base_dir=/san1/polar_in/direct_readout/jpss/viirs/sdr/
templates=noaa2?_????????_????_?????/J0*/*,noaa2?_????????_????_?????/J0*/ProSdr*/*,noaa2?_????????_????_?????/db/DmSm/*/*,noaa2?_????????_????_?????/db/DmSm/*,noaa2?_????????_????_?????/db/*,noaa2?_????????_????_?????/log/*,noaa2?_????????_????_?????/log/*,noaa2?_????????_????_?????/logs/*,noaa2?_????????_????_?????/msd/*,noaa2?_????????_????_?????/dynamic/*,npp_????????_????_?????/NPP*/*,npp_????????_????_?????/NPP*/ProSdr*/*,npp_????????_????_?????/db/DmSm/*/*,npp_????????_????_?????/db/DmSm/*,npp_????????_????_?????/db/*,npp_????????_????_?????/log/*,npp_????????_????_?????/log/*,npp_????????_????_?????/logs/*,npp_????????_????_?????/msd/*,npp_????????_????_?????/dynamic/*
hours=8

Instead I would like to be able to do something like this instead:

[viirs_lvl1]
base_dir=/data/polar_in/direct_readout/jpss/viirs/sdr/
templates=noaa2?_????????_????_?????/*,npp_????????_????_?????/*
filetime_checker_type=mtime
recursive=true
hours=8
  • Closes #xxxx
  • Tests added
  • Fully documented

@adybbroe adybbroe requested review from mraspaud, pnuu and TAlonglong May 30, 2023 11:05
@adybbroe adybbroe self-assigned this May 30, 2023
Adam.Dybbroe added 4 commits May 30, 2023 13:09
@codecov
Copy link

codecov bot commented Jun 1, 2023

Codecov Report

Merging #173 (6d9b4bd) into main (47dd25f) will decrease coverage by 0.52%.
The diff coverage is 74.57%.

@@            Coverage Diff             @@
##             main     #173      +/-   ##
==========================================
- Coverage   90.00%   89.48%   -0.52%     
==========================================
  Files          24       26       +2     
  Lines        5100     5277     +177     
==========================================
+ Hits         4590     4722     +132     
- Misses        510      555      +45     
Flag Coverage Δ
unittests 89.48% <74.57%> (-0.52%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
trollmoves/filescleaner.py 55.88% <55.88%> (ø)
trollmoves/tests/test_filescleaner.py 100.00% <100.00%> (ø)

@adybbroe adybbroe marked this pull request as ready for review June 1, 2023 13:36
@adybbroe
Copy link
Contributor Author

adybbroe commented Jun 1, 2023

You can have a look, and see what you think @TAlonglong @pnuu and @mraspaud
But I felt tempted to refactor a bit further, reducing the number of arguments in the various function calls in filescleaner.py and warpping them in a class....

Copy link
Member

@pnuu pnuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some change requests inline.

Comment on lines +135 to +140
if args.verbose:
logging.basicConfig(level=logging.DEBUG, handlers=[handler], format=msgformat)
elif args.quiet:
logging.basicConfig(level=logging.DEBUG, handlers=[handler], format=msgformat)
else:
logging.basicConfig(level=logging.DEBUG, handlers=[handler], format=msgformat)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all equal, should they have levels DEBUG/WARNING/INFO (or even ERROR for the quiet)?

try:
if os.path.isdir(filename):
if not os.listdir(filename):
os.rmdir(filename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Directory deletion doesn't need publishing?

Comment on lines +76 to +83
if len(files_in_dir) == 0:
if is_dry_run:
LOGGER.info("Would remove empty directory: %s", dirpath)
else:
try:
os.rmdir(dirpath)
except OSError:
LOGGER.debug("Was trying to remove empty directory, but failed. Should not have come here!")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract to new _remove_empty_directory() function.



def clean_files_and_dirs(pub, filepaths, ref_time, stat_time_checker, is_dry_run):
"""From a list of file paths and a reference time clean files and directories."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring should be in imperative mood.

Suggested change
"""From a list of file paths and a reference time clean files and directories."""
"""Clean files and directories defined by a list of file paths and a reference time."""

Comment on lines +108 to +117
was_removed = False
if not is_dry_run:
was_removed = remove_file(filepath, pub)
else:
# print("Would remove %s" % filepath)
LOGGER.info("Would remove %s" % filepath)

if was_removed:
section_files += 1
section_size += stat.st_size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra branch can be removed by rearranging the code.

Suggested change
was_removed = False
if not is_dry_run:
was_removed = remove_file(filepath, pub)
else:
# print("Would remove %s" % filepath)
LOGGER.info("Would remove %s" % filepath)
if was_removed:
section_files += 1
section_size += stat.st_size
if not is_dry_run:
was_removed = remove_file(filepath, pub)
section_files += 1
section_size += stat.st_size
else:
LOGGER.info("Would remove %s" % filepath)

Comment on lines +130 to +134
recursive = info.get('recursive')
if recursive and recursive == 'true':
recursive = True
else:
recursive = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
recursive = info.get('recursive')
if recursive and recursive == 'true':
recursive = True
else:
recursive = False
recursive = info.getboolean('recursive', False)

Comment on lines +143 to +150
kws = {}
for key in ["days", "hours", "minutes", "seconds"]:
try:
kws[key] = int(info[key])
except KeyError:
pass

ref_time = datetime.utcnow() - timedelta(**kws)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a separate function, _get_reference_time() for example.



def test_clean_dir_non_recursive(fake_tree_of_some_files, tmp_path, caplog):
"""Test cleaning a directory for files of a certain age."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring doesn't match the actual test (nor the test function name).

Comment on lines +95 to +98
"""Test cleaning a directory tree for files of a certain age.

Here we test using the modification time to determine when the file has been 'created'.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Test cleaning a directory tree for files of a certain age.
Here we test using the modification time to determine when the file has been 'created'.
"""
"""Test cleaning a directory tree for files which were created before the given time."""

Comment on lines +122 to +125
"""Test cleaning a directory tree for files of a certain age.

Here we test using the modification time to determine when the file has been 'created'.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Test cleaning a directory tree for files of a certain age.
Here we test using the modification time to determine when the file has been 'created'.
"""
"""Test cleaning a directory tree for files that have not been modified after the given time."""

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants