Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Replace unarchiving function #6959

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

GitHK
Copy link
Contributor

@GitHK GitHK commented Dec 12, 2024

What do these changes do?

When a new style dynamic service has to fetch an input port containing a zip archive, it will try to unzip it.
If it's not able to open it (see issue here), the service would fail to start.

To fix the issue the unarchiving method is replaced with 7zip. The function signature was not changed for the purpose of making it compatible with the previously used version.

Related issue/s

How to test

Dev-ops checklist

@GitHK GitHK self-assigned this Dec 12, 2024
@GitHK GitHK added the a:dynamic-sidecar dynamic-sidecar service label Dec 12, 2024
@GitHK GitHK added bug buggy, it does not work as expected t:maintenance Some planned maintenance work and removed bug buggy, it does not work as expected labels Dec 12, 2024
Copy link

codecov bot commented Dec 12, 2024

Codecov Report

Attention: Patch coverage is 64.06250% with 23 lines in your changes missing coverage. Please review.

Project coverage is 68.13%. Comparing base (95d208b) to head (9d67dd1).

❗ There is a different number of reports uploaded between BASE (95d208b) and HEAD (9d67dd1). Click for more details.

HEAD has 29 uploads less than BASE
Flag BASE (95d208b) HEAD (9d67dd1)
unittests 30 1
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #6959       +/-   ##
===========================================
- Coverage   88.11%   68.13%   -19.99%     
===========================================
  Files        1592      633      -959     
  Lines       62337    30659    -31678     
  Branches     2012      262     -1750     
===========================================
- Hits        54928    20888    -34040     
- Misses       7074     9711     +2637     
+ Partials      335       60      -275     
Flag Coverage Δ
integrationtests 64.95% <26.56%> (-0.03%) ⬇️
unittests 88.88% <64.06%> (+2.54%) ⬆️
Components Coverage Δ
api ∅ <ø> (∅)
pkg_aws_library ∅ <ø> (∅)
pkg_dask_task_models_library ∅ <ø> (∅)
pkg_models_library ∅ <ø> (∅)
pkg_notifications_library ∅ <ø> (∅)
pkg_postgres_database ∅ <ø> (∅)
pkg_service_integration ∅ <ø> (∅)
pkg_service_library ∅ <ø> (∅)
pkg_settings_library ∅ <ø> (∅)
pkg_simcore_sdk 77.37% <ø> (-8.02%) ⬇️
agent ∅ <ø> (∅)
api_server ∅ <ø> (∅)
autoscaling ∅ <ø> (∅)
catalog ∅ <ø> (∅)
clusters_keeper ∅ <ø> (∅)
dask_sidecar ∅ <ø> (∅)
datcore_adapter ∅ <ø> (∅)
director ∅ <ø> (∅)
director_v2 78.81% <ø> (-12.60%) ⬇️
dynamic_scheduler ∅ <ø> (∅)
dynamic_sidecar 89.82% <64.06%> (+0.06%) ⬆️
efs_guardian ∅ <ø> (∅)
invitations ∅ <ø> (∅)
osparc_gateway_server ∅ <ø> (∅)
payments ∅ <ø> (∅)
resource_usage_tracker ∅ <ø> (∅)
storage ∅ <ø> (∅)
webclient ∅ <ø> (∅)
webserver 59.60% <ø> (-28.14%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 95d208b...9d67dd1. Read the comment docs.

@GitHK GitHK marked this pull request as ready for review December 13, 2024 05:47
Copy link
Member

@odeimaiz odeimaiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

@sanderegg sanderegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the idea behing this change?

Also I was wondering:

  • the dask-sidecar checks the input file mime type. if it is set to zip, then it does not uncompress

@GitHK GitHK requested a review from sanderegg December 13, 2024 10:15
Copy link
Member

@sanderegg sanderegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please read my additional comments. I don't undersatnd how you wanna present this error to the user.

@GitHK GitHK changed the title 🐛 Move archive to port instead of failing 🐛 Replace unarchiving method Dec 16, 2024
@GitHK GitHK changed the title 🐛 Replace unarchiving method 🐛 Replace unarchiving function Dec 16, 2024
Copy link

sonarcloud bot commented Dec 16, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
1 Security Hotspot

See analysis details on SonarQube Cloud

@GitHK GitHK requested review from sanderegg and pcrespov December 16, 2024 13:50
mkdir -p /tmp/7zip
cd /tmp/7zip

curl -LO https://www.7-zip.org/a/7z${SEVEN_ZIP_VERSION}-linux-x64.tar.xz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: why not to have our own image with this already in place instead of downloading and installing it every time?


file_count = await _get_file_count(zip_path)

command = f"7z x {zip_path} -o{output_dir} -bb1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be careful not to end up with segmented paths in the CLI ...

Normally subprocess api allows you to pass the command as a list of arguments. Here I read that it has to be a str. So perhaps the way to go is to add quotes around the paths? (double check)

Something like

command = f"7z x \"{zip_path}\" -o \"{output_dir}\" -bb1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and long option names? 'x' '-o' --> '--output' ?
and it is indeed usually better to use the list of strings if possible

assert process.stderr # nosec

async with progress_bar.sub_progress(
steps=file_count, description=IDStr("...")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the first time I see using IDStr for descriptions ...
NOTE that:

  1. Calling IDStr("...") is equivalent to calling str("..."). Therefore in this context: it basically does nothing. To perform a validation you need to create a TypeAdapter(IDStgr).validate_python("...")` etc
  2. If you want to annotate a field . Then note that IDStr is intended for annotation of String IDentifiers (e.g. a prj_123456) and not descriptions! There are other constraints for e.g. ShortTruncatedStr or LongTruncatedStr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pcrespov I think this is a flaw in the ProgressBar class. I guess I did it wrong at the time. but indeed I would just remove that IDStr from there it makes no sense. But probably should go in a separate PR, maybe an issue for this would be nice.



@pytest.fixture
def to_archive_dir(tmpdir: LocalPath) -> Path:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is like a deja vu from to year one's reviews :-)

We use everywhere the tmp_path version of this fixture which produces pathlib.Path instead of tmpdir which produces a custom version of pytest LocalPath

Please read https://docs.pytest.org/en/6.2.x/tmpdir.html#the-tmp-path-fixture and use tmp_path: Path instead of tmpdir: LocalPath

try:
subprocess.check_output(["7z", "--help"]) # noqa: S607, S603
except Exception: # pylint: disable=broad-except
pytest.skip("7z is not installed")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to add in the dynamic-sidecar Makefile some recipe to install this. Perhaps add it in the skip message or simply make a reference to ./ci/github/helpers/install_7zip.bash

Copy link
Member

@sanderegg sanderegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in the end we replace the zip library just for unzipping correct?
Why not replacing the zip library for unzipping completely instead?
How are they comparable performance wise?

@@ -1304,6 +1304,8 @@ jobs:
cache-dependency-glob: "**/dynamic-sidecar/requirements/ci.txt"
- name: show system version
run: ./ci/helpers/show_system_versions.bash
- name: install 7zip
run: sudo ./ci/github/helpers/install_7zip.bash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this done here and not in the dynamic-sidecar.bash script?



class SevenZipError(BaseDynamicSidecarError):
msg_template = "Could not finish command: '{command}'\nReason: {command_result}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
msg_template = "Could not finish command: '{command}'\nReason: {command_result}"
msg_template = "Could not complete 7zip command: '{command}'\nReason: {command_result}"

@@ -36,6 +36,7 @@

from ..core.settings import ApplicationSettings, get_settings
from ..modules.notifications import PortNotifier
from .seven_zip_wrapper import unarchive_zip_to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it not make sense to put that one also in servicelib?
Also if you ensure compatible interfaces, that would even call for a common interface?

@@ -95,7 +96,7 @@ async def upload_outputs( # pylint:disable=too-many-statements # noqa: PLR0915
port_notifier: PortNotifier,
) -> None:
# pylint: disable=too-many-branches
logger.debug("uploading data to simcore...")
_logger.debug("uploading data to simcore...")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for this kind of logs, I guess you could use the log decorator and spare 1 line of code

Comment on lines +247 to +249
async def _move_file_to_input_port(
final_path: Path, downloaded_file: Path, dest_folder: PrunableFolder
) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function named _move_file_to_input_port takes arguments that do not have any port in their name. even the function does not check anything port related...
actually I don't even really understand what dest_folder is? was that the original folder where the downloaded_file is living?

so this funciton

  • moves downloaded_file into final_path
  • then prunes dest_folder exluding final_path
    maybe a bit of refactoring is in order? at least with naming the arguments?

Comment on lines +53 to +54
if line_decoded.startswith("- "): # check file entry
await sub_prog.update(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohla.. hope that is tested



@pytest.fixture
def internal_tools_unarchived_tools(tmpdir: LocalPath) -> Path:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very weird name...

Comment on lines +56 to +63
def generate_content(
to_archive_dir: Path, sub_dirs: int, files_in_subdirs: int
) -> None:
for i in range(sub_dirs):
(to_archive_dir / f"s{i}").mkdir(parents=True, exist_ok=True)
for k in range(files_in_subdirs):
(to_archive_dir / f"s{i}" / f"{k}.txt").write_text("a" * k)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we have already such functions in /home/sanderegg/dev/github/osparc-simcore/packages/pytest-simcore/src/pytest_simcore/file_extra.py ?

print(f"{entry}")


def _strip_folder_from_path(paths: set[Path], *, to_strip: Path) -> set[Path]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fail to really see the added value of a function that calls a one liner ;) but ok

pytest.param(5, 4, id="few_items"),
],
)
async def test_ensure_same_interface_as_unarchive_dir(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just use a common interface and not have to test that? but just run the usual archive/unarchive tests with standard zip and 7zip?

@GitHK GitHK added this to the Event Horizon milestone Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a:dynamic-sidecar dynamic-sidecar service t:maintenance Some planned maintenance work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants