♻️ Containers are also removed via agent when the dynamic-sidecar is stopped (⚠️ devops) #6924

Merged

Conversation

GitHK
Contributor

@GitHK GitHK commented Dec 9, 2024

devops ⚠️

Additional steps are required when releasing:

  1. Make sure all sidecars have been removed.
  2. Go to Portainer -> Containers.
  3. Search for dy-sidecar- to find the containers that belong to a service.
  4. Remove all of these containers, regardless of their state (running, created, stopped, etc.).
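
For operators without Portainer access, the same cleanup could be sketched with the Docker CLI on each node. This is a minimal sketch and not part of this PR: the helper names are hypothetical, and it only sees containers on the Docker host where it runs. Because `docker ps --filter name=...` matches substrings, the sketch lists all names and applies the `dy-sidecar-` prefix check in Python.

```python
import subprocess

PREFIX = "dy-sidecar-"  # prefix named in the release steps above


def list_command() -> list[str]:
    # `--all` includes containers in every state (running, created, exited, ...)
    return ["docker", "ps", "--all", "--format", "{{.Names}}"]


def select_leftovers(names: list[str]) -> list[str]:
    # Keep only names that start with the prefix (strict prefix match)
    return [name for name in names if name.startswith(PREFIX)]


def remove_command(names: list[str]) -> list[str]:
    # `--force` also removes containers that are still running
    return ["docker", "rm", "--force", *names]


def remove_leftovers(dry_run: bool = True) -> list[str]:
    """List matching containers; remove them only when dry_run is False."""
    result = subprocess.run(
        list_command(), check=True, capture_output=True, text=True
    )
    leftovers = select_leftovers(result.stdout.splitlines())
    if leftovers and not dry_run:
        subprocess.run(remove_command(leftovers), check=True)
    return leftovers
```

Calling `remove_leftovers()` with the default `dry_run=True` only reports what would be removed, which is a safer first step on a production node.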

The procedure was applied to:

Master:

  • master internal
  • master AWS

Staging:

  • staging DALCO
  • staging AWS

Prod:

  • prod DALCO
  • prod osparc.io
  • prod s4l
  • prod tip internal
  • prod tip AWS

What do these changes do?

When cleaning up all resources used by a new-style dynamic service, director-v2 now also asks the agent to remove any containers left over from the latest run of the service.
The agent searches for all containers whose name carries a prefix identifying a proxy, sidecar, or user service for a given node_id. Every container it finds is removed.
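
The matching step described above can be illustrated as follows. This is a minimal sketch, not the agent's actual code: the prefixes, the `_`/`-` separators, and the `remove` callback are assumptions made for illustration, and the real naming scheme lives in the agent service.

```python
from typing import Callable, Iterable

# Hypothetical name prefixes for proxy and sidecar/user-service containers.
KIND_PREFIXES = ("dy-proxy", "dy-sidecar")
# Hypothetical separators between the prefix and the node_id.
SEPARATORS = ("_", "-")


def is_leftover(container_name: str, node_id: str) -> bool:
    """True if the name starts with a known prefix directly followed by node_id.

    Assumes node_ids are fixed-length UUIDs, so one id is never a prefix of another.
    """
    return any(
        container_name.startswith(f"{prefix}{sep}{node_id}")
        for prefix in KIND_PREFIXES
        for sep in SEPARATORS
    )


def remove_leftovers(
    names: Iterable[str], node_id: str, remove: Callable[[str], None]
) -> list[str]:
    """Call `remove` on every container name that belongs to node_id."""
    removed = []
    for name in names:
        if is_leftover(name, node_id):
            remove(name)
            removed.append(name)
    return removed
```

Passing the removal function in as a callback keeps the matching logic testable without a Docker daemon.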

Related issue/s

How to test

Dev-ops checklist

@GitHK GitHK self-assigned this Dec 9, 2024

codecov bot commented Dec 9, 2024

Codecov Report

Attention: Patch coverage is 82.72727% with 19 lines in your changes missing coverage. Please review.

Project coverage is 86.92%. Comparing base (1ce9f08) to head (e452ef4).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6924      +/-   ##
==========================================
- Coverage   87.11%   86.92%   -0.19%     
==========================================
  Files        1608     1441     -167     
  Lines       63507    59803    -3704     
  Branches     2024     1635     -389     
==========================================
- Hits        55322    51985    -3337     
+ Misses       7851     7547     -304     
+ Partials      334      271      -63     
Flag                          Coverage Δ
integrationtests              64.87% <100.00%> (-0.05%) ⬇️
unittests                     85.11% <82.72%> (-0.69%) ⬇️

Components                    Coverage Δ
api                           ∅ <ø> (∅)
pkg_aws_library               ∅ <ø> (∅)
pkg_dask_task_models_library  ∅ <ø> (∅)
pkg_models_library            91.36% <100.00%> (+<0.01%) ⬆️
pkg_notifications_library     ∅ <ø> (∅)
pkg_postgres_database         ∅ <ø> (∅)
pkg_service_integration       70.02% <ø> (ø)
pkg_service_library           74.29% <0.00%> (-0.24%) ⬇️
pkg_settings_library          ∅ <ø> (∅)
pkg_simcore_sdk               85.38% <ø> (ø)
agent                         96.82% <96.05%> (-0.19%) ⬇️
api_server                    90.13% <ø> (ø)
autoscaling                   96.09% <ø> (ø)
catalog                       90.57% <ø> (ø)
clusters_keeper               99.48% <ø> (ø)
dask_sidecar                  91.26% <ø> (ø)
datcore_adapter               93.18% <ø> (ø)
director                      76.40% <ø> (ø)
director_v2                   91.41% <100.00%> (+<0.01%) ⬆️
dynamic_scheduler             97.03% <ø> (ø)
dynamic_sidecar               89.75% <ø> (ø)
efs_guardian                  90.12% <ø> (ø)
invitations                   93.44% <ø> (ø)
osparc_gateway_server         ∅ <ø> (∅)
payments                      92.66% <ø> (ø)
resource_usage_tracker        89.65% <ø> (+0.06%) ⬆️
storage                       89.54% <ø> (ø)
webclient                     ∅ <ø> (∅)
webserver                     84.38% <ø> (-0.11%) ⬇️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1ce9f08...e452ef4.

@GitHK GitHK added this to the Event Horizon milestone Dec 9, 2024
@GitHK GitHK changed the title ♻️ Containers are also removed via agent when the dynamic-sidecar is stopped ♻️ Containers are also removed via agent when the dynamic-sidecar is stopped (⚠️ devops) Dec 9, 2024
@GitHK GitHK marked this pull request as ready for review December 11, 2024 07:39
@GitHK GitHK added a:agent agent service a:director-v2 issue related with the director-v2 service labels Dec 11, 2024
Member

@mrnicegyu11 mrnicegyu11 left a comment

Thanks, looks good. I have some minor questions. Full disclosure: I only read the code, not the tests.

@GitHK GitHK requested a review from mrnicegyu11 December 11, 2024 10:37
Member

@sanderegg sanderegg left a comment

Just to be sure I understand that change correctly.

So now dv-2 will call all the agents to remove containers with some UUID, correct?

  • When exactly is this call made?
  • What happens if the call fails in one agent, for example with a returned exception? Will this stop something in dv-2 from running correctly?
  • Will this affect performance when I start the service anew? (For example, in auto-scaled deployments the dangling container is most probably gone with the machine.)

Member

@pcrespov pcrespov left a comment

👀

@GitHK
Contributor Author

GitHK commented Dec 13, 2024

Just to be sure I understand that change correctly.

So now dv-2 will call all the agents to remove containers with some UUID, correct?

  • When exactly is this call made?
  • What happens if the call fails in one agent, for example with a returned exception? Will this stop something in dv-2 from running correctly?
  • Will this affect performance when I start the service anew? (For example, in auto-scaled deployments the dangling container is most probably gone with the machine.)

Not precisely. This works as follows:

  • To remove the containers, we call one specific agent (the one running on the node where the service was started) and ask it to remove the service's containers.
  • This is achieved by injecting the docker_node_id into the RPC method name.
  • If something goes wrong (agent unreachable, timeout, etc.), an error is raised.
  • There is no impact on performance. (This also runs when the service is closed.)
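
The routing idea in the points above can be sketched roughly like this. It is an illustration under assumptions, not the code in this PR: the method-name format, `InMemoryRpc`, and `AgentUnreachableError` are hypothetical stand-ins for the real RPC layer.

```python
from typing import Callable, Dict


class AgentUnreachableError(RuntimeError):
    """Raised when no agent answers for the requested method name."""


def method_name_for(docker_node_id: str) -> str:
    # Hypothetical format: the docker_node_id is embedded in the method name,
    # so only the agent running on that node handles the request.
    return f"agent.{docker_node_id}.remove_containers"


class InMemoryRpc:
    """Toy stand-in for an RPC broker that routes by method name."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[..., object]] = {}

    def register(self, docker_node_id: str, handler: Callable[..., object]) -> None:
        # Each agent registers a handler under its own node-specific name.
        self._handlers[method_name_for(docker_node_id)] = handler

    def request(self, method_name: str, **kwargs: object) -> object:
        try:
            handler = self._handlers[method_name]
        except KeyError as err:
            # Mirrors "could not reach agent": the caller gets an error.
            raise AgentUnreachableError(method_name) from err
        return handler(**kwargs)
```

With this shape, the caller addresses exactly one agent per service instead of broadcasting to all of them, and a missing agent surfaces as an exception that the caller has to handle.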

Member

@mrnicegyu11 mrnicegyu11 left a comment

We finally resolved the disagreements in person, thanks a lot and good for me :--)

@GitHK GitHK requested review from sanderegg and pcrespov December 16, 2024 08:40
@GitHK GitHK enabled auto-merge (squash) December 16, 2024 13:42
Andrei Neagu added 3 commits December 16, 2024 16:02
@pcrespov pcrespov disabled auto-merge December 17, 2024 09:22
@pcrespov pcrespov merged commit 75aed81 into ITISFoundation:master Dec 17, 2024
87 of 91 checks passed
@GitHK GitHK deleted the pr-osparc-orphaned-containers-removal branch December 17, 2024 09:23
Labels
a:agent agent service a:director-v2 issue related with the director-v2 service
6 participants