
🚀 Release v1.63.0 #5027

Closed
28 of 29 tasks
matusdrobuliak66 opened this issue Nov 14, 2023 · 8 comments
Assignees
Labels
release Preparation for pre-release/release t:maintenance Some planned maintenance work
Milestone

Comments

matusdrobuliak66 (Contributor) commented Nov 14, 2023

Release version

1.63.0

Commit SHA

5ad8c463893e2ddbd348d390e7e109450769ba82

Previous pre-release

https://github.com/ITISFoundation/osparc-simcore/releases/tag/staging_sevenPeaks3

Did the commit CI succeed?

  • The commit CI succeeded.

Changes

What's Changed

Devops check 👷

  • Force deploy graylog and check that it is configured. If it goes well, remove the configure_graylog pipeline stage
  • Check monitoring, registry and portainer services. Make sure the introduced resource limits do not break anything
  • Run ansible for in-house deployments to introduce rsync copy backup cron job (AND to remove sshfs related cron jobs)
  • Special action: Prometheus federation on osparc.io https://git.speag.com/oSparc/osparc-ops-deployment-configuration/-/merge_requests/175
  • Restart PG-Backup (manually)
  • Check that dynamic-sidecar and computational backend sidecar image uses proper image tag @sanderegg

Tests assessment: e2e testing check 🧪

No response

Test assessment: targeted-testing 🔍️

No response

Test assessment: user-testing 🧐

No response

Summary 📝

  • Prepare release link
make release-prod version=1.63.0  git_sha=5ad8c463893e2ddbd348d390e7e109450769ba82
  • Draft release changelog
  • Announce maintenance ( ** ANNOUNCE AT LEAST 24 HOURS BEFORE ** )
  • redis {"start": "2023-03-06T13:00:00.000Z", "end": "2023-03-06T15:00:00.000Z", "reason": "Release <vX.X.0>"}
    • aws
    • dalco
    • tip
  • status page (https://manage.statuspage.io/)
    • osparc
    • s4l
  • mattermost channels
    • maintenance
    • power users
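The Redis announcement above follows a fixed JSON schema. As a minimal sketch (pure Python; the schema fields come from the checklist item above, the helper name is hypothetical), the payload could be built like this:

```python
import json
from datetime import datetime


def maintenance_payload(start: datetime, end: datetime, version: str) -> str:
    """Build the maintenance-announcement JSON shown in the checklist above."""
    fmt = "%Y-%m-%dT%H:%M:%S.000Z"  # UTC timestamps with the millisecond-style suffix
    return json.dumps({
        "start": start.strftime(fmt),
        "end": end.strftime(fmt),
        "reason": f"Release {version}",
    })


# Example: the maintenance window from the checklist
print(maintenance_payload(datetime(2023, 3, 6, 13, 0), datetime(2023, 3, 6, 15, 0), "v1.63.0"))
# → {"start": "2023-03-06T13:00:00.000Z", "end": "2023-03-06T15:00:00.000Z", "reason": "Release v1.63.0"}
```

Generating the JSON instead of hand-editing it avoids typos in the timestamp format across the aws/dalco/tip deployments.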

Releasing 🚀

  • Maintenance page up.
cd /deployment/production/osparc-ops-environments
make up-maintenance
make down-maintenance
  • Release by publishing draft
  • Check release CI
  • Check for hanging sidecars. Helper command to run in the director-v2 CLI: simcore-service-director-v2 close-and-save-service <uuid>
  • Check deployed
    • aws deploy
    • dalco deploy
    • tip deploy
  • Check the testing endpoint, e.g. https://testing.osparc.speag.com/
  • Delete announcement
  • Check e2e runs
  • Announce
:tada: https://github.com/ITISFoundation/osparc-simcore/releases/tag/v<M.m.0>
@matusdrobuliak66 matusdrobuliak66 added t:maintenance Some planned maintenance work release Preparation for pre-release/release labels Nov 14, 2023
@matusdrobuliak66 matusdrobuliak66 added this to the 7peaks milestone Nov 14, 2023
mrnicegyu11 (Member) commented Nov 22, 2023

Since the federated Prometheus will probably be rolled out with this release, consider taking some time to do ITISFoundation/osparc-ops-environments#174, which requires backfilling the new recording rules at least on the osparc.io prometheus-catchall (see https://jessicagreben.medium.com/prometheus-fill-in-data-for-new-recording-rules-30a14ccb8467).

I added a devops task to the main item of the ticket.

GitHK (Contributor) commented Nov 27, 2023

Issues during release:

  • migration is a "false positive": it reported success, but the migration did not go through (we had to manually reboot the service)
  • webserver still used the release-latest docker image tag instead of v1.63.0 and was rolled back - todo: investigate why
  • catalog still used the release-latest docker image tag instead of v1.63.0 and was rolled back - todo: investigate why
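The two tag rollbacks above could be caught right after deploy by checking that every service image is pinned to the release tag. A hedged sketch (plain Python over lines like the output of `docker service ls --format '{{.Name}} {{.Image}}'`; the function name and the sample service names are mine):

```python
def unpinned_services(service_lines, expected_tag="v1.63.0"):
    """Return (name, tag) pairs for services whose image is not pinned to the release tag.

    `service_lines` holds one "name image:tag" pair per line, e.g. as produced by
    `docker service ls --format '{{.Name}} {{.Image}}'`.
    """
    bad = []
    for line in service_lines:
        name, image = line.split()
        # If there is no explicit tag, docker implies "latest"
        tag = image.rsplit(":", 1)[-1] if ":" in image else "latest"
        if tag != expected_tag:
            bad.append((name, tag))
    return bad


lines = [
    "simcore_webserver itisfoundation/webserver:release-latest",
    "simcore_catalog itisfoundation/catalog:release-latest",
    "simcore_director-v2 itisfoundation/director-v2:v1.63.0",
]
print(unpinned_services(lines))
# → [('simcore_webserver', 'release-latest'), ('simcore_catalog', 'release-latest')]
```

Run as a post-deploy step, this would have flagged webserver and catalog before users hit the stale images.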

GitHK (Contributor) commented Nov 27, 2023

When opening a study, the frontend no longer shows any error when a study is already opened.

Reply from POST v0/projects/e131863c-8d02-11ee-818c-02420a0b0331:open

{
    "data": null,
    "error": {
        "logs": [
            {
                "message": "You cannot open more than 1 study at once. Please close another study and retry.",
                "level": "ERROR",
                "logger": "user"
            }
        ],
        "errors": [
            {
                "code": "HTTPConflict",
                "message": "You cannot open more than 1 study at once. Please close another study and retry.",
                "resource": null,
                "field": null
            }
        ],
        "status": 409,
        "message": "You cannot open more than 1 study at once. Please close another study and retry."
    }
}
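For reference, the user-facing text in that reply sits in `error.logs` (entries with `logger == "user"`), which is presumably what the frontend should surface. A minimal sketch of extracting it (the helper name is mine):

```python
import json


def user_messages(reply_json: str) -> list:
    """Pull user-facing log messages out of a webserver error reply like the one above."""
    error = json.loads(reply_json).get("error") or {}
    return [log["message"] for log in error.get("logs", []) if log.get("logger") == "user"]


# The (abbreviated) 409 reply from the comment above
reply = json.dumps({
    "data": None,
    "error": {
        "logs": [{
            "message": "You cannot open more than 1 study at once. Please close another study and retry.",
            "level": "ERROR",
            "logger": "user",
        }],
        "errors": [],
        "status": 409,
        "message": "You cannot open more than 1 study at once. Please close another study and retry.",
    },
})
print(user_messages(reply))
# → ['You cannot open more than 1 study at once. Please close another study and retry.']
```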

YuryHrytsuk (Contributor) commented Nov 27, 2023

Dalco prod:

  • release pipeline is broken https://git.speag.com/oSparc/osparc-ops-deployment-configuration/-/jobs/3893834
  • postgres got broken. It is a known issue: https://git.speag.com/oSparc/osparc-infra/-/issues/incident/12

YuryHrytsuk (Contributor) commented Nov 27, 2023

AWS PROD:

  • not all monitoring services have all limits set (in the code the limits are set, though)
  • Autoscaling ENV is misconfigured
    • The service was rolled back. Then I manually updated the image tag and forced Autoscaling to run on the new tag; however, the ENV was still for the previous version.

YuryHrytsuk (Contributor) commented:

> Dalco prod:
>
> * release pipeline is broken https://git.speag.com/oSparc/osparc-ops-deployment-configuration/-/jobs/3893834
> * postgres got broken. It is a known issue: https://git.speag.com/oSparc/osparc-infra/-/issues/incident/12

Dalco prod pipeline is fixed by https://git.speag.com/oSparc/osparc-ops-deployment-configuration/-/merge_requests/180

YuryHrytsuk (Contributor) commented:

> AWS PROD:
>
> * not all monitoring services have all limits set (in the code the limits are set, though)
> * Autoscaling ENV is misconfigured
>   * The service was rolled back. Then I manually updated the image tag and forced Autoscaling to run on the new tag; however, the ENV was still for the previous version.

The Autoscaling env was fixed manually by @sanderegg (thank you!). We need to reconsider how we handle env updates; it has potential problems.
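A simple guard against the stale-ENV situation described above would be to diff the env actually set on the service against the one expected for the release before forcing an update. A hedged sketch (pure Python; the function name and the sample env-var names are hypothetical):

```python
def stale_env_entries(deployed: dict, expected: dict) -> dict:
    """Return env keys whose deployed value differs from, or is missing versus,
    the expected release env, mapped to (deployed_value, expected_value)."""
    stale = {}
    for key, want in expected.items():
        have = deployed.get(key)  # None when the key is missing entirely
        if have != want:
            stale[key] = (have, want)
    return stale


# Hypothetical example: one env var lagged behind after the rollback
deployed = {"AUTOSCALING_EC2_INSTANCES_ALLOWED_TYPES": "t2.medium", "LOG_LEVEL": "INFO"}
expected = {"AUTOSCALING_EC2_INSTANCES_ALLOWED_TYPES": "t3.large", "LOG_LEVEL": "INFO"}
print(stale_env_entries(deployed, expected))
# → {'AUTOSCALING_EC2_INSTANCES_ALLOWED_TYPES': ('t2.medium', 't3.large')}
```

An empty result would mean the service env matches the release; anything else should block the forced update until reconciled.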

YuryHrytsuk (Contributor) commented:

> AWS PROD:
>
> * not all monitoring services have all limits set (in the code the limits are set, though)
> * Autoscaling ENV is misconfigured
>   * The service was rolled back. Then I manually updated the image tag and forced Autoscaling to run on the new tag; however, the ENV was still for the previous version.

All monitoring services actually have limits. I forgot to deploy the monitoring stack, so it didn't get updated. After the deploy, all resource limits are in place 🎉

6 participants