-
Notifications
You must be signed in to change notification settings - Fork 912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Send email if reproducible built fails in the CI #7897
base: master
Are you sure you want to change the base?
Send email if reproducible built fails in the CI #7897
Conversation
Hi @s373nZ, Please review this PR which adds the functionality to send an email notification if the CI fails during any reproducible build step. I have also updated the folder location for this script from I am also considering merging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally receive an automatic email directly from Github whenever any CI I triggered fails, so I was a little curious around the circumstances regarding the requirement to send emails and did some digging. My guess is that the team is:
- drowning in CI failure emails and most of the notifications are ignored or filtered
- unclear who is receiving the notifications for scheduled workflows like the nightly repro builds
I found this documentation on workflow runs which states:
Notifications for scheduled workflows are sent to the user who initially created the workflow. If a different user updates the cron syntax in the workflow file, subsequent notifications will be sent to that user instead. If a scheduled workflow is disabled and then re-enabled, notifications will be sent to the user who re-enabled the workflow rather than the user who last modified the cron syntax.
Disabling and re-enabling the scheduled job (if you have permissions) or committing to modify the cron
syntax could shift the automatic notifications to you. These seem a little flaky, so the requirement to have a solid solution captured in code is understandable, and for others in the community as well. Maybe Slack notifications could be an interesting alternative?
That said, your current approach looks pretty good to me. I would consider trying to consolidate the three different action-send-email
steps into one by making it a completely separate job, something like:
jobs:
ubuntu: [...]
failure-notify:
needs: ubuntu
if: failure()
steps:
uses: dawidd6/action-send-mail@v3
...
It might not work, esp. with the matrix build, but it could be worth a shot. Inspiration here.
I have also updated the folder location for this script from /release to /repro just to avoid confusion with the release.yml/build-release.sh scripts.
Good idea! My first thought was to suggest changing cl-repro
to cl-release
in release.yml as well, but we have a logical dependency in build-release.sh.
I am also considering merging repro.yml with release.yml in the future, because repro.yml serves as a pre-stage for release.yml. I would appreciate your thoughts on that too.
We could try to reuse the repro.yml steps using a reusable workflow or a composite action. I considered trying to do this at the outset of the release automation work, but I think it belongs in a separate PR.
Unless you are suggesting to do away with the nightly builds in favor of detecting dirty builds only during the release process. I think there is value in early nightly detection so the release captain isn't tasked with too much triage at the last minute, but merging the two workflows is reasonable and should be possible.
Overall, LGTM pending your feedback re: the email step consolidation and the SMTP config.
2c84244
to
d1ea50e
Compare
Yes, the goal is to send a customised email so that it stands out from the others and is not overlooked.
Thanks for pushing me to avoid being lazy 😄! I was not happy about repeating the step, but wanted to capture the details of the failed step as well. It took a little time, but the email and workflow are much cleaner now. I ended up merging them into a single step at the end.
I would prefer to keep everything in one workflow. I plan to run the repro step on a scheduled basis, while the other steps (including the repro) will execute when a tag is pushed. |
@ShahanaFarooqui It just occurred to me that this line (in all the Ubuntu Dockerfiles) might cause a problem with changing the folder location to
Since the files are reused in both the repro build process and the release process, it might cause an error. |
|
That lightning/.github/workflows/repro.yml Line 40 in 2791c60
That line is copying the repro build archive to the lightning/.github/workflows/repro.yml Line 62 in 2791c60
Also, IIRC I needed to create the One initial idea to get around this might be to try adding an Hope this makes sense. LMK if it doesn't, or I'm missing something. I can help or chime back in tomorrow. |
cc4fbaf
to
507008f
Compare
The action is still successfully completing with the I also added the step |
507008f
to
ef485cf
Compare
@ShahanaFarooqui I'm curious how it completed successfully w/o the IMHO, I think the appropriate solution is to leave task commands as they were previously so we still get the log output in the UI, and just have a simpler email with less context that reports there was an error and provides a link to the CI output. The recipients can view the log output from Github actions in the case of an error, and others who are not on the |
ef485cf
to
f28587b
Compare
Agree, I removed the error capturing code as it was causing more complexity than it was worth. I also tried using the Posting the
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! LGTM :)
ACK f28587b
One final observation - it looks like if all three repro builds failed, then three separate emails would be sent because the failure email step is a part of the matrix job, right? You could try to consolidate it down into one email by making that step a separate If maybe having multiple emails per run failure is fine for you, the current code still LGTM 👍 |
For now, I would prefer to keep them separate since I am the only one on the distribution list :D. I will consider merging them if their frequency seems unnecessarily high.
Capturing the failing step name should not be a big issue, as we can set it as output instead. |
Changelog-None.
f28587b
to
bb72e59
Compare
Changelog-None.