Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weekly build triage - Automation #3521

Closed
10 tasks done
adamfarley opened this issue Nov 3, 2023 · 4 comments · Fixed by #3576
Closed
10 tasks done

Weekly build triage - Automation #3521

adamfarley opened this issue Nov 3, 2023 · 4 comments · Fixed by #3576
Assignees
Labels
testing Issues that enhance or fix our test suites weekly-build-triage

Comments

@adamfarley
Copy link
Contributor

adamfarley commented Nov 3, 2023

Summary

To automate the process of build failure triage.

Note, this does include propagated red test framework failures, but will not include unit test failure.

MVP

Steps 1-3, as per the full plan below.

Full Plan

  • Step 1) A GitHub Action calls a script on Mondays.
  • Step 2) The script identifies the latest automated build pipelines, limited to the last 7 days, for each current JDK major version.
  • Step 3) The script then identifies which subjobs failed.
  • Step 4) We then access each failed subjob and use a list of easily-modified patterns to identify useful output relevant to the failure.
    • 3-4ph: Draw up list of initial regexs and any (rough) relevant data.
    • 1-4ph: Test regexs and fix bugs.
    • 1pd: Approximate Total.
    • Actual time: 1.6pd
      • Note: Creating the logic to actually apply the regexs took longer than expected. Make sure to set aside time for supporting logic in future estimations.
  • Step 5) Each pattern should be associated with metadata, including problem type, advice, any relevant issue links, and whether it is generally considered a preventable issue.
    • 2-3ph: Tidy metadata and identify preventables.
    • 1-2ph: Test metadata displays correctly.
    • 0.5pd: Approximate Total.
    • Actual time: 0.3pd
  • Step 6) A summary is formatted into a table like the one at the top here.
    • 2-4ph: Write initial code to generate md table.
    • 1-3ph: Fix bugs.
    • 1ph: Output table to file.
    • 1-4ph: Test output works and is accurate.
    • 0.5-2pd: Approximate Total.
    • Actual time: 0.2pd
      • Note: The table doesn't seem as useful as a simple number, which is what the community asked for upstream.
  • Step 7) All of the information (the link to the job, the useful output, and the metadata) then gets put after the table, in chunks.
    • 1-3ph: Write code and test. This should be simple.
    • Actual time: 1.5ph: It was, barring a single bug that's been fixed.
  • Step 8) The summary and the full info list are then output from the script, where the GitHub Action collects them.
  • Step 9) The action finally creates a build triage issue in the same repository, with the script output as the issue's contents.
  • Step 10) Cake.
    • Cancelled this for diet's sake. 😞
@github-actions github-actions bot added the testing Issues that enhance or fix our test suites label Nov 3, 2023
@adamfarley adamfarley self-assigned this Nov 3, 2023
@adamfarley
Copy link
Contributor Author

adamfarley commented Nov 20, 2023

Step 2 and 3 are complete and being tested. Code here: https://github.com/adamfarley/temurin-build/tree/build_triage_automation

Will likely spend the rest of the day adding regexs and plugging all this into an issue-generating github action.

@adamfarley
Copy link
Contributor Author

adamfarley commented Nov 29, 2023

Steps 1, 8, and 9 were tricky, but are now complete.

A PR for this issue's MVP commit can be found here.

For future improvements, I will be including a link to the branch used for said improvements. The commit messages on these branches can be used to gauge incremental status.

@adamfarley
Copy link
Contributor Author

adamfarley commented Nov 29, 2023

A minimal regex list, complete with metadata and logic, will be created here.

This fulfils steps 4 and 5.

@adamfarley
Copy link
Contributor Author

adamfarley commented Dec 14, 2023

Ok, the script is finished. I've created the documentation, updated the issue description field with the actual time it took (to refine estimates in the future), and created a PR to merge the changes: #3576

Further improvements to the build triage script will happen as time allows, and proposals/plans/results will be tracked in a new issue. (Link).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
testing Issues that enhance or fix our test suites weekly-build-triage
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant