
windocker: sign_verification jobs failing for windows docker build pipelines #3709

Closed

sxa opened this issue Aug 14, 2024 · 24 comments · Fixed by adoptium/ci-jenkins-pipelines#1117

Labels: docker, os:windows, secure-dev (Issues specific to SSDF/SLSA compliance work)

sxa commented Aug 14, 2024

The sign job appears to complete successfully when called from the windbld job, but the verification is failing. @steelhead31 has checked, and there appears to be a valid Eclipse signature on some of the files shown as failing in the logs.

I've locked job https://ci.adoptium.net/job/build-scripts/job/release/job/sign_verification/1588 to retain the output for now. That was started from https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk21u/job/windbld/110

sxa changed the title from "sign_verification jobs failing for windows docker build pipelines" to "windocker: sign_verification jobs failing for windows docker build pipelines" on Aug 14, 2024
adamfarley (Contributor) commented

It does look like this PR's merge coincided with the first instance I can find of the "three signatures" issue Scott mentions here (1st of August, JDK17), while the prior build (18th of July) only has one signature per file.

Noodling over the PR now.

steelhead31 commented Aug 14, 2024

sxa commented Aug 14, 2024

Noting that the PR Adam mentions is included in the branch I'm using, so that should not be a difference that explains the failures I'm seeing.

sxa added the secure-dev (Issues specific to SSDF/SLSA compliance work) label on Aug 22, 2024
sxa commented Aug 22, 2024

Noting that I have had a jdk8u build go through without issues, so this may only be affecting some of them (jdk21u has failed on me twice, with no successes from the verification subjob).

sxa commented Aug 30, 2024

Fails for jdk21u and jdk17u - jdk11 currently under test at https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk21u/job/windbld/247 (also failed) ...
jdk8u is ok

sxa commented Sep 2, 2024

According to @andrew-m-leonard, "the signing does not occur for “USE_ADOPT_SHELL_SCRIPTS”: false" (ref this line). The conditional on that line is the reason it doesn't fail for jdk8u.
We should aim to update the wiki page on running from your own branch to make this clear. In the meantime I have adjusted my windbld job to run explicitly from my branch so that the option can be set to true.

sxa commented Sep 2, 2024

New pipelines (Note I'm only including the links to the ones that didn't fall foul of #3714):

andrew-m-leonard (Contributor) commented

> We should aim to update the wiki page on running from your own branch to make this clear.

Done

sxa commented Sep 3, 2024

For some reason I'm getting failures at the end of the build when setting that value to true. It seems consistent across all versions I've tried:


10:37:00  C:\workspace\openjdk-build>rm -rf C:/workspace/openjdk-build/workspace/build/straceOutput 
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
10:37:00  $ docker stop --time=1 4cc042d8ab698640fb727e7c3e3029453d094e7c2ac30c76aabfd29410053cbd
10:37:07  $ docker rm -f --volumes 4cc042d8ab698640fb727e7c3e3029453d094e7c2ac30c76aabfd29410053cbd
[Pipeline] // withDockerContainer
[Pipeline] }
[Pipeline] // ws
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // stage
[Pipeline] echo
10:37:07  Execution error: java.io.IOException: Cannot run program "docker" (in directory "/var/jenkins/workspace/build-scripts/jobs/jdk21u/windbld"): error=2, No such file or directory
[Pipeline] echo
10:37:07  java.io.IOException: Cannot run program "docker" (in directory "/var/jenkins/workspace/build-scripts/jobs/jdk21u/windbld"): error=2, No such file or directory

At this point it should be out of the section at https://github.com/adoptium/ci-jenkins-pipelines/blob/d33d46f39f7f3538abccf6520117c44530d85441/pipelines/build/common/openjdk_build_pipeline.groovy#L1623 (the rm -rf shown at the start of the log fragment above occurs after it comes out of that if block).

sxa commented Sep 3, 2024

It seems likely that this is being caused when the pipelines switch to the Eclipse system at https://github.com/adoptium/ci-jenkins-pipelines/blob/d33d46f39f7f3538abccf6520117c44530d85441/pipelines/build/common/openjdk_build_pipeline.groovy#L1652

Theory: it may be expecting to still be running under a docker context (build() calls buildScripts() under a docker context, and that includes the signing section) and so it's baulking at being unable to find docker on the eclipse system.

sxa commented Sep 4, 2024

Noting that the ENABLE_SIGNER option only takes effect for the final archive, not the signing of the individual build components on mac/windows. In order to make that step be skipped when ENABLE_SIGNER is false, this line:

https://github.com/adoptium/ci-jenkins-pipelines/blob/d33d46f39f7f3538abccf6520117c44530d85441/pipelines/build/common/openjdk_build_pipeline.groovy#L1627

should be modified to have a && buildConfig.ENABLE_SIGNER == "true" clause
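
For clarity, a minimal sketch of the suggested guard - the existing condition is paraphrased from the linked line, and only the appended clause is the proposed change:

```groovy
// Sketch of the proposed guard around the internal (per-component) codesigning.
// The USE_ADOPT_SHELL_SCRIPTS check paraphrases the existing conditional at the
// line linked above; the ENABLE_SIGNER comparison is the suggested addition, so
// jobs with signing disabled skip the eclipse-codesign path entirely.
if (buildConfig.USE_ADOPT_SHELL_SCRIPTS && buildConfig.ENABLE_SIGNER == "true") {
    // ... windows/mac exploded-image signing ...
}
```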

sxa commented Sep 6, 2024

I had a call with @andrew-m-leonard today. We're going to look at a bit of a refactor of the build/buildScripts functions in openjdk_build_pipeline.groovy to split the 400-line buildScripts function into three parts, separately invoked from build:

  • The initial build phase using make-adopt-build-farm.sh (on windows/mac, when UseAdoptShellScripts is true, this will build with BUILD_ARGS=--make-exploded-image as it does currently).
  • The signing, which needs to be executed outside the docker context and on the separate eclipse-codesign machine.
  • The 'exploded image assemble' (another make-adopt-build-farm.sh invocation, with BUILD_ARGS=--assemble-exploded-image), which will be done on mac and Windows (Windows under docker if desired).

This will give us better control of how they are executed, allow us to resolve the problem of the entire build phase currently being run in a docker context within the build function, and hopefully, as a bonus, make the logic easier to comprehend.

Noting that, in terms of size, the current groovy file is about 2300 lines long. buildScripts is 400 lines of that. The USE_ADOPT_SHELL_SCRIPTS section - primarily for the windows/mac exploded build and signing - is about 160 lines, of which about 140 lines run in the codesign context.

An alternate solution was to fire off a separate job to handle the signing, but that would have required quite a bit of extra logic to be written (and wouldn't give us the simplicity benefits). The proposed solution doesn't preclude doing this "internal signing" in a separate job in the future if desired.
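
For illustration, a rough sketch of the intended shape of build() after the split. The three function names match the split as later named in this thread; the node labels, docker plumbing, and stash names are assumptions rather than the final implementation:

```groovy
// Sketch of the proposed three-way split, with the workspace handed between
// contexts via stash/unstash. All names other than the three build functions
// are hypothetical.
def build() {
    context.node(buildConfig.NODE_LABEL) {
        context.docker.image(buildConfig.DOCKER_IMAGE).inside {
            buildScripts()          // make-adopt-build-farm.sh, BUILD_ARGS=--make-exploded-image
            context.stash name: 'exploded-build', includes: 'workspace/build/**'
        }
    }
    context.node('eclipse-codesign') {  // signing runs outside any docker context
        context.unstash 'exploded-build'
        buildScriptsEclipseSigner()
        context.stash name: 'signed-build', includes: 'workspace/build/**'
    }
    context.node(buildConfig.NODE_LABEL) {
        context.docker.image(buildConfig.DOCKER_IMAGE).inside {
            context.unstash 'signed-build'
            buildScriptsAssemble()  // make-adopt-build-farm.sh, BUILD_ARGS=--assemble-exploded-image
        }
    }
}
```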

sxa commented Sep 11, 2024

I've started work on splitting buildScripts into three:

  • buildScriptsEclipseSigner
  • buildScriptsAssemble
  • buildScripts (The first, shared step)

I've had to replicate some things between the buildScripts and buildScriptsAssemble to ensure that the environment and other things are set up appropriately.
I'm currently hitting this:
15:09:26 make[3]: *** No rule to make target '/cygdrive/c/progra~2/micros~1/2022/buildt~1/vc/redist/msvc/1436~1.325//x64/microsoft.vc143.crt/vcruntime140.dll', needed by '/cygdrive/c/workspace/openjdk-build/workspace/build/src/build/windows-x86_64-server-release/support/modules_libs/java.base/vcruntime140.dll'. Stop.
which is failing because micros~1 is pointing at the Microsoft directory instead of Microsoft Visual Studio. It's unclear why this is only showing up at the assemble phase, but there is nothing in the Microsoft Visual Studio/2022 directory in the container, and there is also no Windows Kits directory under C:\Program Files (x86), so one of those will likely need to be resolved.

sxa commented Sep 12, 2024

@andrew-m-leonard A thought - since these changes will involve running part of it on the build machine, then switching over outside the main build context, and then switching back (starting up a new container but with the same workspace) do you think there is a risk of another job coming onto the machine to run another build while the signing is in progress, or will it be locked out? (I haven't experimented yet, but your experience might let you know whether we'll have a potential issue here.)

andrew-m-leonard (Contributor) commented

> @andrew-m-leonard A thought - since these changes will involve running part of it on the build machine, then switching over outside the main build context, and then switching back (starting up a new container but with the same workspace) do you think there is a risk of another job coming onto the machine to run another build while the signing is in progress, or will it be locked out? (I haven't experimented yet, but your experience might let you know whether we'll have a potential issue here.)

@sxa The 1st build is running in a Windows node context; it then switches from there to eclipse-codesign and then returns. I believe the 1st node context is still locked.

sxa commented Sep 13, 2024

> @andrew-m-leonard A thought - since these changes will involve running part of it on the build machine, then switching over outside the main build context, and then switching back (starting up a new container but with the same workspace) do you think there is a risk of another job coming onto the machine to run another build while the signing is in progress, or will it be locked out? (I haven't experimented yet, but your experience might let you know whether we'll have a potential issue here.)

> @sxa The 1st build is running in a Windows node context; it then switches from there to eclipse-codesign and then returns. I believe the 1st node context is still locked.

Yeah that's definitely true in the non-docker case. Hopefully that can be retained somehow in the new version where we're running separate contexts for the initial build and final step (since the signing step can't be inside the docker context any more). Ideally I need to be watching it when it tries to do the signing to see if the machine is still locked.

sxa commented Sep 16, 2024

> I've started work on splitting buildScripts into three:
> I'm currently hitting this: 15:09:26 make[3]: *** No rule to make target '/cygdrive/c/progra~2/micros~1/2022/buildt~1/vc/redist/msvc/1436~1.325//x64/microsoft.vc143.crt/vcruntime140.dll', needed by '/cygdrive/c/workspace/openjdk-build/workspace/build/src/build/windows-x86_64-server-release/support/modules_libs/java.base/vcruntime140.dll'. Stop.

Looking at this and a series of similar errors we're getting, it looks like the shortnames are not initially present in the docker container, but show up as soon as cygwin touches the directory, e.g. an ls -l on the appropriate place will generate the shortnames.

I have therefore started playing "whack-a-mole" and currently have this just before the assemble phase, which is doing a good job of catching most of the scenarios ... maybe I should just do an 'ls -lR' to be safe though:

context.bat(script: 'ls -l /cygdrive/c "/cygdrive/c/Program Files (x86)" "/cygdrive/c/Program Files (x86)/Microsoft Visual Studio/2022" "/cygdrive/c/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Redist/MSVC" "/cygdrive/c/Program Files (x86)/Windows Kits/10/bin" "/cygdrive/c/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC"')
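
And a sketch of the 'ls -lR' variant, assuming (per the observation above) that a recursive listing from cygwin is enough to make the shortnames resolvable; the helper name and the set of roots are hypothetical:

```groovy
// Hypothetical helper: walk the toolchain trees with cygwin's ls so that the
// 8.3 short names (progra~2, micros~1, etc.) the build later resolves have
// been generated before the assemble phase starts.
def primeShortNames() {
    ['/cygdrive/c/Program Files (x86)/Microsoft Visual Studio/2022',
     '/cygdrive/c/Program Files (x86)/Windows Kits/10'].each { root ->
        context.bat(script: "ls -lR \"${root}\" > nul")
    }
}
```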

sxa commented Sep 17, 2024

After adding the include and lib directories from Windows Kits to the ls list for generating shortnames, I've managed to get the assemble phase to complete. It's now tripping up on the sign phase, saying Copied 0 artifacts from "[build-scripts » jobs » jdk21u » windbld](https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk21u/job/windbld/)" build number [645](https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk21u/job/windbld/645/), but that is presumably just due to the archiving step not being in the right place, so the files are not available for copying. So: good progress, and the first time we're looking like having a usable build from the new process :-)
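
If that diagnosis is right, the fix would presumably be to archive the binaries from windbld before the verification job tries to copy them - a sketch, with the artifact pattern being an assumption:

```groovy
// Sketch: publish the built artifacts from the windbld job so the downstream
// sign_verification job's Copy Artifact step has something to copy.
// 'workspace/target/*' is a hypothetical pattern for illustration.
context.archiveArtifacts artifacts: 'workspace/target/*', fingerprint: true
```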

sxa commented Sep 25, 2024

First job with a successful sign_verification step: https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk21u/job/windbld/709
There's a lot of debug output, and probably too much duplication in the new assembleBuild function at the moment, which will now need to be tidied up, but this is a great step forward.

These builds were done from the code at https://github.com/sxa/ci-jenkins-pipelines/tree/89471642822aad4583b9a41353e1475ecb3e1767 and should be in a PR shortly. (Noting that this was done using a cached tarball of the initial build phase to save time, but all of the signing and assemble steps were done "normally" during that build job.)

sxa commented Oct 8, 2024

The PR is now working on Windows/docker systems. I've also tested on a newly created docker image and that works ok.
Currently verifying that we haven't got any breaks in the non-docker case (and non-Windows).

Stuff to consider:

  • I'm having to run a chmod -R u+w on the build output directory prior to the unstash, otherwise I'm getting a permissions error (public_suffix_list.dat and others) - see the sketch after this list.
    • Further analysis (adding debug permission checks before and after the unstash from the signing stage) showed that the following files do not have write access, even in the "normal" non-docker Windows builds, which /may/ indicate that they are not included in the signing step:
      • build/src/build/windows-x86_64-server-release/jdk/lib/security/public_suffix_list.dat (Note that another version of this file in the workspace is world writable, and the same one accessed with C:\ instead of /cygdrive/c as the prefix also appears to be writable 🤔)
      • build/src/build/windows-x86_64-server-release/configure-support/classes.jsa (This one isn't causing a problem though)
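
A sketch of the workaround from the first bullet, with the path taken from the debug output later in this thread and the stash name hypothetical:

```groovy
// Workaround sketch: some files (e.g. public_suffix_list.dat) come through
// without user-write permission, which breaks the unstash over the top of
// them, so restore user-write access first.
context.bat(script: 'c:\\cygwin64\\bin\\chmod -R u+w /cygdrive/c/workspace/openjdk-build/workspace/build')
context.unstash 'signed-build'   // hypothetical stash name
```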

sxa commented Oct 9, 2024

That's interesting - the ls -l output on files is different depending on whether you reference them as /cygdrive/c/ or c:/.
This is from some debug added before the unstash in the assemble phase. The first of these, using /cygdrive/c, shows the problem, as it's not writable. (Note that this is being done deliberately for this file - ref https://github.com/adoptium/jdk21u/blob/eced83e13090748218ab3dac78f6ff1bddf2b158/make/modules/java.base/gendata/GendataPublicSuffixList.gmk#L36)

12:45:14  C:\workspace\openjdk-build>c:\cygwin64\bin\find /cygdrive/c/workspace -name public_suffix_list.dat -ls 
12:45:23  19421773393677273    228 -rwxr--r--   1 ContainerUser ContainerUser   230480 Oct  9 10:32 /cygdrive/c/workspace/openjdk-build/workspace/build/src/build/windows-x86_64-server-release/jdk/lib/security/public_suffix_list.dat
12:45:23   2251799814254512    228 -r-xr--r--   1 ContainerUser ContainerUser   230480 Oct  9 10:32 /cygdrive/c/workspace/openjdk-build/workspace/build/src/build/windows-x86_64-server-release/support/modules_libs/java.base/security/public_suffix_list.dat
12:45:23   5910974511690577    300 -rwxrw-r--   1 ContainerUser ContainerUser   305218 Oct  7 23:26 /cygdrive/c/workspace/openjdk-build/workspace/build/src/src/java.base/share/data/publicsuffixlist/public_suffix_list.dat
[Pipeline] bat
12:45:24  
12:45:24  C:\workspace\openjdk-build>c:\cygwin64\bin\find C:/workspace -name public_suffix_list.dat -ls 
12:45:31  19421773393677273    228 -rw-r--r--   1 ContainerUser ContainerUser   230480 Oct  9 10:32 C:/workspace/openjdk-build/workspace/build/src/build/windows-x86_64-server-release/jdk/lib/security/public_suffix_list.dat
12:45:31   2251799814254512    228 -rw-r--r--   1 ContainerUser ContainerUser   230480 Oct  9 10:32 C:/workspace/openjdk-build/workspace/build/src/build/windows-x86_64-server-release/support/modules_libs/java.base/security/public_suffix_list.dat
12:45:31   5910974511690577    300 -rw-r--r--   1 ContainerUser ContainerUser   305218 Oct  7 23:26 C:/workspace/openjdk-build/workspace/build/src/src/java.base/share/data/publicsuffixlist/public_suffix_list.dat

sxa commented Oct 10, 2024

Note that using a chmod -R u+rwX instead of a+rwX also doesn't work, as we get issues with another file. Compare windbld#927 with u vs windbld#928 with a - the former failed with:
14:07:09 Caused by: java.nio.file.AccessDeniedException: C:\workspace\openjdk-build\workspace\build\src\build\windows-x86_64-server-release\support\modules_libs\java.base\api-ms-win-core-console-l1-1-0.dll

sxa commented Oct 10, 2024

Note to self: build 932, which was on build-azure-3, completed the initial build in 1h10m.
Build 933, by contrast, was on build-azure-2 and completed the same phase in 0h39m. The latter included:

    "CLEAN_WORKSPACE": true,
    "CLEAN_WORKSPACE_AFTER": true,

Edit: Seems likely that build-azure-2 has a better type of disk:
932 on build-azure-3

19:00:20  build.sh : 18:00:17 : Build complete ...
19:00:20  build.sh : 18:00:17 : All done!
19:18:01  Total disk space in Kb consumed by build process: 4513342	C:/workspace/openjdk-build

933 on build-azure-2

18:46:25  build.sh : 17:46:25 : Build complete ...
18:46:25  build.sh : 17:46:25 : All done!
18:46:38  Total disk space in Kb consumed by build process: 4507702	C:/workspace/openjdk-build

sxa commented Nov 1, 2024

Will be Fixed-by: adoptium/ci-jenkins-pipelines#1117
