fix: platforms, architectures and dpkg_status after #115 #120

Merged
merged 7 commits into from
Dec 11, 2024

Conversation

jjmaestro
Contributor

@jjmaestro jjmaestro commented Dec 2, 2024

After landing #115 we need platforms to be non-dev. This would have been caught in the e2e test but since #115 didn't make changes to the e2e test, it went unnoticed.

To prevent this, I've also copied the examples/debian_snapshot to the e2e test. IMHO this helps catch bugs that affect third-party repos using rules_distroless as a dependency, such as the platforms dependency.

I've also verified that we do need the select in the architecture, as mentioned in my code review. Otherwise, the Config in the manifest is always set to arm64 regardless of the architecture that we are building.

I've also fixed the unmapped Debian architectures (e.g. "all"). Basically, _ARCHITECTURE_MAP[arch] or arch fails when a Debian arch is not in the map. It should be _ARCHITECTURE_MAP.get(arch, arch) to default to arch when it's not in the map. To exercise this and ensure it always works I've:

  • removed "arm64" since it's a valid CPU platform already (and without this fix it would have already failed in all the examples and the e2e test).
  • added armhf mapping to armv7e-mf
  • changed Debian's ppc64el mapping to the exact matching platforms CPU (ppc64le)
  • added a test to exercise arch all (which is both an arch that's not in the map while being "a special Debian arch" and also a valid platforms CPU constraint).
  • moved architecture doc links next to _ARCHITECTURE_MAP and added a bunch more links to the Debian wikis.
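
The difference between the two lookups can be sketched in plain Python, which behaves like Starlark here: indexing a missing key raises before `or arch` is evaluated. This is only an illustration. `ARCHITECTURE_MAP` below is a hypothetical stand-in containing just the two mappings named in this description, and the helper names are made up.

```python
# Hypothetical stand-in for the real map; only the two entries named in
# this PR description are included.
ARCHITECTURE_MAP = {
    "armhf": "armv7e-mf",
    "ppc64el": "ppc64le",
}


def resolve_buggy(arch):
    # Indexing a missing key raises KeyError, so `or arch` never runs.
    return ARCHITECTURE_MAP[arch] or arch


def resolve_fixed(arch):
    # Falls back to the Debian name when there is no mapping.
    return ARCHITECTURE_MAP.get(arch, arch)


def fails(fn, arch):
    # Report whether the lookup raises for the given architecture.
    try:
        fn(arch)
    except KeyError:
        return True
    return False
```

With this sketch, `resolve_fixed("all")` returns `"all"` unchanged, while `resolve_buggy("all")` raises.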

Finally, I've fixed _PACKAGES, :packages and :dpkg_status to be different per-architecture, because there are cases where the packages differ between architectures. E.g. currently in the examples/ubuntu_snapshot, coreutils depends on libssl3 on amd64 but there's no such dependency on arm64. Thus, the list of packages and dpkg_status has to be different.
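
As a rough illustration of why the package list has to be keyed by architecture, here is a small Python sketch. It is not the repository's implementation: `per_arch_packages` and `RESOLVED` are hypothetical names, and the data only mirrors the coreutils/libssl3 case described above. The `:linux_<arch>` keys follow the condition labels that appear in the generated repo's error output later in this thread.

```python
def per_arch_packages(resolved):
    """Turn {arch: [package, ...]} into {condition label: sorted package list}."""
    return {
        ":linux_" + arch: sorted(set(packages))
        for arch, packages in resolved.items()
    }


# Illustrative data: coreutils pulls in libssl3 on amd64 but not on arm64.
RESOLVED = {
    "amd64": ["coreutils", "libssl3"],
    "arm64": ["coreutils"],
}

PACKAGES = per_arch_packages(RESOLVED)
```

A single shared list would reference `libssl3` for arm64 as well, which is the kind of missing-package failure shown in the commit message further down.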

@thesayyn
Collaborator

thesayyn commented Dec 2, 2024

I am a little uneasy about the symlinks; this will most likely break BCR, because of

examples export-ignore
and how BCR uses release archives (created via git archive with examples folder excluded) to run presubmit.

Can we just improve e2e so it catches this case?

@jjmaestro
Contributor Author

Ah, I didn't know about BCR! Sure, I'll copy over the test and will make a note in it to keep it in sync.

@jjmaestro jjmaestro force-pushed the fix-pr-115 branch 2 times, most recently from ccbc65d to e867017 Compare December 2, 2024 23:40
@jjmaestro
Contributor Author

@thesayyn Added another fix I caught while adding one of the tests that I wrote for #100 :) Let me know if you like the PR and let the workflows run when you can (I've manually tested the PR so it should hopefully run green). Thanks!

@jjmaestro
Contributor Author

@thesayyn Added yet another fix: the "new" dpkg_status from #115 would fail if there were different packages installed per-architecture. This is actually the case in at least one of the examples in the repo. See b0cc2da for more details.

@jjmaestro jjmaestro changed the title fix: platforms and architecture after #115 fix: platforms, architectures and dpkg_status after #115 Dec 5, 2024
@thesayyn thesayyn self-requested a review December 10, 2024 06:21
Collaborator

@thesayyn thesayyn left a comment

This is great. Thanks for the fixes!

There are small changes to be made but overall good. (sorry for the conflicts.)

@thesayyn
Collaborator

> would fail if there were different packages installed per-architecture

Yup, this is a great catch and paves the way for per architecture debian repositories and packages #56

@jjmaestro
Contributor Author

Oh wow, I just fixed the PR and I was checking "Compare" and the rebase still shows!!? even when it's rebased over main! 😭 I was expecting that GH would separate rebasing over main VS e.g. other branches and would NOT include those as "changes"... at least provide a way to see "all changes" and "changes VS last state of the PR".

@thesayyn this, again, will make it difficult to review, sorry 😞 It'll be easier if you check the final state of the commits, but basically, I've done as you requested in your comments.

@jjmaestro
Contributor Author

jjmaestro commented Dec 10, 2024

OK, I've produced such a diff myself with:

$ git co -b fix-pr-115-OLD-rebased b0cc2da
$ git rebase main

$ git rev-parse --short fix-pr-115-OLD-rebased
0f4a221

$ git rev-parse --short fix-pr-115            
7711ba0

$ git diff fix-pr-115-OLD-rebased fix-pr-115 -- . ':(exclude)*.lock.json'

But GH doesn't let me attach it as a file to the comment!?? srsly... it seems to only allow PDF, images, movies, etc.

Anyway, here you go, in case it makes the review easier:

EDIT: I've pushed the old rebased branch so that it's easier to compare via GH with a manual compare link: Compare 0f4a221...7711ba0

@thesayyn
Collaborator

You have some buildifier errors.

@thesayyn
Collaborator

https://github.com/GoogleContainerTools/rules_distroless/actions/runs/12284130637/job/34279199329?pr=120#step:3:40

@jjmaestro
Contributor Author

@thesayyn I've created #135 from main that fixes some buildifier stuff that got through the cracks (somehow didn't run pre-commit run I think?). I'll rebase on that one and see if there's anything left here as well.

There are cases where the packages differ between architectures. E.g.
currently in the `examples/ubuntu_snapshot`, `coreutils` depends on
`libssl3` on `amd64` but there's no such dependency on `arm64`. Thus,
the list of packages and `dpkg_status` has to be different.

This failure wasn't caught because this corner case was only happening
in the Ubuntu example which was still using "the old `dpkg_status`" that
was done by hand, just passing a shorter list of packages and not the
full list of installed packages (as implemented in GoogleContainerTools#115).

If the test is migrated, when running without the fix it fails with:

ERROR: no such package '@@_main~apt~noble//libssl3/arm64': BUILD file not found in directory 'libssl3/arm64' of external repository @@_main~apt~noble. Add a BUILD file to a directory to mark it as a package.
ERROR: /home/nonroot/.cache/bazel/_bazel_nonroot/a08c2e4811c846650b733c6fc815a920/external/_main~apt~noble/libssl3/BUILD.bazel:1:6: no such package '@@_main~apt~noble//libssl3/arm64': BUILD file not found in directory 'libssl3/arm64' of external repository @@_main~apt~noble. Add a BUILD file to a directory to mark it as a package. and referenced by '@@_main~apt~noble//libssl3:libssl3'
ERROR: Analysis of target '//examples/ubuntu_snapshot:_noble_index_json' failed; build aborted: Analysis failed

I've now moved `examples/ubuntu_snapshot` to use the "new"
`dpkg_status`.

I've also added an explicit test to check that libssl3 is installed
in amd64 and another test to check that it's NOT installed in arm64.

These two have been the same but there have been changes to both that
weren't replicated / synced into the other.

I've now synced and made a comment to remark that it's important to keep
these in-sync because testing in e2e with at least the same base test
helps in catching bugs like the previous commit, where platforms repo
should have been marked as non-dev.

Finally, I've moved both tests to Cloudflare's Debian snapshot. Looks
like the change happened in f994712 without any explanation and, as
usual, I keep finding Debian snapshot extremely flaky and unreliable.
Cloudflare has had some issues in the past (e.g. lagging behind
replication for a long time) but these are quite rare and resolve much
quicker than the Debian snapshot.

Regardless of the `platform_transition_filegroup`, the architecture
needs a `select` to properly set the architecture in the manifest.

I verified this by running the test and inspecting the generated config
JSON. Ideally, this should be encoded in a test but I'm not sure how to
go about it.

`_ARCHITECTURE_MAP[arch] or arch` fails when a Debian arch is not in the
map. Thus, add all of the Debian architectures that map to platforms CPU
architectures.

Add "all" arch as well (plus an additional resolution test).

Also:
  * change Debian's ppc64el mapping to the exact matching platform CPU
    (ppc64le)

  * move architecture doc links next to _ARCHITECTURE_MAP and add a
    bunch more links to the Debian wikis.

The list inside the dict should be formatted and indented.
@jjmaestro
Contributor Author

OK, this is with the rebase, now fixing the broken buildifiers left here...

@thesayyn
Collaborator

> @thesayyn I've created #135 from main that fixes some buildifier stuff that got through the cracks (somehow didn't run pre-commit run I think?). I'll rebase on that one and see if there's anything left here as well.

It's just README files over there, which are not related to your changes. You have some unused loads.

@jjmaestro
Contributor Author

OK, it should be green now! If you also push #135 for main, then everything should be green 👌 I'm still a bit baffled because I thought I had pre-commit installed and running... maybe I deactivated it at some point and didn't realize... anyway, it's now set so it should all be fine from here!

@thesayyn thesayyn merged commit 936081d into GoogleContainerTools:main Dec 11, 2024
10 checks passed
@thesayyn
Collaborator

This made macos unhappy.

@thesayyn
Collaborator

https://github.com/GoogleContainerTools/rules_distroless/actions/runs/12284400987/job/34281214039

jjmaestro added a commit to jjmaestro/rules_distroless that referenced this pull request Dec 11, 2024
The pre-commit run --all-files that I did and rebased into GoogleContainerTools#120 broke
the README :(

This fixes it, undoing some of the changes and manually fixing others so
that prettier now runs without breaking this Markdown.
@jjmaestro
Contributor Author

@thesayyn :S will look into it now!

@jjmaestro
Contributor Author

@thesayyn OK, so it's a bit weird, at least to me. I've managed to run all tests in //examples manually, e.g. going one by one through bazel test //examples/flatten/... and so on, except //examples/debian_snapshot:test and //examples/ubuntu_snapshot:test, which are SKIPPED because they are guarded by target_compatible_with.

Even so, if I try to run them, I get, as expected

% bazel test //examples/debian_snapshot:test --cache_test_results=no
WARNING: Ignoring JAVA_HOME, because it must point to a JDK, not a JRE.
Target //examples/debian_snapshot:test failed to build
Use --verbose_failures to see the command lines of failed build steps.
ERROR: Analysis of target '//examples/debian_snapshot:test' failed; build aborted: Target //examples/debian_snapshot:test is incompatible and cannot be built, but was explicitly requested.
Dependency chain:
    //examples/debian_snapshot:test (efbef2)   <-- target platform (@@platforms//host:host) didn't satisfy constraint @@platforms//os:linux
INFO: Elapsed time: 0.119s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
ERROR: Build did NOT complete successfully
ERROR: No test targets were found, yet testing was requested

However! If I run bazel test //examples/... then it breaks!!

% bazel test //examples/... --cache_test_results=no
WARNING: Ignoring JAVA_HOME, because it must point to a JDK, not a JRE.
ERROR: /private/var/tmp/_bazel_jjmaestro/be9c074a2171a3cb6c673d10113bd570/external/_main~apt~noble/BUILD.bazel:196:12: configurable attribute "controls" in @@_main~apt~noble//:dpkg_status doesn't match this configuration. Would a default condition help?

Conditions checked:
 @@_main~apt~noble//:linux_amd64
 @@_main~apt~noble//:linux_arm64

To see a condition's definition, run: bazel query --output=build <condition label>.

This instance of @@_main~apt~noble//:dpkg_status has configuration identifier 69bb7f0. To inspect its configuration, run: bazel config 69bb7f0.

For more help, see https://bazel.build/docs/configurable-attributes#faq-select-choose-condition.

Use --verbose_failures to see the command lines of failed build steps.
ERROR: Analysis of target '//examples/ubuntu_snapshot:_noble_index_json' failed; build aborted: Analysis failed
INFO: Elapsed time: 0.178s, Critical Path: 0.03s
INFO: 11 processes: 11 internal.
ERROR: Build did NOT complete successfully
//examples/cacerts:test_cacerts                                       NO STATUS
//examples/debian_snapshot:test                                         SKIPPED
//examples/flatten:test_flatten                                       NO STATUS
//examples/flatten:test_flatten_dedup_listing                         NO STATUS
//examples/flatten:test_flatten_dedup_mtree                           NO STATUS
//examples/group:test_group                                           NO STATUS
//examples/group:test_group_content                                   NO STATUS
//examples/home:test_home                                             NO STATUS
//examples/locale:test_bookworm                                       NO STATUS
//examples/locale:test_bullseye                                       NO STATUS
//examples/os_release:test_os_release                                 NO STATUS
//examples/os_release:test_os_release_alternative_path                NO STATUS
//examples/os_release:test_os_release_content                         NO STATUS
//examples/passwd:test_passwd                                         NO STATUS
//examples/passwd:test_passwd_content                                 NO STATUS
//examples/statusd:test_statusd                                       NO STATUS

Executed 0 out of 16 tests: 16 were skipped.

So, somehow, something is running on a previous phase / stage? :-?

Anyway, I'll add a default step to the selects but not sure if that will be enough to fix it.
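
For readers unfamiliar with the error above, here is a toy Python model of how a `select` without a default branch fails when no condition matches. It is a simplified sketch, not Bazel's implementation: `resolve_select` is a hypothetical helper, it ignores Bazel's rules for multiple matching conditions, and the branch values are made up. `//conditions:default` is the real Bazel label for the fallback branch.

```python
DEFAULT = "//conditions:default"


def resolve_select(branches, active_conditions):
    """Return the value of the first matching branch, else the default."""
    for label, value in branches.items():
        if label != DEFAULT and label in active_conditions:
            return value
    if DEFAULT in branches:
        return branches[DEFAULT]
    # Mirrors the spirit of "doesn't match this configuration".
    raise LookupError("no matching condition; would a default condition help?")


# Branch values are illustrative only.
WITHOUT_DEFAULT = {
    ":linux_amd64": ["coreutils", "libssl3"],
    ":linux_arm64": ["coreutils"],
}
WITH_DEFAULT = dict(WITHOUT_DEFAULT)
WITH_DEFAULT[DEFAULT] = []


def fails_without_match(branches):
    # Simulate a host configuration where neither Linux condition holds.
    try:
        resolve_select(branches, active_conditions=set())
    except LookupError:
        return True
    return False
```

In this model, a macOS-like configuration (no active Linux condition) fails against `WITHOUT_DEFAULT` and resolves to the empty list against `WITH_DEFAULT`.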

jjmaestro added a commit to jjmaestro/rules_distroless that referenced this pull request Dec 11, 2024
There are some tests that also run in macos and Bazel was throwing an
error because the `select`s in the package repos needed a default
condition.
@jjmaestro
Contributor Author

OK, got it fixed in #136!

jjmaestro added a commit to jjmaestro/rules_distroless that referenced this pull request Dec 11, 2024
Bazel was throwing an error when running the tests in macOS even when
the Linux tests were guarded with a target_compatible_with that was
restricted to Linux.

As far as I could tell, this was because running the bazel test with
//... was recursively evaluating other targets like
//examples/ubuntu_snapshot:_noble_index_json from oci_image and
oci_load. When I guard those rules with a target_compatible_with
everything works.

See GoogleContainerTools#136
for more context and information.

Also, I see that e5f7dc0 manually commented out a test to fix CI
for macOS, so I guess GoogleContainerTools#120 didn't break this; it was probably happening
since GoogleContainerTools#115 added the new convenience targets.

Finally, maybe this is something that should be fixed in rules_oci, the
"inner targets" that it creates should probably be restricted to only
run in the `os` and `architecture` given for the image :-?