Investigate that builds are reproducible #507
Comments
As @ForestEckhardt points out on his draft PR, there are still aspects of python builds that are NOT reproducible (now that SBOM reproducibility has been resolved). Some of these may be unavoidable parts of python build processes. I'll investigate this on the issue. |
For the simple |
For the The cpython layer seems to differ because it contains a Likewise, the I notice that this PR sets Exploration reveals that setting the environment variable to a temp location at build time results in reproducible builds of the
pack build pip-sample-one --buildpack paketo-buildpacks/python:2.2.0 --builder paketobuildpacks/builder-jammy-buildpackless-base:0.0.19 --env PYTHONPYCACHEPREFIX="/tmp" --clear-cache
...
pack build pip-sample-two --buildpack paketo-buildpacks/python:2.2.0 --builder paketobuildpacks/builder-jammy-buildpackless-base:0.0.19 --env PYTHONPYCACHEPREFIX="/tmp" --clear-cache
...
docker image ls | grep pip-sample
pip-sample-one latest 3303ad6715c7 42 years ago 321MB
pip-sample-two latest 3303ad6715c7 42 years ago 321MB |
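To make the effect of `PYTHONPYCACHEPREFIX` concrete outside of a buildpack, here is a minimal sketch. It assumes Python 3.8 or newer (where this variable was introduced); the module name `hello_mod`, the directory names, and the helper `import_with_cache_prefix` are all hypothetical. It imports a throwaway module in a child interpreter and checks where the bytecode cache ends up.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def import_with_cache_prefix() -> Tuple[bool, int]:
    """Import a throwaway module with PYTHONPYCACHEPREFIX set.

    Returns (source_dir_has_pycache, number_of_pyc_files_under_prefix).
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"            # stands in for the app or layer directory
        prefix = Path(tmp) / "pyc-prefix"  # stands in for a discarded location such as /tmp
        src.mkdir()
        prefix.mkdir()
        (src / "hello_mod.py").write_text("VALUE = 1\n")

        env = dict(os.environ)
        # Make sure bytecode writing is not disabled by the surrounding environment.
        env.pop("PYTHONDONTWRITEBYTECODE", None)
        env["PYTHONPYCACHEPREFIX"] = str(prefix)

        # A child interpreter picks up the variable at startup.
        subprocess.run(
            [sys.executable, "-c", "import hello_mod"],
            cwd=src,
            env=env,
            check=True,
        )

        has_local_cache = (src / "__pycache__").exists()
        redirected = len(list(prefix.rglob("*.pyc")))
        return has_local_cache, redirected


if __name__ == "__main__":
    print(import_with_cache_prefix())
```

With the prefix set, no `__pycache__` directory should appear next to the source, which is consistent with the identical image IDs in the output above: the timestamp-bearing pyc files never land in the directories that become image layers.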
For the Edit: |
For the This ongoing discussion on the poetry repository upstream suggests that installing poetry itself isn't currently reproducible, without going to some extreme lengths that diverge from the canonical ways that most users install poetry. Unless there is appetite for making the buildpack install poetry in a reproducible way, reproducible poetry-based builds are a non-starter. This is unfortunate because, theoretically, poetry enables reproducible installation of the packages it manages by introducing a lock file concept. |
For
In summary, setting Example conda-meta/history
|
Just a quick note, in addition to the option of From the docs:
If it works as advertised, it could be a more elegant option than writing files to a discarded directory (e.g. |
Following on from my previous comment, we decided to use |
If it's of any interest, another approach (that I'm using for the in-progress Heroku Python CNB) is to switch the pyc invalidation mode from its default of "timestamp" to one of the hash based modes ("checked-hash" or "unchecked-hash"). These modes are discussed in PEP-552, which is about deterministic pycs: The advantage this has over not writing the pycs at all (which is the case when using To switch to hash based pycs, you would need to:
For (1), see here for prior art: For (2), there are two options:
|
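As a rough illustration of why the hash-based modes from PEP 552 help with reproducibility, the sketch below compiles the same source twice with different file modification times and compares the resulting pyc bytes. It uses only the standard-library `py_compile` module; the file name `hello.py` and the helpers `compile_with_mtime` and `pyc_differs_across_mtimes` are hypothetical, and this is not taken from either buildpack's implementation.

```python
import os
import py_compile
import tempfile
from pathlib import Path
from py_compile import PycInvalidationMode


def compile_with_mtime(source: Path, mtime: int, mode: PycInvalidationMode) -> bytes:
    """Set the source file's mtime, compile it, and return the pyc bytes."""
    os.utime(source, (mtime, mtime))
    pyc_path = source.with_name(f"{source.stem}.{mode.name.lower()}.{mtime}.pyc")
    py_compile.compile(
        str(source),
        cfile=str(pyc_path),
        doraise=True,
        invalidation_mode=mode,
    )
    return pyc_path.read_bytes()


def pyc_differs_across_mtimes(mode: PycInvalidationMode) -> bool:
    """Return True if changing only the source mtime changes the pyc bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "hello.py"
        source.write_text("x = 1\n")
        first = compile_with_mtime(source, 1_000_000_000, mode)
        second = compile_with_mtime(source, 1_500_000_000, mode)
        return first != second


if __name__ == "__main__":
    for mode in PycInvalidationMode:
        print(mode.name, "differs:", pyc_differs_across_mtimes(mode))
```

In timestamp mode the source mtime is embedded in the pyc header, so the bytes change whenever the mtime does. In the two hash modes the header carries a hash of the source instead, so identical source yields identical pyc files regardless of when the file was written.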
@edmorley very interesting, thank you for sharing! We were looking for something that would control the reproducibility of the pyc files, and we saw PEP 0552, but I think we missed this key paragraph (emphasis mine):
That would be really interesting to explore. The main issue that I see with using Taking a step back, we opted for Finally, I'd love to learn more about your plans for the Python buildpack on Heroku, specifically if there are any blockers to using this buildpack rather than writing your own? I understand if that's something you're not willing or able to share, but I would love to learn more about any blockers or issues you have with this buildpack in its current form. |
In my local WIP implementation, I set
My testing both locally and against non-CNB Heroku apps showed that pycs do actually make quite a difference. Is it possible your testing was when using timestamp mode, which will always get invalidated at runtime when using CNBs, due to Here's a comparison on a Heroku 2 vCPU instance, for a non-CNB Hello World Django app running Python 3.11.0 (source):
Also locally it makes a massive difference when running under QEMU with a non-ARM64 image (e.g. on an M1 MacBook), given that (a) the upstream CNB project doesn't support multi-arch images very well yet, (b) even when they do there will still be a delay before stack image/buildpack support is sufficient to run the native arch, and (c) even then, people may want to run the same image locally as they will run on their production AMD64 servers. For example, running
So I'm very excited that we now have a shared standard in the form of Cloud Native Buildpacks, and I'm sure there will be buildpacks that are commonly used/shared across platforms. However, given how central the core language buildpacks are to both the user experience and reliability of builds, I don't think it would ever be viable for us to use anything but our own implementations of them, for reasons like stack compatibility, needing to be in control of uptime/security of binary hosting, needing to be in control of design/UX/new features/feature sunset/documentation links etc. For example, if a customer opens a support ticket about builds failing or needing a new feature, or a new Python security release not being available yet, or their app being broken by a buildpack change - our answer cannot be "sorry we don't own the component in question, there's nothing we can do". I don't think this is a bad thing however - all of the CNB implementations can learn from each other (hence why I watch this repo and have commented above) - and end users will have more choices than they did with classic buildpacks :-) |
Thank you for the detailed explanation! There's definitely a lot in there for us to wrap our heads around. Hash-based pycs via
|
Describe the Enhancement
Builds with this buildpack should be reproducible, meaning given identical inputs, the SHAs of resulting buildpack-built images are the same. This means, for a given app, if I run:
and then run
with the same source code and configurations, the resulting image SHAs should be the same.
Currently, builds are not reproducible because of SBOMs included in the final app image. See paketo-buildpacks/packit#367 and paketo-buildpacks/packit#368. But once those issues are resolved and a new version of packit has been released, we should expect that the buildpack builds are reproducible.
Possible Solution
Add assertions to integration tests that show that two builds with the same inputs produce identical outputs.
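One possible shape for such an assertion is sketched below in Python; the buildpack's own integration tests may look quite different. The helper names `assert_reproducible` and `pack_build_image_id` are hypothetical. The `pack` flags are the ones used elsewhere in this thread plus `--path`, and the buildpack and builder values are left to the caller.

```python
import subprocess
from typing import Callable, List


def assert_reproducible(build: Callable[[], str], runs: int = 2) -> str:
    """Call `build` several times and fail unless every run returns the same ID."""
    ids: List[str] = [build() for _ in range(runs)]
    if len(set(ids)) != 1:
        raise AssertionError(f"builds are not reproducible: {ids}")
    return ids[0]


def pack_build_image_id(name: str, source_dir: str, buildpack: str, builder: str) -> str:
    """Hypothetical helper: build an image with pack and return its image ID."""
    subprocess.run(
        [
            "pack", "build", name,
            "--path", source_dir,
            "--buildpack", buildpack,
            "--builder", builder,
            "--clear-cache",
        ],
        check=True,
    )
    inspected = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Id}}", name],
        check=True,
        capture_output=True,
        text=True,
    )
    return inspected.stdout.strip()
```

A test could then wrap the build in a closure, for example `assert_reproducible(lambda: pack_build_image_id("sample", "./app", buildpack, builder))`, so that any future change that breaks reproducibility fails the suite.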
Motivation
Build reproducibility is a selling point of CNBs that we want to provide to Paketo buildpack users. We want to know if future implementation decisions compromise build reproducibility.