support for SOURCE_DATE_EPOCH in sdist. #2133
Comments
This pulls in just enough of distutils and modifies the make_tarball function to respect SOURCE_DATE_EPOCH; this ensures that, _when set_, no timestamp in the final archive is greater than that value. This allows (but is not always sufficient for) byte-for-byte reproducible builds. For example:
- This does not work with `gztar`, since gzip embeds a timestamp in the header, which is currently `time.time()` in the standard library.
- It is not sufficient if some fields passed to setup.py have non-deterministic ordering (for example, using sets for dependencies).

Partial work toward pypa#2133; with this I was able to make two byte-identical sdists of IPython.

You will see three types of modifications:
- Referring explicitly to parts of the distutils namespace in a couple of places, to avoid duplicating more code. Note that although some names do _not_ change, name resolution is relative to the current module, so unchanged functions will now use our modified versions.
- Overriding `make_archive` in sdist to use our patched version of the functions in archive_utils.
- Updating make_tarball to look for SOURCE_DATE_EPOCH in the environment and set up a filter that modifies mtime while tarring.
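To illustrate the filter idea described above, here is a minimal sketch using only the standard library. It is not the code from the pull request: the helper names `source_date_epoch`, `clamp_mtime_filter`, and `make_clamped_tar` are hypothetical, and the archive is written uncompressed to sidestep the gzip header issue.

```python
import io
import os
import tarfile


def source_date_epoch():
    """Return SOURCE_DATE_EPOCH as an int, or None when it is unset."""
    value = os.environ.get("SOURCE_DATE_EPOCH")
    return int(value) if value is not None else None


def clamp_mtime_filter(epoch):
    """Return a tarfile filter that caps each member's mtime at `epoch`."""
    def _filter(tarinfo):
        tarinfo.mtime = min(tarinfo.mtime, epoch)
        return tarinfo
    return _filter


def make_clamped_tar(paths, epoch):
    """Build an uncompressed tar in memory with mtimes bounded by `epoch`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # Sort the inputs so member order does not depend on the caller.
        for path in sorted(paths):
            tar.add(path, filter=clamp_mtime_filter(epoch))
    return buf.getvalue()
```

Files older than the epoch keep their own mtime, so the result is only stable if those files are themselves unchanged between builds.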
There's some excellent work towards this started in #2136, thanks @Carreau! Are you planning to pick this up? If not, perhaps I could help finish up this work? We would like to be able to produce reproducible sdists for python-tuf. (Curious readers can see: theupdateframework/python-tuf#1269)
At some point; but I don't have much time these days; feel free to take over.
I'm interested in reproducible sdists, too. Reproducible artifacts make it much easier to verify the provenance of code.
Just in case this is useful to others, I paste below a self-contained hunk of monkeypatching that allowed me to get reproducible (same sha256 hash) sdist tarballs. This hunk of code can be dumped in setup.py, for example.

    # Support for Reproducible Builds
    # https://reproducible-builds.org/docs/source-date-epoch/
    import os

    timestamp = os.environ.get('SOURCE_DATE_EPOCH')
    if timestamp is not None:
        import distutils.archive_util as archive_util
        import stat
        import tarfile
        import time

        # Clamp negative values to zero.
        timestamp = float(max(int(timestamp), 0))

        class Time:
            """Stand-in for the `time` module inside `tarfile`; always reports the fixed timestamp."""
            @staticmethod
            def time():
                return timestamp

            @staticmethod
            def localtime(_=None):
                return time.localtime(timestamp)

        class TarInfoMode:
            """Descriptor that clears the group/other write bits from the stored mode."""
            def __get__(self, obj, objtype=None):
                return obj._mode

            def __set__(self, obj, stmd):
                ifmt = stat.S_IFMT(stmd)
                mode = stat.S_IMODE(stmd) & 0o7755
                obj._mode = ifmt | mode

        class TarInfoAttr:
            """Descriptor that pins an attribute to a fixed value and ignores assignments."""
            def __init__(self, value):
                self.value = value

            def __get__(self, obj, objtype=None):
                return self.value

            def __set__(self, obj, value):
                pass

        class TarInfo(tarfile.TarInfo):
            """TarInfo with a normalized mode, a fixed mtime, and anonymous ownership."""
            mode = TarInfoMode()
            mtime = TarInfoAttr(timestamp)
            uid = TarInfoAttr(0)
            gid = TarInfoAttr(0)
            uname = TarInfoAttr('')
            gname = TarInfoAttr('')

        def make_tarball(*args, **kwargs):
            # Patch tarfile only for the duration of the call, then restore it.
            tarinfo_orig = tarfile.TarFile.tarinfo
            try:
                tarfile.time = Time()
                tarfile.TarFile.tarinfo = TarInfo
                return archive_util.make_tarball(*args, **kwargs)
            finally:
                tarfile.time = time
                tarfile.TarFile.tarinfo = tarinfo_orig

        # Use the wrapper for the 'gztar' format, keeping the other tuple entries.
        archive_util.ARCHIVE_FORMATS['gztar'] = (
            make_tarball, *archive_util.ARCHIVE_FORMATS['gztar'][1:],
        )

A few explanations follow:
PS: Maybe this approach is simple enough to incorporate into setuptools?
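For anyone wanting to check the "same sha256 hash" claim on their own builds, a minimal sketch of the comparison follows. The helper names `sha256_of` and `same_artifact` are hypothetical, and the sketch assumes you have already produced two sdist files from separate builds with the same SOURCE_DATE_EPOCH.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex sha256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_artifact(path_a, path_b):
    """Return True when both files have identical sha256 digests."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Building into two separate output directories and passing the two resulting tarball paths to `same_artifact` is one way to use it.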
@dalcinl thanks, that is great!
Better to use uid = gid = 0, and set uname/gname to the empty string. Otherwise you're in for fun surprises when extracting the tarball as root on systems that have a uid/gid 1000. In particular, files that are executable only by the user would now be executable by this user 1000, which can be a security issue. Python itself notably tries to change ownership in tarfile: https://github.com/python/cpython/blob/2bbbab212fb10b3aeaded188fb5d6c001fb4bf74/Lib/tarfile.py#L2530
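The ownership recommendation can also be expressed as a plain function applied to each `TarInfo`, which may be easier to read than descriptors. This is a minimal sketch, not the snippet from this thread: `reset_owner` and `tar_bytes` are hypothetical names, and the member is built in memory so the example does not depend on who owns any real file.

```python
import io
import tarfile


def reset_owner(tarinfo):
    """Set numeric ownership to 0 and blank out the user/group names."""
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = ""
    return tarinfo


def tar_bytes(name, data, uid, uname):
    """Pack one in-memory file, pretending it is owned by (uid, uname)."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    info.uid = info.gid = uid
    info.uname = info.gname = uname
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # Ownership is normalized before the header is written.
        tar.addfile(reset_owner(info), io.BytesIO(data))
    return buf.getvalue()
```

With the owner reset, the archive bytes no longer depend on the account that built them. The same function can be passed as the `filter` argument of `TarFile.add()` when adding files from disk.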
I've updated the code snippet as per your recommendation. Thanks.
The snippet from @dalcinl is very helpful, thanks! If you don't have a
Some notes:
It's tested on {macOS, Linux, Windows} x Py-3.{8,9,10,11,12}. It does not work in Python-3.7 (and I did not bother to investigate why, since 3.7 is EOL now).
SOURCE_DATE_EPOCH is useful for reproducible builds: when set, no timestamp should be greater than this value.
It seems that setuptools sdist does not support SOURCE_DATE_EPOCH; I've traced it to the following:
sdist inherits from Command, which leads to these successive calls.
`make_tarball` seems to be the right place to monkeypatch to look for SOURCE_DATE_EPOCH, as it can itself pass a filter to `tarfile.add()`, which will ensure the mtime is bounded (it already passes a filter to set uid/gid). With this, most sdists (except tgz) are reproducible. TGZ has one last problem: `GzipFile` adds `time.time()` in the header, and that's a bit harder to patch.
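The gzip header behaviour can be seen in isolation. `gzip.GzipFile` accepts an `mtime` argument and falls back to the current time when it is omitted. The sketch below is only a demonstration of that mechanism, not a patch for sdist, and `gzip_bytes` is a hypothetical helper name.

```python
import gzip
import io


def gzip_bytes(data, mtime):
    """Compress `data` with a fixed header timestamp instead of the current time."""
    buf = io.BytesIO()
    # An empty filename keeps the optional name field out of the header.
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()
```

Two calls with the same data and the same `mtime` return identical bytes. The timestamp is stored little-endian in bytes 4 to 7 of the output. The difficulty mentioned above is that the tar-writing code constructs the gzip stream internally, so the caller has no direct way to pass this argument.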