Allow pip install to install to a date: install only package versions that were available on or before that day. #12275
Comments
FYI, I'm pretty sure this package offers the functionality you want: https://pypi.org/project/pypi-timemachine/ I've used it for exactly this: debugging resolvability issues that occurred in the past but no longer occur. I would also love to have this built into Pip (or another package installer), but if you want something that works now, you can use that package. |
From a quick look it seems like pypi-timemachine relies on the non-standard JSON API. For pip to implement this there needs to be a standard for exposing release dates on the index server. |
Closing since this cannot be acted on before we have a specification. Further discussion should happen on https://discuss.python.org, and we can open a new issue to track pip’s implementation once that’s settled. |
I'm happy to post there, but if you don't mind, can you explain in a bit more detail so I'll know what I'm talking about? I'm not familiar enough with the pypi/pip(/conda) ecosystem(s) to know what proposing 'a standard for exposing release dates on the index server' would mean. |
And FYI there was already some discussion on that: I specifically asked for the upload timestamp to be included in the JSON API when it was first proposed: https://discuss.python.org/t/pep-691-json-based-simple-api-for-python-package-indexes/15553/3 There was significant push back, but I think most of that can be characterized as not wanting to add features to something that was proposed to mirror an existing API. I'm sure some addition could be further proposed. Edit: Somehow I missed PEP 700; as @uranusjr points out, this actually does now exist: #12275 (comment)
Pip follows a standard "Simple Index API" (PEP 503 and PEP 691) that anyone can implement in their own repository and point Pip to, so a lot of people use Pip in far more contexts than just PyPI. The library I previously linked to uses a PyPI-specific API which is non-standard, and therefore Pip can't use it because there's no standard for other repositories to follow. Therefore adding something to the Simple Index API would need to be agreed by a wider community, and https://discuss.python.org is where those discussions take place. |
pip uses the Simple Repository API to fetch a list of versions (of a package) to install. This is a Python standard, but the API does not contain information about when a specific artifact was published. The above utility instead uses PyPI’s non-standard JSON endpoint for this information, but the endpoint is not stable, considered deprecated, and not necessarily (unlikely?) implemented by alternative indexes (Python has many). But oh wait… I missed there’s actually already a field defined for this: https://peps.python.org/pep-0700/
The latter field should contain the useful information, I think. And it seems like PyPI already implemented it:
$ curl -sH 'Accept: application/vnd.pypi.simple.v1+json' https://pypi.org/simple/pip/ | jq '.["files"][0]'
{
  "core-metadata": false,
  "data-dist-info-metadata": false,
  "filename": "pip-0.2.tar.gz",
  "hashes": {
    "sha256": "88bb8d029e1bf4acd0e04d300104b7440086f94cc1ce1c5c3c31e3293aee1f81"
  },
  "requires-python": null,
  "size": 38734,
  "upload-time": "2008-10-28T17:22:10Z",
  "url": "https://files.pythonhosted.org/packages/3d/9d/1e313763bdfb6a48977b65829c6ce2a43eaae29ea2f907c8bbef024a7219/pip-0.2.tar.gz",
  "yanked": false
}
So pip can probably go on from here if we can figure out the user interface for this. Edit: Also need to figure out what to do if a server does not implement this, or only has partial information. |
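For illustration, here is a minimal Python sketch of how a client might use the `upload-time` field shown above to keep only files uploaded on or before a cutoff. It is not pip's implementation. The function names `parse_upload_time` and `files_available_on` are hypothetical, and every entry in the sample data other than `pip-0.2.tar.gz` is made up. Files without an `upload-time` are returned separately so the caller can decide what to do with them, which is the open question raised in the edit note.

```python
from datetime import datetime, timezone


def parse_upload_time(value):
    """Parse an `upload-time` string such as "2008-10-28T17:22:10Z" into an aware UTC datetime."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def files_available_on(files, cutoff):
    """Split file entries into (uploaded on or before cutoff, upload time unknown).

    Entries uploaded after the cutoff are dropped.
    """
    dated, undated = [], []
    for entry in files:
        stamp = entry.get("upload-time")
        if stamp is None:
            undated.append(entry)  # the index did not report an upload time
        elif parse_upload_time(stamp) <= cutoff:
            dated.append(entry)
    return dated, undated


# Sample data: the first entry comes from the response above, the others are hypothetical.
files = [
    {"filename": "pip-0.2.tar.gz", "upload-time": "2008-10-28T17:22:10Z"},
    {"filename": "example-9.9.tar.gz", "upload-time": "2030-01-01T00:00:00Z"},
    {"filename": "example-1.0.tar.gz"},
]
cutoff = datetime(2023, 1, 1, tzinfo=timezone.utc)
kept, unknown = files_available_on(files, cutoff)
print([f["filename"] for f in kept])
print([f["filename"] for f in unknown])
```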
Hmm, user interface. I think in order to make this as simple as possible but still accomplish the goal of 'get this old code to work again', I would add a new flag to 'install' that would add a new constraint to any 'touched' package in that install. So, as an example, if numpy is installed, pip leaves it as-is until asked to install package 'foo' from January 2023 that depends on numpy. At that point, when checking what numpy version is needed, an additional constraint would apply: the max version of numpy allowed is whatever the latest version was in January of 2023.
Not sure what to call the flag; longer names are kind of obnoxious, but at least are clear. Brainstorming some options:
pip install foo --earlier-than 01-01-2023
For what to do when the server doesn't implement the date, I could see using 'strict mode' and 'non-strict mode', with non-strict being the default:
I think you would want to allow 'strict mode' to assume that anything already installed was OK:
pip install foo --as-of 01-01-2023 --strict
pip install bar --version=2.2.1
pip install foo --as-of 01-01-2023 --strict
None of these are hills I would die on; just ideas. |
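The "max version as of a date" constraint described in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions. The helper `as_of_constraint`, the exception `UnknownUploadDate`, the package name `examplepkg`, and its release dates are all hypothetical. The strict and non-strict semantics are one possible reading, since the thread does not spell them out: strict refuses to continue when a release has no upload date, and non-strict ignores such releases when computing the bound. Version ordering is simplified to plain dotted numbers.

```python
from datetime import date


class UnknownUploadDate(Exception):
    """Raised in strict mode when a release has no known upload date."""


def version_key(version):
    # Simplification: handles plain dotted numeric versions only
    # (no pre-releases, post-releases, or epochs).
    return tuple(int(part) for part in version.split("."))


def as_of_constraint(name, releases, as_of, strict=False):
    """Build an upper-bound constraint string for `name` as of the date `as_of`.

    `releases` maps version -> upload date, or None when the index gave no date.
    Returns None when no dated release qualifies.
    """
    eligible = []
    for version, uploaded in releases.items():
        if uploaded is None:
            if strict:
                raise UnknownUploadDate(f"{name} {version} has no upload date")
            continue  # assumed non-strict behaviour: ignore undated releases
        if uploaded <= as_of:
            eligible.append(version)
    if not eligible:
        return None
    return f"{name}<={max(eligible, key=version_key)}"


# Hypothetical release history for a made-up package.
releases = {
    "1.0": date(2022, 5, 1),
    "1.1": date(2022, 12, 15),
    "1.2": None,
    "2.0": date(2023, 6, 1),
}
constraint = as_of_constraint("examplepkg", releases, date(2023, 1, 1))
print(constraint)
```

Expressing the result as an ordinary upper-bound constraint means it could be passed to the resolver alongside the user's other requirements, instead of requiring a separate filtering step.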
(Also, thanks to @uranusjr for tracking down PEP 0700!) |
Mentioning for completeness (though you may be aware and there may be a reason why this isn't feasible for the use-cases you had in mind): Ensuring reproducible installs is pretty much the whole point of lockfiles - something that is supported very well by tools like Poetry already. Currently Pip doesn't try to be a project environment manager, but a package installer - and until that changes, I think it's best to stick to other tools that wrap Pip for project dependency management use-cases. |
I agree with this, but Poetry only supports maintaining a project. I think the much more likely user story for Pip is:
|
But it sounds like these are projects? The issue here is that the current design of Pip/the Python package ecosystem encourages people to iteratively One of the advantages of tools like Poetry (and similarly the default package managers in many other languages, like Rust's Cargo), is that it's much harder for the environment and the list of dependencies stored in the lockfile to get out of sync during normal usage, thanks to there being a single command (eg I realise that non-Pip Python project management tools are perhaps less well known to users, and so there's an education issue - but teaching them to use such tools if they need reproducible environments might still be simpler than trying to teach them how to use pip freeze, or simpler than trying to work around pip not being a project/environment manager by using a flag to only install packages available up to a certain date. |
My user story example was a very common case where a data scientist is working on their own user environment, they are not building wheels, do not have a pyproject.toml, and are working on multiple different things as part of rapid exploration. This user story, in my experience, does not fit well with Poetry.
I think there is also a complexity and maturity issue. While I look forward to the day that I would be happy to suggest a package/environment manager to a non-programmer using Python, I do not currently feel comfortable suggesting anything, as the barrier to learning these tools, or the concern with them being one of the first users to encounter issues, is too high for me. But to a certain extent I think this is kind of beside the point; I think the question is whether there are valid use cases of Pip where this feature would be useful. If a user installing packages one at a time by themselves is a valid use case of Pip, then I believe there are situations like the one I described where this feature would be helpful. If that is not a valid use case of Pip, and users are expected to build tooling on top of Pip so they can always create reproducible environments (which is something I do at work but not at home), then I think it should be documented that general users should not be using Pip and instead look to use higher level tools, similar to the way |
It is at least my own experience that while python/pip usage is nigh-ubiquitous amongst my colleagues, poetry and pip freeze are virtually unknown. It would certainly be lovely if community education was in place to encourage that more, and that's probably a good idea, but I personally could imagine that slowly infiltrating the general python 'non-programmer' usage only after several years. Another use case is inherited projects. A grad student writes something in python to analyze lab data, then moves on, and someone else finds their work and wants to expand on it (this is essentially the use-case of the paper I referenced in the initial suggestion). Being able to install 'the packages at the date stamp of the file' is what the new user is going to have to do anyway, if pip can't do it directly, they'll have to do it by hand. Since looking up dates is something a computer is uniquely well-suited to do, adding that feature could save a lot of people a lot of time. |
While I agree with this point, I don't think it necessarily follows that this is something pip should do. First of all, there's the point already made that not all indexes might expose the "upload date" information. Pip isn't exclusively tied to PyPI, so we have to consider features in terms of how they interact with other indexes. Also, while reproducibility is important, that's only true for certain situations/use cases. Yes, we've all had the experience of picking something up that worked when we last used it, but doesn't now. But "it would be useful" isn't the only criterion when deciding what features should be added to pip. Also, there's the point that this can be done with a 3rd party tool ( So I think
|
I largely agree with your points here. I just want to add a couple of things:
|
FYI here is a real world example where being able to set the date on the index is required to resolve the requirements: #12305 (comment) In this case it's also possible there could be optimizations for resolvelib/Pip to not hit ResolutionTooDeep, but currently they don't exist |
I'm increasingly +1 for this idea being built into Python package installers (including ones that provide lock files, which only mitigate the issue if you already have a lock file and don't want to change it).
If no one else picks this up I might look to make a PR after I've finished working on the current resolution issues and optimizations I have open PRs for right now, and see how complex it looks. |
I'm -1 on this as a pip feature because:
A PR to what project? You say yourself that you want this built into "Python package installers". Will you create PRs for all such installers? Or were you expecting to just provide a PR for pip and hope that other installers would implement the functionality themselves? If running a proxy to support time machine style functionality is too messy or complex, I'd support adding features to pip that made it easier to run such "index middleware" - but as a generic mechanism, not something limited to just this use case. And ideally, if we do decide to have some sort of "index middleware" concept, I'd hope that it would be designed in such a way that it could be standardised so that any tool could use it, not just pip. |
I don't think this is true, for example if Pip installs from the file system how could Whereas if this feature is implemented in Pip, several choices can be made: abort installation, work off package metadata, file timestamps, ask the user, etc.
Any installer can choose what additional features it wants to support. For example Pip supports constraints but Poetry doesn't. Poetry supports a lock file but Pip doesn't. But I believe this feature is of sufficient use that it would be helpful in general for all of them, not just Pip.
Sorry, I thought the context was obvious for being on the Pip issue tracker; I meant for Pip. And I would only look to submit it if I could implement it in a sufficiently simple way.
Yes, I think it would be a useful feature for all package installers. I would feature request it on package installers I use, and may raise PRs there too if I get a chance. e.g. prefix-dev/rip#135 |
I'm confused. What would it even mean to install a directory on the filesystem as of a given date? Is there any use case for needing this for a file/directory on the filesystem?
I agree, which is why I think implementing it in pip rather than as a general solution (an index proxy, or some sort of "index middleware") is a bad idea.
No, it is I who should apologise, I agree it was obvious that you meant pip. I was trying to point out that if you only create a PR for pip, who will implement the functionality for other installers? Who will ensure that the behaviour is the same across all installers? As you say you would raise feature requests and possibly PRs for other installers, I have my answer on those questions. But I still think it's a significant duplication of effort (both design and coding effort) to expect every project to implement this independently. |
I'm sure there are lots of use cases. I've certainly worked in a team that distributed all their packages over an NFS mount and pointed the package installers (conda and pip) to install them directly off that.
I guess I can't sufficiently envision what that would look like in a practical way. But I'm certainly willing to back some proposal or project once it exists.
I just took this for the nature of multiple implementations based around core standards. But happy for this to be implemented an easier way. |
So were you using
Well, at the most basic level,
The "middleware" idea is speculative and I don't have a specific idea in mind, but maybe some sort of config file that tells an installer that if they are asked to use index X, they run a particular proxy and use the proxy instead of X. That would be very simple, and essentially just avoids the need for the user to manually start the proxy.
To an extent, I guess it is. But there's no standard here, just a feature request that could be designed differently in different tools. But it's not that important - we're just elaborating why we have different views, I don't think either of us is persuading the other. If you want to know whether the pip maintainers support this idea, you have the information that at least one of us doesn't, I'll leave you to decide if that affects your view, or if you want to wait and see if any of the other maintainers has a different view. |
Small update for anyone who needs this feature, uv has a "pip-like" interface and has a private option that does this: astral-sh/uv#1358 (comment) |
Consolidating this into #6257, which is an older issue with the same request. |
What's the problem this feature will solve?
It is often the case that python package updates will break old code. This is particularly problematic in science, where computational research ends up being irreproducible because of updates to the packages used. See:
https://arxiv.org/pdf/2209.04308.pdf
The basic problem is that at the moment you are writing code, you don't know what the future will bring. Ideally, packages would be backwards-compatible, but this is often not the case. In addition, even fixing all the packages you are using directly to a particular version won't help if one of those packages contains a dependency that itself can be updated in the future, breaking that package and/or your code.
Describe the solution you'd like
The problem found in the paper (plus everyone's general problems with 'X update broke Y') could be solved by pip directly by adding an optional flag with a date, beyond which no package should be installed. This moves a complicated dependency problem away from the user to a relatively simple script: pip already knows when any given package was uploaded, and could update packages and solve dependency issues exactly as it would have on the flagged date, before newer options were available.
In a sense, this would serve as a sort of poor man's docker image: a frozen snapshot of a python environment that worked at one point, and could be re-created on the fly.
Alternative Solutions
The only other way to ensure future compatibility is to freeze all packages used by your python script with explicit versions in the requirements.txt files (or equivalent), additionally freezing all packages that those packages depend on to their versions, recursively, as well. This is possible, but beyond the scope of many people's ability to program in python, especially given that python is particularly easy to use, and doesn't typically require a lot of esoteric programming knowledge. 'Flag every dependency to the particular version you're currently using' is much more complicated to even understand let alone accomplish, than a simple 'flag this as having worked on a particular day'.
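For reference, the "freeze everything" alternative can be approximated in a few lines of Python using the standard library's `importlib.metadata`. This is a rough sketch of what `pip freeze` produces, not a replacement for it: it ignores editable installs, direct URL installs, and other cases that pip handles. The function name `frozen_requirements` is made up for this example. Because it lists every installed distribution, the output covers transitive dependencies as well as the packages a script imports directly.

```python
from importlib.metadata import distributions


def frozen_requirements():
    """Return sorted `name==version` pins for every installed distribution."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Redirect this output to requirements.txt to record the current environment.
    print("\n".join(frozen_requirements()))
```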
Additional context
Another advantage of having a pip install that works by date instead of only version number is that it then becomes another tool a programmer can use to discover why their script no longer works. By updating the date to discover the time when their script stopped working, they know exactly which package changed to break their code, improving their ability to update that code to match the updated packages.
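The date-based debugging described here amounts to a binary search over dates. A minimal sketch follows, assuming a hypothetical `works_on(day)` callback that would, in practice, build a fresh environment as of that day and run the script. The sketch also assumes the script never starts working again once it has broken. The name `first_broken_date` and the stand-in predicate are invented for the example.

```python
from datetime import date, timedelta


def first_broken_date(works_on, known_good, known_bad):
    """Binary-search for the earliest date on which `works_on(day)` is False.

    Assumes works_on(known_good) is True, works_on(known_bad) is False, and
    that there is a single switch from working to broken in between.
    """
    lo, hi = known_good, known_bad
    while (hi - lo).days > 1:
        mid = lo + timedelta(days=(hi - lo).days // 2)
        if works_on(mid):
            lo = mid  # still works: the break happened later
        else:
            hi = mid  # already broken: the break happened on or before mid
    return hi


# Stand-in predicate: pretend a breaking release appeared on 2023-03-15.
breaking_day = date(2023, 3, 15)
found = first_broken_date(lambda day: day < breaking_day, date(2023, 1, 1), date(2023, 12, 31))
print(found)
```

With a one-year window this needs about nine environment builds instead of one per day. Comparing the package versions installed on the last good day and the first broken day would then narrow down which release caused the failure.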