What to do with "pure" pypi packages? #28
I am not so sure this is as ugly as before [citation needed]. The integration with pip has improved a lot. (Even Linux distros are now allowing a mixture of PyPI and system packages!) Maybe we are just missing something like: conda install --from pypi <some package>
IMO: all dependencies of an included conda package should be packaged as conda packages. Whatever the user wants to install on top of that is available via pip.
Continuum has some old/stale code around this. Someone did this as a side project and built a very large percentage of the packages on PyPI. I don't know what happened to that experiment, or whether it is current or clean enough to be useful to you.
Thanks @bkreider, I have inherited that code. It is at https://github.com/ContinuumIO/pypi-conda-builds. It's not usable right out of the box, but I'm working on it when I have time.
Cool, thanks! Personally, as was discussed on another thread somewhere... I find that recipes all too often require some tweaking. So I think a two-part system would be best: automated recipe generation, plus the ability to tweak the results by hand.
Now to find the time to work on such a thing...
Not sure if I brought this to your attention, @takluyver, but I'm doing it now if I haven't already. Otherwise, sorry for the noise.
Thanks @jakirkham. I've recently created an experimental tool to turn wheels of pure Python packages into conda packages: wheel2conda (inventive name, I know ;-). As more and more Python packages are made available as wheels, I think this could be a very quick, dependable way to automate making conda packages for the vast majority of packages that are straightforward. It's new and experimental just now, but please do kick the tyres. I'll be joining the video meeting on Friday to talk more about this.
This again is from another thread, but I wanted to promote it to the relevant issue. I have extended my thoughts to be a little clearer about the problems that I see and how we can address them.

The Problems

The idea here is that it is too tricky to determine whether something is pure Python (particularly in any automatic way). For that matter, it is too tricky to determine when a pure Python package becomes less pure (includes some C code, for instance). While there is metadata that can be used to specify this information, it happens too often that this metadata is simply inaccurate. We care about this because we want to control the environment that compiled code is built in, and we want to avoid shipping unverified binary bits, which ruins the quality of the ecosystem and could make users vulnerable to problems. So, however we automate this, we need to keep this problem in mind.

Somewhat orthogonally, there are a number of other use cases (R, Perl, Lua, GitHub repos, etc.) where this same functionality would be nice (see issue #51). As more languages enter the scene (as I am sure they will), it would be nice to continue extending this functionality to them. Something that gets too specialized for the PyPI case misses the fact that conda is becoming more general purpose than its Python beginnings would lead one to believe. So, this is another problem to keep in mind.

Returning to the main point, it would be nice to have a solution that is not too complex (or different) and that leverages the full bandwidth of our CIs.

The Proposal

To me, the simplest way forward that addresses all of these concerns is to have normal feedstocks for PyPI packages, but have updates for them managed in an automatic fashion. To allow for the automation, we can have a special maintainer, such as a bot account.

Addressing the Problems

By using this proposal, we no longer have to worry about when a package has C or other compiled code added to it. Those packages can still be maintained automatically in this system; however, if we choose not to maintain them this way, we don't have to. The proposal here does nothing special for PyPI (other than whatever version-scraping script is used), so a Python package that adds some C code can still be automatically maintained just the same. 😄 If we ever run into problems with a feedstock, we can do manual maintenance at any time. We can also easily extend this model to other languages: all that really changes is that we add new scraping scripts, and we may discover there is much in common between them that can be reused.

By keeping feedstocks for automatic maintenance, we can benefit from our existing work, fix problems manually as they arise (either as a one-off or by disabling automated maintenance), keep the full bandwidth of our CIs for building packages (so we can scale appropriately), and prevent catastrophic breakdowns of the automated architecture from affecting the builds of individual packages (reducing stress levels for everyone involved 😄). In short, by using our existing infrastructure with a few minor additions, we can already benefit greatly and get all the things that we want without so many concerns.
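To make the "version-scraping script" part of the proposal concrete, here is a minimal sketch using PyPI's public JSON API. The feedstock-side details are assumptions: in a real bot, the current version would be parsed out of the recipe (meta.yaml) rather than hard-coded.

```python
import json
from urllib.request import urlopen

def latest_pypi_version(name: str) -> str:
    """Return the latest released version of a project on PyPI."""
    with urlopen(f"https://pypi.org/pypi/{name}/json") as resp:
        return json.load(resp)["info"]["version"]

# Hypothetical usage: in practice this would come from the feedstock recipe.
feedstock_version = "1.10.0"
latest = latest_pypi_version("six")
if latest != feedstock_version:
    print(f"six needs an update: {feedstock_version} -> {latest}")
```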
This is a key advantage of using wheels. The wheel tag embeds the Python ABI and platform it is for, so if it looks like py3-none-any, we know the wheel is pure Python.

What you're saying makes sense, but it feels a little bit like the old joke about the mathematician who, rather than solving a problem directly, reduces it to one already solved, however roundabout the reduction.
It feels like massive overkill to maintain a separate 'feedstock' repo for every trivial PyPI package, a bot that updates them all, and build infrastructure spanning three separate CI services. For these cases, it's ultimately just unpacking one archive, moving some files around, and repacking into another archive with a bit of metadata.
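As an illustration of the wheel-tag point above, a minimal purity check (assuming the standard PEP 427 wheel filename layout) might look like this:

```python
def is_pure_python_wheel(filename: str) -> bool:
    """True if a wheel's tags say it needs no ABI and runs on any platform."""
    # Wheel names end in {python tag}-{abi tag}-{platform tag}.whl (PEP 427).
    python_tag, abi_tag, platform_tag = filename[:-len(".whl")].split("-")[-3:]
    return abi_tag == "none" and platform_tag == "any"

# Pure Python: no compiled extensions, installable anywhere.
assert is_pure_python_wheel("six-1.10.0-py2.py3-none-any.whl")
# Compiled: CPython 3.5 ABI, Linux x86-64 only.
assert not is_pure_python_wheel("numpy-1.11.0-cp35-cp35m-manylinux1_x86_64.whl")
```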
@takluyver, I am with you there, if we could just … What @jakirkham proposes is a re-packaging that does make sense with our current tools, mostly because it allows us to control how we package and write the metadata. Take the …
I think my very loose analogy may have come across as too detailed. I was just saying that it seems like a massively complex and roundabout way to achieve something that should be quite simple.
Well, …
But why? Just so that users don't need to touch pip as an alternative? That would only work with the wheel2conda-as-a-service. Right now, this simply transfers the following … into …
IMO the big advantage of a "real" package is that it is (or at least can be) integrated into the system of other conda packages, starting with the fact that installing it pulls in all the right dependencies, so that it works once installed. This is similar to Debian and the Debian packaging policy (which is IMO the real USP of Debian...). An example: if one has to build a wheel2conda package, a pypi2conda recipe is more or less the same for the trivial case of a pure Python package. Both need a database of PyPI names to conda names (to get dependencies right even in cases where these two names are not the same!), and the rest is just parsing PyPI data... So: …
+1
Right, either way we need some way to map PyPI dependency names to conda dependency names. That's orthogonal to the question I'm talking about, which is how you turn packages on PyPI into conda packages.
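To make the mapping idea concrete, here is a toy sketch. The entries are real examples of PyPI/conda-forge name differences, but the table and helper are illustrative only; a real mapping would be a maintained data file, not a hard-coded dict.

```python
# Illustrative PyPI -> conda-forge name differences.
PYPI_TO_CONDA = {
    "torch": "pytorch",
    "tables": "pytables",
    "msgpack": "msgpack-python",
}

def conda_name(pypi_name: str) -> str:
    """Map a PyPI requirement name to its conda package name."""
    normalized = pypi_name.lower().replace("_", "-")
    return PYPI_TO_CONDA.get(normalized, normalized)

assert conda_name("torch") == "pytorch"
assert conda_name("requests") == "requests"  # most names match as-is
```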
As far as parsing PyPI data, @183amir has really done a nice job with some scripts he has written. They are designed to automate this exact sort of thing. It would be nice to find a proper home for them here and get them cleaned up with tests and all that fun stuff (with his permission of course 😉).
I was thinking of putting it in …
That could be a possibility. Thoughts, @pelson?
Part of the initial problem posed by @ChrisBarker-NOAA, as I understand it, is having to also package any PyPI dependencies of a conda recipe. Wouldn't this be helped if conda recipes allowed specifying dependencies from PyPI? See also conda/conda-build#548.
Now that this has been revived -- where is conda at with platform-independent packages? That would make it easier to package up pure-python packages. BTW -- for pure python, I don't see the advantage of making a conda package from a wheel -- it's just as easy to make one from source. And if it's not pure-python (which is where wheels shine), then the wheel is all too likely to be incompatible with the rest of your conda system.
For reference, in …
Now that Travis OSX builds are waiting in the queue for many hours to complete a build, I feel like reminding people that wheel2conda can convert a pure-Python wheel to a set of conda packages on a single system in a few seconds. It's at a prototype stage at the moment, and it would need more work to turn it into a complete solution, but if we're routinely going to be waiting hours for an OSX buildbot, building OSX packages from Linux seems rather attractive. And if we could do this for all the pure Python packages, it could free up conda-forge's ration of OSX buildbots to build more complex packages.
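For a sense of why this can be so fast: converting a pure-Python wheel is mostly archive shuffling. The sketch below is not wheel2conda's actual code, just an illustration of the "unpack, move files, repack with metadata" idea from earlier in the thread; a real conda package needs fuller metadata (a depends list parsed from the wheel's METADATA, info/files, and so on).

```python
import json
import tarfile
import tempfile
import zipfile
from pathlib import Path

def wheel_to_conda(wheel_path: str, out_dir: str, name: str, version: str) -> Path:
    """Repack a pure-Python wheel as a minimal, platform-independent
    conda-style package. Illustrative only."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    pkg = out / f"{name}-{version}-py_0.tar.bz2"
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Wheel contents map onto site-packages for a noarch-style package.
        with zipfile.ZipFile(wheel_path) as whl:
            whl.extractall(root / "site-packages")
        info = root / "info"
        info.mkdir()
        (info / "index.json").write_text(json.dumps({
            "name": name,
            "version": version,
            "build": "py_0",
            "build_number": 0,
            "depends": ["python"],  # real deps come from the wheel's METADATA
        }))
        with tarfile.open(pkg, "w:bz2") as tar:
            for path in sorted(root.rglob("*")):
                tar.add(path, arcname=str(path.relative_to(root)), recursive=False)
    return pkg
```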
Regardless of how this problem is approached, having some way of getting info about an update to a package from PyPI is going to be very important. I raised an issue with PyPA a few months back about getting notifications for package updates. Please chime in if you have thoughts on how this might be done, or feel free to simply show support. Issue link is below.

xref: pypi/warehouse#1683
Has there been any movement on this? It's still a bit of a pain to deal with installing packages that exist on PyPI but don't have a corresponding conda package. The problem with something like …
Hey @BrenBarn! (Good to hear from you, BTW!) I don't think that what you want exists as a tool yet. Mostly because no one has worked on it. Part of the deal with conda-forge is that we are a curated set of packages with a certain communally governed quality to them. This is pretty different from the PyPI model that lets anyone push any package without even modest checks to ensure quality (i.e. "Does this package even install?"). Recently @marcelotrevisani developed Grayskull (https://conda-forge.org/blog/posts/2020-03-05-grayskull/), which helps convert PyPI packages to recipes. Lots of folks have started to use it to submit to staged-recipes. This could be used as the basis for a tool that installs from either conda (if available) or builds a conda package from the PyPI version and then installs it.
On Fri, Aug 21, 2020 at 2:20 PM Anthony Scopatz wrote:

> I don't think that what you want exists as a tool yet. Mostly because no one has worked on it. Part of the deal with conda-forge is that we are a curated set of packages with a certain communally governed quality to them. This is pretty different from the PyPI model that lets anyone push any package without even modest checks to ensure quality (i.e. "Does this package even install?").

This indeed is a core difference, and why we don't want to just auto-populate conda-forge with packages from PyPI.

> Recently @marcelotrevisani developed Grayskull (https://conda-forge.org/blog/posts/2020-03-05-grayskull/), which helps convert PyPI packages to recipes. Lots of folks have started to use it to submit to staged-recipes.

Indeed, as it gets easier and easier to make conda-forge recipes, the number of PyPI packages that aren't supported gets smaller and smaller.

But another difference between conda-forge and PyPI is that conda is not only about Python packages -- so you may have the same package name for a PyPI package, an R package, a C lib, or what have you.

So what I think would be really useful is a "dynamic" channel: call it something like "conda-pypi". When it was searched for a package, it would reach out to PyPI and try to find it; if it did, it would auto-build a conda package out of it, deliver that, and then cache it for the next request.

Now that I think about it, that may not be possible, 'cause conda expects a channel to have a pre-built list of available packages. But it could populate that list from PyPI and still only build the package on demand -- and, when there was a failure, keep track and not try again (until the package was updated on PyPI, anyway).

But someone would need to build this nifty system, and given the advantages of curation, maybe putting a recipe on conda-forge is a better bet anyway.

Note that while there are thousands (hundreds of thousands!) of packages on PyPI that aren't on conda-forge, most of them are not really useful -- unfortunately, PyPI kind of encourages people to put any old prototype or may-be-useful-someday package up there, and there are a LOT of those!

-CHB
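The "dynamic channel" idea above could be prototyped as a request handler. A toy sketch follows; `build_from_pypi` is a hypothetical stand-in for the real work (something like conda skeleton plus conda-build), and the failure marker is just the "keep track and not try again" behavior described.

```python
import json
from pathlib import Path
from urllib.request import urlopen

CACHE = Path("channel-cache")
CACHE.mkdir(exist_ok=True)

def build_from_pypi(meta: dict, out: Path) -> Path:
    # Hypothetical stand-in for the real build step.
    raise NotImplementedError

def handle_request(name: str) -> Path:
    """Serve a conda package for a PyPI name, building it on first request."""
    pkg = CACHE / f"{name}.tar.bz2"
    if pkg.exists():
        return pkg  # built and cached by an earlier request
    marker = CACHE / f"{name}.failed"
    if marker.exists():
        raise RuntimeError(f"{name} failed before; not retrying until it updates on PyPI")
    try:
        with urlopen(f"https://pypi.org/pypi/{name}/json") as resp:
            meta = json.load(resp)
        return build_from_pypi(meta, out=pkg)
    except Exception:
        marker.touch()  # remember failures so we don't rebuild on every request
        raise
```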
There are a lot of Python packages that "just work" with plain old:
pip install the_package
These are pure Python packages with non-complex dependencies.
So some folks use a mixture of conda and pip to install stuff, but this gets ugly with the dependency resolution, etc.
I've dealt with this so far by making conda packages for these, but there are a LOT of them -- and as this is an easy lift, it would be do-able to automate it all. I've always thought that Anaconda.org should have a PyPI bridge -- someone looks for a package, it's not there, it looks for a PyPI package and builds a conda package on the fly and away we go!
But that would require Continuum to do it, and maybe would be too much magic.
But maybe we could have a set of conda packages that are auto-built from PyPI (conda skeleton mostly works), and then have an automated system that goes through, looks to see if there are newer versions of any of them, and auto-updates those. So in theory, all we'd need to do by hand is keep a list of packages to monitor (and probably keep up with whether each had been added to the default channel).
I started down this track before I discovered obvious-ci -- running conda skeleton and building the package on the fly. Then I decided that it was easier to simply maintain by hand the half a dozen packages I needed. But it would be nice to cover a much larger range of packages...
Thoughts?