[conda] Unable to make a conda build #113

fg-mindee · 2021-03-07T18:30:36Z

Unfortunately, one of the project dependencies does not have any conda release or any way to make one. I opened an issue on their repo pymupdf/PyMuPDF#938 to track this, but so far I haven't found any way to release the project on anaconda with this dependency.

charlesmindee · 2021-03-08T08:05:46Z

Is it mandatory to support conda? If so, maybe we can switch to another pdf-reader lib.

fg-mindee · 2021-03-08T08:35:02Z

Not mandatory, but this is a very common means of installing Python packages. We might investigate other options to replace the dependency, but we'll have to check for any performance drop first.

fg-mindee · 2021-07-12T14:26:01Z

For reference, my initial issue on PyMuPDF (pymupdf/PyMuPDF#938) was moved to this discussion: pymupdf/PyMuPDF#1137

kchawla-pi · 2021-09-14T16:37:21Z

Why not simply do a

conda run pip install <missing in conda package>

Especially since there's only one package.
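A minimal sketch of that suggestion, assuming a named conda environment. `pip_into_conda_env` is a hypothetical helper that only builds the argument list; it uses `conda run -n <env>` to execute inside the environment and `python -m pip` so that the environment's own pip is the one invoked.

```python
import shlex
from typing import List, Sequence


def pip_into_conda_env(env_name: str, packages: Sequence[str]) -> List[str]:
    """Build (but do not run) a command that pip-installs packages inside a conda env.

    Hypothetical helper: `conda run -n <env>` runs a command in the named environment,
    and `python -m pip` ensures that environment's pip is used rather than a global one.
    """
    if not packages:
        raise ValueError("at least one package is required")
    return ["conda", "run", "-n", env_name, "python", "-m", "pip", "install", *packages]


if __name__ == "__main__":
    cmd = pip_into_conda_env("doctr-env", ["pymupdf"])
    print(shlex.join(cmd))
    # To actually execute it: subprocess.run(cmd, check=True)
```

The catch, as discussed further down, is that packages installed this way bypass conda's own dependency resolution.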

fg-mindee · 2021-09-15T13:37:40Z

Hi @kchawla-pi,

So actually, since then, weasyprint has also turned out to be missing a conda build. As it happens, I was thinking about getting to the bottom of this again yesterday. Worst case scenario, we'll make some features optional (such as HTML compatibility through weasyprint) so that the core build is available in conda.

Also please note that for now, the only important dependencies that would benefit from a conda support (performance-wise) are PyTorch & TensorFlow 👍

Anyway, we'll provide some updates on this very topic soon!

charlesmindee · 2022-02-24T09:55:27Z

So now with #829 we are just missing weasyprint, right @fg-mindee ?

fg-mindee · 2022-02-24T11:26:20Z

So now with #829 we are just missing weasyprint, right @fg-mindee ?

Nope, pypdfium2 also lacks a conda package. But that could be fixed; I'll ping them about this!
However, making doctr.io.pdf and doctr.io.html extras would do the trick 👍

And I think we should seriously consider that: especially for HTML, it's more about people in need of training data, so I would argue that most users don't benefit from weasyprint (which is also a problem for Mac users, see #815).

For PDFs, it's more important, so if we can get a conda build, our best course of action would probably be to move html/weasyprint to an extra! What do you think?
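Moving HTML support to an extra usually means importing the dependency lazily, only when the feature is used. A minimal sketch, assuming an extra named `html`; `require_optional` is a hypothetical helper, not an existing doctr API, while `weasyprint.HTML(url=...).write_pdf()` is weasyprint's own call that returns PDF bytes when no target is given.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or fail with an install hint.

    Hypothetical helper: `extra` is the name of a pip extra (e.g. "html")
    that would pull the dependency in.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature; "
            f"install it with: pip install 'python-doctr[{extra}]'"
        ) from exc


def read_html(url: str) -> bytes:
    # weasyprint is only imported when HTML support is actually used,
    # so the core package (and a conda build of it) does not depend on it.
    weasyprint = require_optional("weasyprint", "html")
    return weasyprint.HTML(url=url).write_pdf()
```

With this pattern the core import path never touches weasyprint, so a conda recipe for the core package could omit it.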

fg-mindee · 2022-02-24T11:33:31Z

I just checked and weasyprint does have a conda build now 🙌
https://anaconda.org/conda-forge/weasyprint

(But I still think we should move it to extra builds)

mara004 · 2022-05-18T13:24:07Z

Sorry about the conda build - I have never used conda myself and currently don't have the time/interest to learn it. Due to platform-specific binaries, the setup infrastructure of pypdfium2 is fairly complex already.
Perhaps a developer who is more familiar with conda can look into this at some point. I'd be happy to take a Pull Request that adds conda packaging to the release workflow.
That said, is there any reason you can't use pip?

felixdittrich92 · 2022-05-20T22:43:16Z

@frgfm

frgfm · 2022-05-22T20:52:03Z

@mara004 what do you mean by "any reason you can't use pip?"

pip installation is already available 👍
but conda builds are more specific to a given environment, so it's good if we can offer that means of installation as well. For the conda recipe, I don't know whether pip can be used (though I don't have experience with conda recipes that build the C or C++ extensions of a Python library).

mara004 · 2022-05-22T21:41:30Z

what do you mean by "any reason you can't use pip?"

I'm not familiar with the conda environment, so perhaps that was a silly question to ask.
I basically meant: For what reason do we need an extra package on conda if the PyPI release can be used?
As @kchawla-pi wrote:

Why not simply do a

conda run pip install <missing in conda package>

but conda builds are more specific to a given environment

I'd be curious to know in what way exactly conda builds are more specific.

I have read the comparison of conda to pip on Wikipedia, but the problem described there can be solved with venv. pip allows dependency breakage, but very clearly warns about it, so I don't really see an issue in this regard...

kchawla-pi · 2022-05-22T22:42:18Z

Well, pip does not do sophisticated dependency resolution, unlike Conda. It's the same reason pipenv and poetry are used for package installations, but unlike Conda, they use PyPI's index. Each of these has their own algorithm for dependency resolution, with Pipenv being rather slow.

Conda is the de facto tool for data scientists in the Python ecosystem. Being able to install Mindee packages seamlessly with Conda would remove a big paper cut.

mara004 · 2022-05-23T09:43:17Z

Okay, thanks for pointing this out!
To me personally, conda still seems kind of a reinvented wheel and duplicated packaging work, but if there are people who like it and use it I'm open to add support if someone can implement it properly.

frgfm · 2022-05-25T18:55:43Z

I can definitely second @kchawla-pi on that: I always try to find a conda installation before using pip, because it's much more careful about your existing env compatibility 👍

mara004 · 2022-05-25T19:24:15Z

I tried to craft a package with conda-build recently but I'm afraid it didn't go very well at all. I managed to build a package for my host platform (Linux x86_64) but it took unendurably long for conda-build to set up the environment and assemble the package (and while doing so, the directory where I installed miniconda grew well above 3 GiB 🙄). I hope there are ways to speed up the process of running conda-build...

kchawla-pi · 2022-05-25T20:30:28Z

Wow, that must be so frustrating. I don't know about Conda packaging, but now I'm pissed at conda for making your job so difficult. I will try to take a gander at it in June.

mara004 · 2022-05-25T20:33:05Z

Well, I don't know, perhaps I was just doing it the wrong way, but all the same it hasn't been very obvious to me how to do it.

frgfm · 2022-05-25T22:10:47Z

In my experience, conda build is always a long operation. Base conda is known to have a slow dependency resolution procedure, so I personally use mamba (https://github.com/mamba-org/mamba), which is blazing fast for dependency installation (multi-threaded, rewritten in C++). I have to check whether that extends to package building as well.

mara004 · 2022-05-26T11:55:31Z

I think the main problem is that, when running conda-build, it creates an isolated environment where all dependencies are installed. Now, if we want to craft more than one package, it would be essential that the environment can be reused so that dependencies don't need to be installed each time. Is there any option to do this?

felixdittrich92 · 2022-09-02T11:20:20Z

@frgfm do you know an answer ? 😅

mara004 · 2022-09-02T11:52:37Z

Even if we can get around the duration problem, I'll still need information about conda platform tags. We need an equivalent for each of the tags shown on https://pypi.org/project/pypdfium2/#files (section "Built Distributions").
Alternatively, perhaps a conda package could just wrap pip install somehow?
The easiest case would be if there were some tool to automatically convert wheels to conda packages, but I doubt this exists.
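As a starting point for that tag mapping, here is a sketch that translates a single wheel platform tag into a conda "subdir" name. The correspondence table is an assumption based on the usual conda subdirs (linux-64, osx-arm64, win-64, ...), not something taken from pypdfium2 or conda-build; `conda_subdir` is a hypothetical helper, and musllinux maps to `None` since conda has no musl platforms. Compressed tag sets joined with `.` are not handled.

```python
from typing import Optional

# Assumed correspondence between wheel platform-tag architectures and conda subdirs.
_LINUX_ARCHES = {
    "x86_64": "linux-64",
    "i686": "linux-32",
    "aarch64": "linux-aarch64",
    "armv7l": "linux-armv7l",
    "ppc64le": "linux-ppc64le",
    "s390x": "linux-s390x",
}
_MAC_ARCHES = {"x86_64": "osx-64", "arm64": "osx-arm64"}
_WINDOWS_TAGS = {"win_amd64": "win-64", "win32": "win-32", "win_arm64": "win-arm64"}


def _match_arch(tag: str, table: dict) -> Optional[str]:
    # Wheel tags end with "_<arch>", e.g. "manylinux_2_17_x86_64" or "macosx_11_0_arm64".
    for arch, subdir in table.items():
        if tag.endswith("_" + arch):
            return subdir
    return None


def conda_subdir(platform_tag: str) -> Optional[str]:
    """Map one wheel platform tag to a conda subdir, or None if there is no equivalent."""
    if platform_tag in _WINDOWS_TAGS:
        return _WINDOWS_TAGS[platform_tag]
    if platform_tag.startswith("musllinux"):
        return None  # conda has no musl-based platforms
    if platform_tag.startswith("manylinux"):
        return _match_arch(platform_tag, _LINUX_ARCHES)
    if platform_tag.startswith("macosx"):
        return _match_arch(platform_tag, _MAC_ARCHES)
    return None
```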

mara004 · 2022-09-02T12:21:12Z

For reference, these two pages sound interesting:
https://docs.conda.io/projects/conda-build/en/latest/user-guide/recipes/build-without-recipe.html
https://docs.conda.io/projects/conda-build/en/latest/user-guide/wheel-files.html

mara004 · 2023-08-19T20:16:31Z

Since it looks like the packages generated from pypdfium2-feedstock will not be made public (cf. AnacondaRecipes/pypdfium2-feedstock#1 (comment)), I will make a second attempt at building official conda packages for pypdfium2 in a conda branch, trying to accept or work around the python version problem (it remains to be decided how).

pypdfium2-feedstock currently requires manual interaction and native hosts.¹ I want to design this differently so we can build automatically in a workflow and without native hosts.

¹ This is only possible due to anaconda's extended CI capabilities, and might still end up not supporting some platforms we technically have cross-compiled binaries for.

mara004 · 2023-08-20T13:13:51Z

Ok, so I think I have the local packaging part ready. It's really inelegant, but all I could do given conda's limitations.

Now the remaining parts we need are:

- CI integration
  - build in parallel for the multiple Python versions (I'd suggest 3.8 through 3.11)
  - upload, supposedly to anaconda? (help wanted)
- People who can test the built packages

Here's an archive of builds for python 3.11 which I generated locally: pypdfium2_conda_py311.zip
Also attaching a patch snapshot of the branch: pypdfium2_conda.patch.txt

Note that the packages will contain wrong __pycache__ files because they are not built natively, but I hope python will just regenerate them locally. (conda really should not bundle pycache in the first place...)
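If the bundled bytecode turns out to be a problem, one workaround is to prune `__pycache__` directories from the staging directory before packaging. A minimal sketch; `strip_pycache` is a hypothetical post-processing helper, not a conda-build feature, and it relies on Python regenerating bytecode on first import when it is allowed to write there.

```python
import shutil
from pathlib import Path


def strip_pycache(root: Path) -> int:
    """Delete every __pycache__ directory below `root` and return how many were removed.

    Hypothetical post-processing step for a package staging directory.
    """
    dirs = [p for p in root.rglob("__pycache__") if p.is_dir()]
    # Remove deepest paths first so nested matches are deleted before their parents.
    for d in sorted(dirs, key=lambda p: len(p.parts), reverse=True):
        shutil.rmtree(d)
    return len(dirs)
```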

(PS: @kchawla-pi, now you can take a look at the code if you like ;) )

mara004 · 2023-08-20T23:55:24Z

Oh, and I just discovered conda's --variants feature - we can pass {python: [3.8, 3.9, 3.10, 3.11]} to build for multiple python versions. That doesn't really fix the problem (we still end up with separate packages although not logically necessary), but it makes it easier to accommodate, and hints that there may be at least some upstream recognition of the problem.
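One pitfall worth guarding against when writing that variants mapping: if the versions are bare numbers, `3.10` is read as the float `3.1`, whether the value is parsed as YAML or as a Python literal. Quoting them as strings avoids this. A small sketch; `variants_arg` is a hypothetical helper, and the JSON it emits is also a valid Python dict literal and valid YAML.

```python
import json
from typing import Sequence


def variants_arg(python_versions: Sequence[str]) -> str:
    """Render a variants mapping with the Python versions quoted as strings.

    Hypothetical helper: quoting keeps "3.10" from being read as the number 3.1.
    """
    return json.dumps({"python": list(python_versions)})


print(3.10 == 3.1)  # True: as bare numbers, the trailing zero is lost
print(variants_arg(["3.8", "3.9", "3.10", "3.11"]))
# {"python": ["3.8", "3.9", "3.10", "3.11"]}
```

The resulting string could then be passed as the value of `--variants` to `conda build`.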

felixT2K · 2023-08-21T07:07:06Z

Hi @mara004
Thanks for the updates 👍

About uploading this seems not to be so complicated: https://levelup.gitconnected.com/publishing-your-python-package-on-conda-and-conda-forge-309a405740cf (manual upload)

and with CI (for example, in @frgfm's Holocron lib) 😅:
https://github.com/frgfm/Holocron/blob/f78c6c58c0007e3d892fcaa1f1ff786cdbb5195f/.github/workflows/release.yml#L58
https://github.com/frgfm/Holocron/tree/main/.conda

Maybe @frgfm can help a bit more :)

mara004 · 2023-08-21T11:00:20Z

Thanks, sorry for spamming this thread.

The performance difference is significant, though. Building all wheels takes ~20s on my device. Contrast this with conda builds, which take, like, over 15min.¹
(Also the conda packages are 100MiB compared to 30MiB for wheels, which is because of the python version splitting.)

I've got a feeling I'm missing something here, but if that were true it's not obvious how to do it properly.
Please tell me if anyone knows how to speed this up (apart from CI parallelization), or else how to improve it (like disabling pycache compilation).

¹ Actually two fewer platforms, because conda does not support musllinux.

mara004 · 2023-08-21T19:09:32Z

Throwback: @boldorider4 just gave me an eye-opener: pdfium should be packaged separately in conda, so that pypdfium2 can just depend on it and cleanly be noarch. I'm still thinking about this, but I believe it may finally be the clean solution I was looking for.

Ideally the conda packaging would be done in pdfium-binaries (will still need some conda convert for the cross compiled archs, but much easier). Then what we need in pypdfium2 is to instruct the library loader with the right path, and of course a noarch conda recipe.
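To illustrate "instructing the library loader with the right path", here is a sketch that looks for a separately installed pdfium inside the active conda prefix. The file names and locations are assumptions (conda conventionally puts Windows DLLs under `Library/bin` and Unix shared libraries under `lib/`); `find_pdfium` is a hypothetical helper, not pypdfium2's actual loader.

```python
import sys
from pathlib import Path
from typing import Optional

# Assumed library locations inside a conda environment, keyed by sys.platform.
_CANDIDATES = {
    "linux": ("lib", "libpdfium.so"),
    "darwin": ("lib", "libpdfium.dylib"),
    "win32": ("Library/bin", "pdfium.dll"),
}


def find_pdfium(prefix: Path = Path(sys.prefix), platform: str = sys.platform) -> Optional[Path]:
    """Return the pdfium shared library inside a conda prefix, or None if absent."""
    key = "linux" if platform.startswith("linux") else platform
    if key not in _CANDIDATES:
        return None
    subdir, name = _CANDIDATES[key]
    path = prefix.joinpath(*subdir.split("/"), name)
    return path if path.is_file() else None
```

A loader could then pass the result to `ctypes.CDLL(str(path))`, which is what would let the Python package itself stay noarch.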

This should really have come to my mind earlier. In particular, I should have realized it after a recent discussion with @KOLANICH about pdfbox; I just failed to connect the dots.

Phew, I need a break before revisiting this 😅

felixT2K · 2023-08-22T06:24:19Z

@mara004 fyi there is also a draft for conda-forge channel:

conda-forge/staged-recipes#23726

See also mindee/doctr#113 (comment) AnacondaRecipes/pypdfium2-feedstock#1 (comment) and

mara004 · 2023-08-22T13:56:18Z

@mara004 fyi there is also a draft for conda-forge channel:
conda-forge/staged-recipes#23726

Thanks for the pointer, see my comment conda-forge/staged-recipes#23726 (comment).

felixT2K · 2023-10-12T14:23:56Z

@mara004 I wanted to ask if there are any updates on your side? :)

mara004 · 2023-10-12T18:48:34Z

@mara004 I wanted to ask if there are any updates on your side? :)

I've got it on my mind and have been working on some integration prerequisites to get this done nicely - packaging with an external library differs quite a bit from bundling. I can elaborate on the individual tasks if necessary.
The thing is, my personal situation is rather difficult, but provided it doesn't get worse I should hopefully be able to finish this well before year's end.

felixT2K · 2023-10-13T06:17:10Z

@mara004 I wanted to ask if there are any updates on your side? :)

I've got it on my mind and have been working on some integration prerequisites to get this done nicely - packaging with an external library differs quite a bit from bundling. I can elaborate on the individual tasks if necessary. The thing is, my personal situation is rather difficult, but provided it doesn't get worse I should hopefully be able to finish this well before year's end.

Oh yeah, no stress, I just wanted to ask so I can plan for it. :)

mara004 · 2023-10-23T22:57:42Z

Work in progress: https://github.com/pypdfium2-team/pypdfium2/pull/268/files
I believe the packaging is nearly done, we just need some cleanup, testing, docs and the CI integration now.

mara004 · 2023-10-24T12:29:55Z

However, we might have a bit of a problem with the custom channels.
According to conda/conda-build#532 (comment), it looks like conda-build might not properly support them in recipes?

In that case, users would have to add the channels explicitly before installation, which is probably doable, but not nice.
In particular, we need to be careful with pdfium-binaries, because there's an improper package in anaconda/main (official channel's bblanchon), which adds to the confusion.

mara004 · 2023-10-30T15:27:25Z

Just merged the conda packaging code: pypdfium2-team/pypdfium2@ee5a2ff.
The packages build locally, but I haven't done the CI integration yet.

mara004 · 2023-10-31T18:42:38Z

And the CI/docs also merged now: pypdfium2-team/pypdfium2#269

mara004 · 2023-10-31T20:30:56Z

https://anaconda.org/pypdfium2-team/pypdfium2_helpers
https://github.com/pypdfium2-team/pypdfium2#install-conda

felixT2K · 2023-11-01T07:13:01Z

Thanks a lot @mara004 👍🏼

frgfm · 2023-12-03T20:24:52Z

mara004 · 2023-12-04T14:30:39Z

I imagine carrying around the custom channels for pypdfium2 (pypdfium2-team, bblanchon) might be a bit of an annoyance...

Despite the channel::pkg syntax, end users still have to activate the channel manually. Conda does not automatically resolve/activate dependency channels, nor is there a recipe section to specify channels to enable (conda/conda-build#532).
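Until that is resolved, the install command has to list every channel explicitly. A minimal sketch of assembling it; `conda_install_cmd` is a hypothetical helper, and the channel and package names are the ones mentioned in this thread, with each channel passed through its own `-c` flag.

```python
from typing import List, Sequence


def conda_install_cmd(package: str, channels: Sequence[str]) -> List[str]:
    """Build a `conda install` command that enables every channel the package needs.

    A `channel::pkg` spec only says where that one package comes from; its
    dependencies' channels still have to be listed, one `-c` flag each.
    """
    cmd = ["conda", "install"]
    for channel in channels:
        cmd += ["-c", channel]
    cmd.append(package)
    return cmd


print(" ".join(conda_install_cmd("pypdfium2-team::pypdfium2_helpers", ["bblanchon", "pypdfium2-team"])))
```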

I'm kind of wondering if we might have gone the wrong way and should have tried putting pypdfium2 and its dependencies in conda-forge instead, but the feedstock publishing seemed less flexible and I wasn't sure how to automate it. However, if anyone wants to pursue that path, the feedstocks written by the Anaconda team (pdfium-binaries, ctypesgen [pypdfium2-team fork]) might be a good starting point. Though I'd recommend not using their pypdfium2-feedstock, and instead splitting it into separate pypdfium2_raw and pypdfium2_helpers packages as we do in the custom channel.

It would be most convenient if conda-forge as a community channel could just "include" or mirror the pypdfium2-team and bblanchon channels, but I don't think they can do this.

Anyway, unfortunately my time budget for conda is more than over, so I won't be able to look into this any deeper 😅

mara004 · 2024-08-16T00:17:52Z

FWIW, someone has put pypdfium2 in conda-forge now, but badly.
As of this writing, they only support osx-64 and macos-64, and since they're bundling the binaries, they have to build python version specific packages, unnecessarily.
Again, it's tied to native hosts, and as such will always lack support for architectures not provided by the feedstock infrastructure.

So, please continue using our packages from bblanchon and pypdfium2-team channels.

conda-forge links:
https://anaconda.org/conda-forge/pypdfium2
https://anaconda.org/conda-forge/pypdfium2/files
https://anaconda.org/conda-forge/ctypesgen-pypdfium2-team

felixdittrich92 · 2024-08-16T10:48:28Z

Thanks for the update 👍

fg-mindee added type: bug Something isn't working topic: build Related to dependencies and build labels Mar 7, 2021

fg-mindee added this to the 1.0.0 milestone Mar 7, 2021

fg-mindee self-assigned this Mar 7, 2021

This was referenced Mar 10, 2021

[documents] Benchmark PDF document reading + numpy conversion options #23

Closed

chore: Fixed CI pypi automatic publish upon release #153

Merged

charlesmindee mentioned this issue Feb 15, 2022

feat: replace PyMuPDF by pdf2image, for license compatibility #818

Closed

mara004 mentioned this issue Aug 21, 2023

Conda pypdfium2-team/pypdfium2#248

Closed

mara004 added a commit to pypdfium2-team/pypdfium2 that referenced this issue Aug 22, 2023

Add note about this branch and idea for a cleaner approach

76adbc2

See also mindee/doctr#113 (comment) AnacondaRecipes/pypdfium2-feedstock#1 (comment) and

mara004 added a commit to pypdfium2-team/pypdfium2 that referenced this issue Aug 22, 2023

Add note about this branch and idea for a cleaner approach

2d78bd2

See also mindee/doctr#113 (comment) AnacondaRecipes/pypdfium2-feedstock#1 (comment) and

mara004 added a commit to pypdfium2-team/pypdfium2 that referenced this issue Aug 22, 2023

Add note about this branch and idea for a cleaner approach

9345764

See also mindee/doctr#113 (comment) AnacondaRecipes/pypdfium2-feedstock#1 (comment) and

mara004 added a commit to pypdfium2-team/pypdfium2 that referenced this issue Aug 22, 2023

Add note about this branch and idea for a cleaner approach

af2ff6c

See also mindee/doctr#113 (comment) AnacondaRecipes/pypdfium2-feedstock#1 (comment) and

mara004 mentioned this issue Aug 22, 2023

Conda package bblanchon/pdfium-binaries#119

Closed

frgfm mentioned this issue Dec 22, 2023

feat: Adds conda recipe & corresponding CI jobs #1414

Merged

felixdittrich92 closed this as completed in #1414 Feb 6, 2024

felixdittrich92 removed this from the 1.0.0 milestone Oct 10, 2024
