Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shelling out to pip during tests is problematic #2950

Closed
dvzrv opened this issue Dec 31, 2023 · 9 comments
Closed

Shelling out to pip during tests is problematic #2950

dvzrv opened this issue Dec 31, 2023 · 9 comments
Labels
bug fix developed release schedule to be determined

Comments

@dvzrv
Copy link

dvzrv commented Dec 31, 2023

Description of the bug

Hi! 👋

While trying to package 1.23.8 for Arch Linux I ran into an issue with a new test:
It appears during test run the test shells out to calling pip to install pymupdf-fonts.

This is problematic, as for packaging purposes we want to know what is required up front and actually be able to disable internet access during build and test run.

If pymupdf-fonts is a test requirement it should be noted as such, and not be installed on the fly, as that arbitrarely alters the test environment.

Relatedly, why are fonts not loaded via system integration? Downstream distributions usually package all sorts of fonts and there is usually no need to bundle them in this special way (and using something generic would allow using all sorts of fonts and not just the bundled ones).

How to reproduce the bug

The build script for the previous version can be found here:
https://gitlab.archlinux.org/archlinux/packaging/packages/python-pymupdf/-/blob/07162b38e509ca0d9f107af94af4f88e74faa887/PKGBUILD

=================================== FAILURES ===================================
_______________________________ test_fontarchive _______________________________

    def test_fontarchive():
        import subprocess
>       subprocess.run('pip install pymupdf-fonts', shell=1, check=1)

tests/test_font.py:63:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

input = None, capture_output = False, timeout = None, check = 1
popenargs = ('pip install pymupdf-fonts',), kwargs = {'shell': 1}
process = <Popen: returncode: 127 args: 'pip install pymupdf-fonts'>
stdout = None, stderr = None, retcode = 127

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.

        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them,
        or pass capture_output=True to capture both.

        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.

        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.

        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.

        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.

        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE

        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE

        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command 'pip install pymupdf-fonts' returned non-zero exit status 127.

/usr/lib/python3.11/subprocess.py:571: CalledProcessError
----------------------------- Captured stderr call -----------------------------
/bin/sh: line 1: pip: command not found
=========================== short test summary info ============================
FAILED tests/test_font.py::test_fontarchive - subprocess.CalledProcessError: ...
================= 1 failed, 174 passed, 1 deselected in 9.33s ==================

PyMuPDF version

1.23.8

Operating system

Linux

Python version

3.11

@julian-smith-artifex-com
Copy link
Collaborator

Thanks for raising this.

In PyMuPDF the list of required test packages is currently available in scripts/gh_release.py:test_packages, but this is currently for internal use, it may change, and it is not documented. It's currently set to pytest fontTools psutil pymupdf-fonts pillow.

[It's a shame that pytest doesn't allow specification of required packages in pytest.ini.]

I'll have a look at adding a page to the docs with information for packagers; i'll leave this issue open for now and request a review here when i have something ready.

Regarding use of system fonts, i'll defer to @JorjMcKie.

@JorjMcKie
Copy link
Collaborator

Relatedly, why are fonts not loaded via system integration? Downstream distributions usually package all sorts of fonts and there is usually no need to bundle them in this special way (and using something generic would allow using all sorts of fonts and not just the bundled ones).

MuuPDF, and hence neither PyMuPDF accesses system fonts of the respective platform. All fonts must either be built into the (MuPDF) binary, or provided when needed via either separate font files or in-memory binaries.

Package pymupdf-fonts - if installed - provides certain selected fonts in-memory via its own methods - which is required in exactly this way for some PyMuPDF functions, specifically when using the Story feature.
So in order to test these functions, the pymupdf-fonts package must be properly installed.

@julian-smith-artifex-com
Copy link
Collaborator

Please see new discussion about draft documentation for Linux system installs: #2977

@dvzrv
Copy link
Author

dvzrv commented Jan 4, 2024

Package pymupdf-fonts - if installed - provides certain selected fonts in-memory via its own methods - which is required in exactly this way for some PyMuPDF functions, specifically when using the Story feature.
So in order to test these functions, the pymupdf-fonts package must be properly installed.

Thanks for clarifying the use of fonts in pymupdf-fonts.
This does not really answer my question in regards to why system fonts are not used though. Does pymupdf-fonts exist to circumvent the time required to load a font?
I'm asking, because when packaging, I want to circumvent packaging things that we already have (aka many of those fonts).
Additionally, vendored font files make it harder for me to figure out the licensing situation of those files:

  • not providing the Ubuntu Font License file is problematic, as I would need to package that due to it being a custom license identifier, not covered by SPDX
  • all OFL-1.1 license files need to be provided for the individual fonts, as each file carries unique copyright holder information

@julian-smith-artifex-com
Copy link
Collaborator

In my tree i have removed the pip install commands in tests (there were two), and added a missing flake8 to scripts/gh_release.py:test_packages.

So tests should now work offline as long as the specified packages are pre-installed.

@julian-smith-artifex-com julian-smith-artifex-com added bug fix developed release schedule to be determined labels Jan 4, 2024
julian-smith-artifex-com added a commit to ArtifexSoftware/PyMuPDF-julian that referenced this issue Jan 5, 2024
Instead have added missing flake8 to scripts/gh_release.py:test_packages, so
all required packages will already be available, e.g. when tests are run by
`scripts/test.py`.

This addresses pymupdf#2950 "Shelling out to pip during tests is problematic".
julian-smith-artifex-com added a commit that referenced this issue Jan 5, 2024
Instead have added missing flake8 to scripts/gh_release.py:test_packages, so
all required packages will already be available, e.g. when tests are run by
`scripts/test.py`.

This addresses #2950 "Shelling out to pip during tests is problematic".
@JorjMcKie
Copy link
Collaborator

This does not really answer my question in regards to why system fonts are not used though. Does pymupdf-fonts exist to circumvent the time required to load a font?

MuPDF (and hence PyMuPDF) do not access the system fonts of the respective machine because there is no standard based on which fonts can be assumed to be available. Certainly not across different operating systems, but also not even within one platform like Windows or Linux.
Trying to dive into this would lead to a never-ending story and lots of issues and complaints.

So, MuPDF is providing function hooks that can be used by an installation / package providers to present a defined set of system fonts to MuPDF's algorithms.
These hooks are described here .../mupdf/include/mupdf/fitz/font.h. Via defining functions for

  • fz_load_system_font_fn
  • fz_load_system_cjk_font_fn
  • fz_load_system_fallback_font_fn

... and

void fz_install_load_system_font_funcs(fz_context *ctx,
	fz_load_system_font_fn *f,
	fz_load_system_cjk_font_fn *f_cjk,
	fz_load_system_fallback_font_fn *f_fallback);

an installation can make sure that MuPDF becomes aware of local fonts and includes them in its lookup algorithms.

In PyMuPDF, a number of fonts are always available because they are compiled with the MuPDF binary (e.g. the Base-14 fonts, the "Droid Sans Fallback Regular" font for CJK and a few select other fonts like some Noto).

Package pymupdf-fonts contains a dozen more fonts independent from the above, which the authors have come across over time, based on whatever motivations. This package is a convenient way to make available fonts to PyMuPDF without the need to re-compile anything or enlarging PyMuPDF's binaries and not even the memory footprint of PyMuPDF applications.
We also use these fonts as a test bed for loading fonts from memory like I mentioned for features using the Story class.

@julian-smith-artifex-com
Copy link
Collaborator

Marking this as fixed in next release, because we no longer run pip install in tests.

The font problem remains, but that should really be in a separate issue.

@julian-smith-artifex-com
Copy link
Collaborator

This is fixed in release candidate 1.23.9rc2; see: #2996

@julian-smith-artifex-com
Copy link
Collaborator

Fixed in 1.23.9.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug fix developed release schedule to be determined
Projects
None yet
Development

No branches or pull requests

3 participants