Nvidia devel tools get removed from rocker/cuda-based images #736

Open
javdg opened this issue Dec 8, 2023 · 14 comments

@javdg

javdg commented Dec 8, 2023

Container image name

rocker/cuda

Container image digest

No response

What operating system related to this question?

Linux

System information

No response

Question

As they are based on nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 (dockerfiles/cuda_4.3.0.Dockerfile), I was expecting to have the full CUDA Toolkit devel environment available when using rocker/cuda or any image derived from it.
However, https://github.com/rocker-org/rocker-versioned2/blob/de8b815b1b23c368308cc9dc960cb8a7c724be9f/scripts/install_R_source.sh#L160C39 seems to remove relevant packages (although they are not explicitly listed). These include cuda-compiler-11-8*, cuda-minimal-build-11-8* and cuda-nvcc-11-8*, and others are marked for subsequent autoremoval (cuda-cuxxfilt-11-8, cuda-nvprune-11-8).
This, for example, prevents the torch package from being easily installed with GPU support within such containers.
Is this done intentionally?
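A quick way to confirm whether a given image still ships the CUDA compiler is to look for nvcc, either on PATH or under the toolkit directory. This is a minimal sketch, assuming the toolkit lives under ${CUDA_HOME:-/usr/local/cuda} as in the nvidia/cuda devel images; the function name has_nvcc is hypothetical.

```shell
#!/usr/bin/env bash
# Return success if the CUDA compiler (nvcc) is still available.
# Assumption: the toolkit lives under ${CUDA_HOME:-/usr/local/cuda},
# as in the nvidia/cuda devel images; adjust if your layout differs.
has_nvcc() {
  local cuda_home="${CUDA_HOME:-/usr/local/cuda}"
  # First look on PATH, then fall back to the toolkit's bin directory.
  if command -v nvcc >/dev/null 2>&1; then
    return 0
  fi
  [ -x "${cuda_home}/bin/nvcc" ]
}

if has_nvcc; then
  echo "nvcc found"
else
  echo "nvcc missing: CUDA devel tools appear to have been removed" >&2
fi
```

Run inside a container built from rocker/cuda, a "missing" result would match the behaviour described above.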

@javdg javdg added the question label Dec 8, 2023
@eitsupi
Member

eitsupi commented Dec 8, 2023

@cboettig Any thoughts?

@benz0li
Contributor

benz0li commented Dec 8, 2023

Interesting. This does not happen with b-data's/my CUDA-enabled JupyterLab docker stacks which are also based on nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04.

I use

apt-get -y purge PACKAGES
apt-get -y autoremove
rm -rf /var/lib/apt/lists/*

instead of

apt-get remove --purge -y PACKAGES
apt-get autoremove -y
apt-get autoclean -y
rm -rf /var/lib/apt/lists/*

👉 Yes... somehow apt-get remove --purge -y ${BUILDDEPS}; apt-get autoremove -y removes relevant CUDA packages from the Rocker images.
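Before committing to a removal, apt-get's simulation mode (-s) prints the same summary without changing the system. The sketch below filters the "will be REMOVED" section of that output for cuda-* packages. It assumes the English summary header shown in the apt-get output later in this thread (a different locale would translate it), and cuda_removals is a hypothetical helper, not part of the Rocker scripts.

```shell
#!/usr/bin/env bash
# Read apt-get output on stdin and print every cuda-* package listed
# under "The following packages will be REMOVED:".
# Assumption: English apt-get messages; entries are indented by two spaces
# and a trailing "*" marks packages scheduled to be purged.
cuda_removals() {
  awk '
    /^The following packages will be REMOVED:/ { in_list = 1; next }
    in_list && /^  / {
      for (i = 1; i <= NF; i++) {
        pkg = $i
        sub(/\*$/, "", pkg)          # drop the purge marker
        if (pkg ~ /^cuda-/) print pkg
      }
      next
    }
    { in_list = 0 }                  # any unindented line ends the section
  '
}
```

For example, if apt-get -s remove --purge ${BUILDDEPS} | cuda_removals prints anything, the real removal would take CUDA tooling with it.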


Another difference: b-data's/my images copy R from image glcr.b-data.ch/r/rsi instead of building R while building the docker image.
ℹ️ Python is "installed" the same way (i.e. copied from image glcr.b-data.ch/python/psi), if (a current) PYTHON_VERSION is set.

@eitsupi
Member

eitsupi commented Dec 19, 2023

@cboettig Could you take a look at this?

@cboettig
Member

@eitsupi thanks for the ping, yeah I'll take a look!

@cboettig
Member

Yup,

somehow apt-get remove --purge -y ${BUILDDEPS}; apt-get autoremove -y removes relevant CUDA packages from the Rocker images.

it looks like some of the build deps are (unsurprisingly) also build deps of cuda-devel. Still seems a bit puzzling to me that it would grab some of the nvidia tools.

Removing build deps this way in the rocker/r-ver recipe is a relatively dated strategy; I believe multi-stage builds are the standard way to build images without including development dependencies. (Though that mechanism didn't exist when this build recipe was initially deployed in rocker!)

@eitsupi I'm not quite sure how best to go about setting up a staged build dockerfile in the current build system though -- thoughts on how to go about that?

Perhaps a simpler, short-term solution would be to add a build arg to suppress the build-deps removal and set that argument in the ml stack. (Given the size of the CUDA libs, I think the R build deps are mostly already included or won't add much to the image size.)
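The build-arg idea could look roughly like this in the cleanup step. This is a sketch, not the actual install_R_source.sh code: KEEP_BUILDDEPS is a hypothetical variable name, which a Dockerfile could expose to RUN steps with ARG KEEP_BUILDDEPS=false. The removal commands themselves are the ones quoted earlier in this thread.

```shell
#!/usr/bin/env bash
# Remove R build dependencies unless KEEP_BUILDDEPS=true.
# KEEP_BUILDDEPS is hypothetical; in a Dockerfile, `ARG KEEP_BUILDDEPS=false`
# makes it visible as an environment variable to later RUN steps.
cleanup_builddeps() {
  if [ "${KEEP_BUILDDEPS:-false}" = "true" ]; then
    echo "KEEP_BUILDDEPS=true: leaving build dependencies installed"
    return 0
  fi
  # BUILDDEPS is intentionally unquoted so it splits into package names.
  apt-get remove --purge -y ${BUILDDEPS}
  apt-get autoremove -y
  apt-get autoclean -y
  rm -rf /var/lib/apt/lists/*
}
```

Defaulting to "false" keeps the current behaviour for rocker/r-ver, so only the CUDA stack would need to opt in.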

@eitsupi
Member

eitsupi commented Dec 21, 2023

I don't think there's a problem with doing a multi-stage build because we just change the description in the Dockerfile.
It's just that I don't know the caching strategy when doing multi-stage builds. (All I know is that the inline caching we are currently doing is meaningless for multi-stage builds because it only caches the final image.)

I don't know how R on CUDA is built, but would it be enough to copy R from rocker/r-ver into the cuda base image?
It seems pretty cumbersome to rewrite rocker/r-ver to build with a multi-stage build.
(At least I don't have the passion to do it)

@cboettig
Member

@eitsupi nice, I think that's a good idea -- let's leave rocker/r-ver as is, but let's adjust the rocker-cuda recipe.

Hmm, we can copy R_HOME from rocker/r-ver instead of running install_R_source.sh, though we'll still need to install system runtime dependencies. And then there's the linking done by make install (which is no longer available since it gets cleaned up), e.g. the linking of binaries in /usr/local/bin...

@cboettig
Member

Okay, I'm thinking something like this as the replacement rocker/cuda Dockerfile template:

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04

LABEL org.opencontainers.image.licenses="GPL-2.0-or-later" \
      org.opencontainers.image.source="https://github.com/rocker-org/rocker-versioned2" \
      org.opencontainers.image.vendor="Rocker Project" \
      org.opencontainers.image.authors="Carl Boettiger <[email protected]>"

ENV R_VERSION=4.3.2
ENV R_HOME=/usr/local/lib/R
ENV TZ=Etc/UTC
ENV NVBLAS_CONFIG_FILE=/etc/nvblas.conf
ENV PYTHON_VENV=/opt/venv
ENV PATH=${PYTHON_VENV}/bin:${R_HOME}/bin:${CUDA_HOME}/bin:${PATH}
ENV CRAN=https://p3m.dev/cran/__linux__/jammy/latest
ENV LANG=en_US.UTF-8

COPY --from=rocker/r-ver:4.3.2 ${R_HOME} ${R_HOME}
COPY scripts /rocker_scripts

RUN /rocker_scripts/install_R_deps.sh
RUN /rocker_scripts/setup_R.sh
RUN /rocker_scripts/config_R_cuda.sh
RUN /rocker_scripts/install_python.sh

CMD ["R"]

This introduces install_R_deps.sh to the scripts directory, which is basically the runtime dependencies from install_R_source.sh and a tiny bit of config from there too. I think maybe the most notable change here is I put R_HOME/bin on the PATH, whereas make install links to /usr/local/bin. Not sure if there's anything else make install does that needs to be implemented here.
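If putting R_HOME/bin on PATH turns out to be insufficient (for example, for tools that look for R and Rscript in /usr/local/bin), a small helper could add symlinks instead. This is a hedged sketch that approximates only one visible effect of make install, not everything it does; it assumes R was copied unchanged to the same R_HOME it was built for, and link_r_binaries is a hypothetical name.

```shell
#!/usr/bin/env bash
# Symlink R's front-end executables into a directory already on PATH.
# Usage: link_r_binaries [R_HOME] [BIN_DIR]
# Assumption: R_HOME is the prefix R was built for (default /usr/local/lib/R).
link_r_binaries() {
  local r_home="${1:-/usr/local/lib/R}"
  local bin_dir="${2:-/usr/local/bin}"
  local exe
  mkdir -p "${bin_dir}"
  for exe in R Rscript; do
    if [ -e "${r_home}/bin/${exe}" ]; then
      # -f replaces any stale link left over from a previous layer
      ln -sf "${r_home}/bin/${exe}" "${bin_dir}/${exe}"
    else
      echo "warning: ${r_home}/bin/${exe} not found" >&2
    fi
  done
}
```

Symlinks keep a single copy of R under R_HOME; whether the rest of make install's work needs replicating is still an open question in the comment above.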

I noticed that setup_R.sh also removes some build deps that are compression libraries, maybe we should simply not do that there?

cboettig added a commit that referenced this issue Jan 23, 2024
eitsupi added a commit that referenced this issue Jan 24, 2024
@eitsupi
Member

eitsupi commented Jan 24, 2024

@cboettig Did you see the comment #736 (comment)?
We should use apt-get remove --purge instead of apt-get -y purge?

@cboettig
Member

@eitsupi I think you mean the reverse, that we should use apt-get -y purge ? that's interesting, I didn't test. I'm trying to find some documentation that apt-get purge and apt-get remove --purge should function differently? Maybe @eddelbuettel knows?

I always thought that remove meant: remove the binaries, but leave configuration files, data files, and dependencies, and that purge meant remove binaries + all that stuff.

@benz0li any chance you meant that you use apt-get remove without purge? I could understand how that would avoid the issue, but to my understanding it would leave all dependencies of our BUILDDEPS installed. Our BUILDDEPS list pulls in quite a number of additional dependencies, and I think in general we do want to clean all of those up.

@eddelbuettel
Member

eddelbuettel commented Jan 24, 2024

Maybe @eddelbuettel knows?

When I read over the comment on apt purge vs apt remove --purge moments ago, I mostly just smiled and shook my head, because after thirty years with Debian I still do not know the difference between apt upgrade and apt dist-upgrade.

For purge vs removal my mental model is that the latter removes the package files but leaves configuration and the former also nukes ("purges") the configuration files for a package. The difference may not matter much on containers as opposed to machines with actual reinstallations of packages. But YMMV and grains of salt and everything...
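The difference is visible in dpkg's package state: a package removed without purging stays listed with status rc (removed, configuration files still present). A minimal sketch of a hypothetical helper that lists such leftovers from dpkg -l output read on stdin:

```shell
#!/usr/bin/env bash
# Print packages in dpkg state "rc": removed, but config files remain.
# Reads `dpkg -l` output on stdin; the first column is the state code
# and the second column is the package name.
leftover_configs() {
  awk '$1 == "rc" { print $2 }'
}
```

In a container, dpkg -l | leftover_configs would show what a plain remove left behind; purging those names afterwards cleans them up.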

@benz0li
Contributor

benz0li commented Jan 24, 2024

@benz0li any chance you meant that you use apt-get remove without purge?

No. See https://github.com/b-data/jupyterlab-r-docker-stack/blob/e41ce09e241a060e5d4f9e558121b925007f52cc/base/latest.Dockerfile#L299-L301 for example.

I'm trying to find some documentation that apt-get purge and apt-get remove --purge should function differently?

@cboettig According to the manual page: remove --purge is equivalent to the purge command.


In my images linux packages do not get removed, because R is built in a separate image and then copied [from /usr/local] to [/usr/local of] an image that has only the runtime dependencies installed.

@javdg
Author

javdg commented Jan 24, 2024

Thanks for looking into this everyone!

I did some testing regarding the apt-get remove --purge vs. apt-get purge point raised above:

I took nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 and ran the (supposedly) relevant commands (with slight modifications where needed) as per scripts/install_R_source.sh:

apt-get update
(l. 18)

apt-get install -y --no-install-recommends bash-completion ca-certificates file fonts-texgyre g++ gfortran gsfonts libblas-dev libbz2-* libcurl4 "libicu[0-9][0-9]" liblapack-dev libpcre2* libjpeg-turbo* libpangocairo-* libpng16* libreadline8 libtiff* liblzma* make tzdata unzip zip zlib1g

(cf. l. 35)

export BUILDDEPS="curl \
    default-jdk \
    devscripts \
    libbz2-dev \
    libcairo2-dev \
    libcurl4-openssl-dev \
    libpango1.0-dev \
    libjpeg-dev \
    libicu-dev \
    libpcre2-dev \
    libpng-dev \
    libreadline-dev \
    libtiff5-dev \
    liblzma-dev \
    libx11-dev \
    libxt-dev \
    perl \
    rsync \
    subversion \
    tcl-dev \
    tk-dev \
    texinfo \
    texlive-extra-utils \
    texlive-fonts-recommended \
    texlive-fonts-extra \
    texlive-latex-recommended \
    texlive-latex-extra \
    x11proto-core-dev \
    xauth \
    xfonts-base \
    xvfb \
    wget \
    zlib1g-dev"

(cf. l. 61)

apt-get install -y --no-install-recommends ${BUILDDEPS}
(l. 96)

In different containers, I then ran apt-get remove --purge ${BUILDDEPS} and apt-get purge ${BUILDDEPS}

This led to identical results, with the following list of packages to be removed or marked for removal (note that this includes, amongst others, build-essential and various cuda-* packages not originally specified in BUILDDEPS):

The following packages were automatically installed and are no longer required:
  bzip2 ca-certificates-java cuda-cuxxfilt-11-8 cuda-nvprune-11-8 default-jdk-headless default-jre
  default-jre-headless fakeroot fonts-lmodern gir1.2-freedesktop gir1.2-glib-2.0 gir1.2-harfbuzz-0.0
  gir1.2-pango-1.0 icu-devtools java-common libapache-pom-java libapr1 libaprutil1 libasound2
  libasound2-data libavahi-client3 libavahi-common-data libavahi-common3 libblkid-dev libbrotli-dev
  libcairo-gobject2 libcairo-script-interpreter2 libcommons-logging-java libcommons-parent-java
  libcups2 libdbus-1-3 libdeflate-dev libexpat1-dev libfakeroot libffi-dev libfindlib-ocaml
  libfontbox-java libfontenc1 libfribidi-dev libgdbm-compat4 libgdbm6 libgif7 libgirepository-1.0-1
  libglib2.0-bin libglib2.0-dev-bin libgraphite2-dev libharfbuzz-gobject0 libharfbuzz-icu0 libice-dev
  libice6 libjbig-dev libjpeg8-dev libjs-jquery libkpathsea6 liblcms2-2 liblzo2-2 libmpdec3
  libncurses-dev libncurses5-dev libnspr4 libnss3 libpangoxft-1.0-0 libpaper-utils libpaper1
  libpcre16-3 libpcre3-dev libpcre32-3 libpcrecpp0v5 libpcsclite1 libpdfbox-java libperl5.34
  libpixman-1-dev libpopt0 libptexenc1 libpthread-stubs0-dev libpython3-stdlib libpython3.10-minimal
  libpython3.10-stdlib libsepol-dev libserf-1-1 libsm-dev libsm6 libsombok3 libsvn1 libsynctex2
  libtcl8.6 libteckit0 libtexlua53 libtexluajit2 libtk8.6 libunwind8 libutf8proc2 libxau-dev libxaw7
  libxcb-render0-dev libxcb-shm0-dev libxcb1-dev libxdmcp-dev libxfont2 libxft2 libxkbfile1 libxmu6
  libxmuu1 libxpm4 libxss1 libxt6 libxtst6 libzzip-0-13 lmodern lto-disabled-list media-types netbase
  ocaml ocaml-compiler-libs ocaml-findlib ocaml-interp openjdk-11-jdk openjdk-11-jdk-headless
  openjdk-11-jre openjdk-11-jre-headless pango1.0-tools patch perl-modules-5.34 perl-openssl-defaults
  preview-latex-style python3 python3-distutils python3-lib2to3 python3-minimal python3.10
  python3.10-minimal t1utils tcl tcl8.6 tex-common tk tk8.6 uuid-dev wdiff x11-common x11-xkb-utils
  x11proto-dev xdg-utils xfonts-encodings xfonts-utils xkb-data xorg-sgml-doctools xserver-common
  xtrans-dev xz-utils
Use 'apt autoremove' to remove them.
The following packages will be REMOVED:
  build-essential* cuda-compiler-11-8* cuda-minimal-build-11-8* cuda-nvcc-11-8* curl* default-jdk*
  devscripts* dpkg-dev* libb-hooks-op-check-perl* libbz2-dev* libbz2-ocaml-dev* libcairo2-dev*
  libclass-method-modifiers-perl* libclass-xsaccessor-perl* libcurl4-openssl-dev* libdatrie-dev*
  libdevel-callchecker-perl* libdpkg-perl* libdynaloader-functions-perl* libencode-locale-perl*
  libfile-dirlist-perl* libfile-homedir-perl* libfile-listing-perl* libfile-touch-perl*
  libfile-which-perl* libfontconfig-dev* libfontconfig1-dev* libfreetype-dev* libfreetype6-dev*
  libglib2.0-dev* libharfbuzz-dev* libhtml-parser-perl* libhtml-tagset-perl* libhtml-tree-perl*
  libhttp-cookies-perl* libhttp-date-perl* libhttp-message-perl* libhttp-negotiate-perl* libicu-dev*
  libimport-into-perl* libio-html-perl* libio-pty-perl* libio-socket-ssl-perl* libipc-run-perl*
  libjpeg-dev* liblwp-mediatypes-perl* liblwp-protocol-https-perl* liblzma-dev* libmime-charset-perl*
  libmodule-runtime-perl* libmoo-perl* libmount-dev* libnet-http-perl* libnet-ssleay-perl*
  libpango1.0-dev* libparams-classify-perl* libpcre2-dev* libpng-dev* libreadline-dev*
  librole-tiny-perl* libselinux1-dev* libsub-quote-perl* libtext-unidecode-perl* libthai-dev*
  libtiff-dev* libtiff5-dev* libtimedate-perl* libtry-tiny-perl* libunicode-linebreak-perl*
  liburi-perl* libwww-perl* libwww-robotrules-perl* libx11-dev* libxext-dev* libxft-dev*
  libxml-libxml-perl* libxml-namespacesupport-perl* libxml-sax-base-perl* libxml-sax-perl*
  libxrender-dev* libxss-dev* libxt-dev* patchutils* perl* pkg-config* rsync* subversion* tcl-dev*
  tcl8.6-dev* texinfo* texlive-base* texlive-binaries* texlive-extra-utils* texlive-fonts-extra*
  texlive-fonts-recommended* texlive-latex-base* texlive-latex-extra* texlive-latex-recommended*
  texlive-luatex* texlive-pictures* texlive-plain-generic* tk-dev* tk8.6-dev* wget* x11proto-core-dev*
  xauth* xfonts-base* xvfb* zlib1g-dev*

According to https://www.mankier.com/8/apt-get#--purge, remove --purge is equivalent to the purge command, so I am not surprised to see no difference there.

Additionally, apt-get remove ${BUILDDEPS} leads to the same list of packages, but without the various trailing *, which (again https://www.mankier.com/8/apt-get#--purge) will be displayed next to packages which are scheduled to be purged. Furthermore https://www.mankier.com/8/apt-get#Description-purge does in this respect confirm @cboettig's understanding of remove vs. purge.

Judging from this, I would say this is not about subtle differences in command syntax (the commands seem to be identical and to work as documented), but rather a curious case of Debian/Ubuntu dependency management, where a collection of packages to be installed pulls in additional dependencies and/or creates "reverse dependencies" which, once the original set of packages is uninstalled, proceed to rip out other parts of the system...
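To see which removals are collateral, i.e. packages not named in BUILDDEPS that get removed anyway, the requested list can be subtracted from apt-get's REMOVED section. In the output above this would surface, among others, build-essential, dpkg-dev and the cuda-* compiler packages. Their removal alongside perl is consistent with a dependency chain running from the CUDA compiler packages through build-essential and dpkg-dev down to perl, but that chain is an inference from the list, not something verified here. A sketch, with collateral_removals as a hypothetical helper that reads apt-get output on stdin and assumes English apt messages:

```shell
#!/usr/bin/env bash
# Print packages in apt-get's "will be REMOVED" section that were NOT
# explicitly requested. $1 is the space-separated list that was requested
# (e.g. "${BUILDDEPS}"); apt-get output is read from stdin.
collateral_removals() {
  local requested="$1"
  awk -v requested="$requested" '
    BEGIN {
      # " " as separator means default whitespace splitting
      n = split(requested, r, " ")
      for (i = 1; i <= n; i++) want[r[i]] = 1
    }
    /^The following packages will be REMOVED:/ { in_list = 1; next }
    in_list && /^  / {
      for (i = 1; i <= NF; i++) {
        pkg = $i
        sub(/\*$/, "", pkg)          # drop the purge marker
        if (!(pkg in want)) print pkg
      }
      next
    }
    { in_list = 0 }
  '
}
```

Used as apt-get -s remove --purge ${BUILDDEPS} | collateral_removals "${BUILDDEPS}", it lists exactly the packages that would disappear without being asked for.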

@cboettig
Member

Thanks all, the details are super appreciated. Working on a fix for this in recent PRs. A multi-stage build is probably the natural approach, but it's a non-trivial shift; for the moment I think we'll simply leave the build deps in place on the cuda stack (that nvidia base image is so large to begin with anyway).


5 participants