Change the stack file structure #755

Open
eitsupi opened this issue Jan 24, 2024 · 8 comments
Labels
CI, enhancement, pre-built images

Comments

@eitsupi
Member

eitsupi commented Jan 24, 2024

Related to #736, #754

I think we are reaching our limits with the current structure of the stack file.

  • Automatic updates were not considered in its design; it has since been forcibly extended to support them.
  • It cannot generate complex Dockerfiles, such as ones that use multi-stage builds.

So what about switching completely to files that assume the use of a template engine?
Something like the following:

group:
  default:
    targets:
      - r-ver
      - rstudio
      - ...
  cuda11images:
    targets:
      - cuda
      - ...
images:
  - id: 1
    name: r-ver
    tags:
      - docker.io/rocker/r-ver:4.3.2
      - ...
    platforms:
      - linux/amd64
      - linux/arm64
    cmd:
      - R
    dockerfile-template: |
      FROM ubuntu:jammy

      ENV R_VERSION={{r_version}}
      ENV R_HOME=/usr/local/lib/R
      ENV TZ=Etc/UTC

      COPY scripts/install_R_source.sh /rocker_scripts/install_R_source.sh
      RUN /rocker_scripts/install_R_source.sh

      ENV CRAN={{cran_url}}
      ENV LANG=en_US.UTF-8

      COPY scripts/setup_R.sh /rocker_scripts/setup_R.sh
      RUN /rocker_scripts/setup_R.sh

  - id: 2
    name: rstudio
    parent: 1
    tags:
      - ...
    cache-from:
      - docker.io/rocker/r-ver:4.3.2
      - ...
    parts:
      - id: 1
        type: env
        name: RSTUDIO_VERSION
        value: "{{rstudio_version}}"
      - id: 2
        type: script
        name: install_rstudio.sh
      - id: 3
        type: script
        name: install_pandoc.sh
      - id: 4
        type: script
        name: install_quarto.sh

Complex Dockerfiles could be represented as multi-line text, and simple parts could be represented as objects ordered by id.
(If a type: script is specified, COPY and RUN clauses are automatically generated for the Dockerfile.)
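
The expansion of `parts` described above could be sketched roughly as follows. This is only an illustration in Python using the standard library: the function name `render_parts`, the plain `{{name}}` string substitution, and the `scripts/` to `/rocker_scripts/` path convention (borrowed from the r-ver template above) are assumptions, not a settled design.

```python
def render_parts(parts, variables):
    """Turn a list of 'parts' objects into Dockerfile lines, ordered by id.

    Hypothetical helper: 'env' parts become ENV lines, and 'script' parts
    become a COPY line followed by a RUN line.
    """
    lines = []
    for part in sorted(parts, key=lambda p: p["id"]):
        if part["type"] == "env":
            value = part["value"]
            # Fill in {{name}} placeholders with plain string replacement.
            for key, val in variables.items():
                value = value.replace("{{" + key + "}}", val)
            lines.append(f'ENV {part["name"]}={value}')
        elif part["type"] == "script":
            target = f'/rocker_scripts/{part["name"]}'
            lines.append(f'COPY scripts/{part["name"]} {target}')
            lines.append(f"RUN {target}")
        else:
            raise ValueError(f"unknown part type: {part['type']}")
    return lines


# Parts are deliberately listed out of order; "X.Y.Z" is a placeholder version.
example_parts = [
    {"id": 2, "type": "script", "name": "install_rstudio.sh"},
    {"id": 1, "type": "env", "name": "RSTUDIO_VERSION", "value": "{{rstudio_version}}"},
]
print("\n".join(render_parts(example_parts, {"rstudio_version": "X.Y.Z"})))
```

Sorting by `id` keeps the output stable even if the YAML entries are reordered by hand.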

At the moment, I don't think there is any shared use between Dockerfiles and bake files, so it might be better to separate the hierarchy for each.

@cboettig Thoughts?

eitsupi added the enhancement, CI, and pre-built images labels on Jan 24, 2024
@cboettig
Member

Agree that we're hitting the limits of our current build system. A better design would be compelling. One of the many limitations with our current stack.json is that it's an ad-hoc method not documented or used by any other projects. That makes it harder for other potential contributors to use or contribute. It feels like this should be reasonably well-established territory and an ad-hoc solution should not be necessary, but I haven't managed to stay up-to-date on this topic.

Is the yaml structure above part of the modern buildx / bake system or just meant as an illustration of a more logical but still ad-hoc format? Maybe we can do a quick survey of possible options? Or has this area of 'devops for the development of devops' just remained a wild-west of ad-hoc solutions?

@eitsupi
Member Author

eitsupi commented Jan 25, 2024

This is completely ad hoc. I have rarely seen bake files used at all (and HCL is probably recommended over JSON for them).

Inserting parameters into the Dockerfile can generally be done using args, or we can use a template engine such as jinja2.

If we move to a simple configuration without ad-hoc parts, would bake files (much the same as the current ones) plus templated Dockerfiles (perhaps rendered with glue, if the updates are done from R) make sense?

FROM ubuntu:{{ubuntu_version}}

COPY ...
RUN ...
...
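
As a rough illustration of what filling in such a template involves, here is a minimal sketch in Python that uses only the standard library rather than jinja2 or glue. The function name `render_template` and the choice to fail on missing variables are assumptions made for the example.

```python
import re

# Matches {{name}} and {{ name }}; names are limited to word characters.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template, variables):
    """Replace every {{name}} placeholder, failing loudly on unknown names."""

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value for template variable {name!r}")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


template = "FROM ubuntu:{{ubuntu_version}}\n"
print(render_template(template, {"ubuntu_version": "jammy"}), end="")
```

Raising on a missing variable means a typo in a template breaks the generation step instead of producing a Dockerfile that still contains a literal `{{...}}`.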

@cboettig
Member

I'm all for replacing the current ad-hoc system with something that is more efficient at avoiding unnecessary rebuilds and easier for others to follow. Using a template framework for the Dockerfiles sounds good to me. I'm happy with jinja2 or whatever option you're most familiar with, if you're up for doing the heavy lifting here!

@eitsupi
Member Author

eitsupi commented Feb 1, 2024

@cboettig I created a minimal example.
Could you take a look at this?
https://github.com/eitsupi/rocker-versioned-next

(I wasn't sure whether to keep the repository personally or in this organization, but I decided to keep it as a personal repository for now. I can transfer it later.)

@cboettig
Member

cboettig commented Feb 1, 2024

@eitsupi This looks really cool.

One thing I'd really like to see in the new build architecture is leveraging multi-stage build patterns for installations from source. It would be great to see that in the template design from the start. It may require us to rethink some things; e.g. maybe doing all these installs in /opt/R rather than in /usr/local/R and adjusting paths and ld libs accordingly, so that we have a single path to copy over from.

@eitsupi
Member Author

eitsupi commented Feb 2, 2024

@cboettig Added a sample of something like rocker/cuda. Does this make sense?
https://github.com/eitsupi/rocker-versioned-next/blob/84d98e43f869fdba0f1a75cd58ddeb8ce028d7a2/dockerfile-templates/cuda.Dockerfile.txt

I do not know which directory to copy. (I do not understand which directories the installed R depends on).
But generating a Dockerfile that includes a multi-stage build is no problem at all.

@eitsupi
Member Author

eitsupi commented Feb 2, 2024

The structure I currently consider the most promising is as follows:

  • Variables are stored in tsv files. (These replace the current stack files and are auto-updated by CI via an R script.)
  • Everything in the Dockerfiles other than the variables is defined in Dockerfile templates, written so that the template engine can fill in the variables. (These are updated manually.)
  • The bakefiles also need to be generated from templates, but since we use json as the bakefile format, no template engine is needed: the generator simply parses each template as json and writes the results to separate json files.
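
The first two points above could be sketched as follows. This is a hedged illustration in Python (the plan above is to do the updates with an R script): the column names, the inline tsv text, and the placeholder CRAN URL are all made up for the example.

```python
import csv
import io
import re


def read_variables(tsv_text):
    """Parse tsv text into one dict of variables per row (header = names)."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def render(template, variables):
    """Fill {{name}} placeholders; a missing name raises KeyError."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}", lambda m: variables[m.group(1)], template
    )


# Hypothetical variables file: one row per R version.
TSV = (
    "r_version\tubuntu_version\tcran_url\n"
    "4.3.2\tjammy\thttps://example.invalid/cran\n"
)

# Hand-written template; only the variables change between versions.
TEMPLATE = (
    "FROM ubuntu:{{ubuntu_version}}\n"
    "ENV R_VERSION={{r_version}}\n"
    "ENV CRAN={{cran_url}}\n"
)

# One rendered Dockerfile per row, keyed by R version.
dockerfiles = {row["r_version"]: render(TEMPLATE, row) for row in read_variables(TSV)}
print(dockerfiles["4.3.2"], end="")
```

Keeping the variables in a flat table means the CI update only ever appends or edits rows, while the templates stay under manual control.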

This mechanism is fairly simple except for the process of calculating variables (now done in https://github.com/rocker-org/rocker-versioned2/blob/26c50e561ae4b10386b9f7adaa37a77b52f7f5d6/build/make-stacks.R).

The drawback is that the Dockerfiles must be written entirely by hand, so we have to accept considerable duplication between, for example, r-ver, rstudio, and tidyverse. That is acceptable given that the number of images is not that large.
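
For the bakefile step, a minimal sketch of "parse the template as json and write it out" might look like this in Python. The `{{r_version}}` placeholders inside string values, the `dockerfiles/...` path, and the function names are assumptions for illustration; the `group` and `target` keys follow the bake file json layout.

```python
import json

# Hypothetical bakefile template, parsed as ordinary json.
BAKE_TEMPLATE = json.loads("""
{
  "group": {"default": {"targets": ["r-ver"]}},
  "target": {
    "r-ver": {
      "dockerfile": "dockerfiles/r-ver_{{r_version}}.Dockerfile",
      "tags": ["docker.io/rocker/r-ver:{{r_version}}"],
      "platforms": ["linux/amd64", "linux/arm64"]
    }
  }
}
""")


def fill(node, variables):
    """Return a copy of a parsed json value with {{name}} filled in strings."""
    if isinstance(node, str):
        for key, value in variables.items():
            node = node.replace("{{" + key + "}}", value)
        return node
    if isinstance(node, list):
        return [fill(item, variables) for item in node]
    if isinstance(node, dict):
        return {key: fill(value, variables) for key, value in node.items()}
    return node  # numbers, booleans, null


def bakefile_json(variables):
    """Serialize one filled-in bakefile; the caller decides where to write it."""
    return json.dumps(fill(BAKE_TEMPLATE, variables), indent=2)


print(bakefile_json({"r_version": "4.3.2"}))
```

Because the template is handled as parsed json rather than as text, the output is always valid json regardless of what the variable values contain.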

@cboettig
Member

cboettig commented Feb 3, 2024

I do not know which directory to copy. (I do not understand which directories the installed R depends on).

Right, this is where multistage builds get tricky. Apologies if this is all familiar already: in general, we will not be able to use our existing install-from-source recipes unchanged in a multistage build. As you know, in a standard Linux install the application does not end up in any single directory. Binaries usually go in (or are symlinked into) /usr/local/bin, libs in /usr/local/lib and /usr/local/include, sometimes /usr/local/share, and maybe elsewhere, like configs in /etc. Obviously one can't just copy the whole of /usr/local/bin from the builder, because that can bring in unwanted stuff from the build image. Instead, we first need to edit the install script. Usually the Makefile takes some argument like a "BUILD_DIR" or "PREFIX" (I don't recall offhand how this is set up for R, but I bet @eddelbuettel knows off the top of his head), so that you can do something like:

COPY --from=builder /build/usr/include/ /usr/include/

The multistage build setup for GDAL is a good example of this: https://github.com/OSGeo/gdal/blob/master/docker/ubuntu-full/Dockerfile , but really these are just conventions, and each source build can be a bit different. Compared to GDAL, I think R is mostly pretty simple, but at the very least, in addition to copying R_HOME, we must either symlink the binaries or update the PATH.
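
To make the single-prefix idea concrete in terms of the template generator, here is a hedged Python sketch that emits a two-stage Dockerfile. It assumes the install script has already been changed to install everything under one prefix (the /opt/R path is the one floated earlier in this thread), which is not how the current scripts behave; the function name is hypothetical.

```python
def multi_stage_dockerfile(base_image, build_instructions, prefix):
    """Emit a two-stage Dockerfile that copies a single install prefix.

    build_instructions: complete Dockerfile lines for the builder stage.
    prefix: directory the build is assumed to install everything into.
    """
    lines = [f"FROM {base_image} AS builder"]
    lines += build_instructions
    lines += [
        "",
        f"FROM {base_image}",
        # Only the prefix is copied, so build tools stay behind in the builder.
        f"COPY --from=builder {prefix}/ {prefix}/",
        # Update PATH instead of symlinking binaries into /usr/local/bin.
        f"ENV PATH={prefix}/bin:$PATH",
    ]
    return "\n".join(lines) + "\n"


print(
    multi_stage_dockerfile(
        "ubuntu:jammy",
        [
            "COPY scripts/install_R_source.sh /rocker_scripts/install_R_source.sh",
            "RUN /rocker_scripts/install_R_source.sh",
        ],
        "/opt/R",
    ),
    end="",
)
```

The final stage would still need the runtime shared libraries installed separately; the sketch only covers the copy and PATH steps discussed above.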
