Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dockerfile: remove /build/core/.git directory to overwrite with submodule link #442

Merged
merged 6 commits into from
Jul 1, 2024

Conversation

kba
Copy link
Member

@kba kba commented Jun 26, 2024

Otherwise build fails with

#11 ERROR: cannot replace to directory /.../build/core/.git with file

I am not sure how we introduced this and this workaround is not pretty (rm -rf /build/core/.git) so if somebody has a better solution, I'd be happy to change it.

Copy link
Collaborator

@bertsky bertsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you are trying to fix CI of #441.

This used to work before. Something else is going on here. I don't think we should resort to workarounds just yet.

I'll try to debug via SSH...

Dockerfile Outdated
@@ -89,6 +89,7 @@ RUN rm $VIRTUAL_ENV/bin/pip* && apt-get purge -y python3-pip && python3 -m venv
# copy (sub)module directories to build
# (ADD/COPY cannot copy a list of directories,
# so we must rely on .dockerignore here)
RUN rm -rf /build/core/.git
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can that help at all? The submodules are copied in the follow-up line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the COPY . . in the following line fails because the /build/core/.git that comes with the base image is a directory but since core is a git submodule in ocrd_all, /build/core/.git is a file pointing to the parent git dir (gitdir: ../.git/modules/core). And COPY (as well as cp and mv) will abort when trying to overwrite a directory with a non-directory.

I get the same behavior locally, so it is not just in the CD.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right! I forgot about that (base stage already yields an independent clone).

So this also begs the question (again) what is supposed to happen if the submodule version of core deviates from the base stage version of core. For now we settled with the approach of determining core's version from the submodule and passing that as tag for the base stage in a Docker build. Perhaps we should put an assertion into the Dockerfile like so:

# from-stage already contains a clone clashing with build context
RUN rm -rf /build/core/.git
COPY . .
# verify we got the right version of core
RUN COREV=`git -C /build/core describe --tags --abbrev=0` && [[ "$BASE_IMAGE" =~ :$COREV ]]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At any rate, I suggested adding a dedicated comment for the RUN rm step and moving it a few lines up because the existing comment still applies to COPY only.

Copy link
Member Author

@kba kba Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this also begs the question (again) what is supposed to happen if the submodule version of core deviates from the base stage version of core.

Considering that we derive CORE_VERSION from git describe and derive BASE_IMAGE from CORE_VERSION in the Makefile, I don't see where this might actually happen. If there is no versioned Docker image matching the version in the core submodule (ie. the CD of core failed and we did not notice), then the build will fail at the FROM $BASE_IMAGE line. The only scenario I can think of is that the core submodule contains unreleased commits, in which case there could be a mismatch, e.g. it is not at version vX but at vX-commitsha, ie. we should not use the --abbrev=0 option, so only a core submodule at the exact commit of a tag should be acceptable.

Hence, I've adapted your proposed check to fail the build in this case with a message like

ocrd/core:v2.66.1 inconsistent with core version v2.66.1-1-gd960be4e

Considering this is now a requisite for #441, I'll merge once CI is happy.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect!

@bertsky bertsky self-requested a review June 27, 2024 10:43
@kba
Copy link
Member Author

kba commented Jun 28, 2024

CI failed as expected with

ocrd/core-cuda:v2.65.0 inconsistent with core version v2.65.0-1-gcc6ea575

which is good, but I update core in this PR as well to make sure it won't if all is set up properly.

@kba
Copy link
Member Author

kba commented Jun 28, 2024

CI failed as expected with

ocrd/core-cuda:v2.65.0 inconsistent with core version v2.65.0-1-gcc6ea575

which is good, but I update core in this PR as well to make sure it won't if all is set up properly.

Ah, but of course there is a catch. The check won't work if we derive from a non-versioned Docker base image such as ocrd/core:minimum-cuda.

I still think the check is a good idea and will open an issue for it but comment it out for now because we really need a new release urgently.

@kba
Copy link
Member Author

kba commented Jun 28, 2024

It looks like make docker-maximum-cuda does not just build that image:

> make -n docker-maximum-cuda NO_UPDATE=1
docker build \
--progress=plain \
--build-arg BASE_IMAGE=ocrd/core-cuda:v2.66.1 \
--build-arg VCS_REF=$(git rev-parse --short HEAD) \
--build-arg BUILD_DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ") \
--build-arg OCRD_MODULES="ocrd_cis ocrd_fileformat ocrd_im6convert ocrd_pagetopdf ocrd_repair_inconsistencies ocrd_tesserocr ocrd_wrap workflow-configuration ocrd_olahd_client" \
--build-arg PIP_OPTIONS="-e" \
--build-arg PARALLEL="" \
--build-arg PYTHON="python3" \
--network=host \
-t ocrd/all:minimum-cuda .
docker build \
--progress=plain \
--build-arg BASE_IMAGE=ocrd/all:minimum-cuda \
--build-arg VCS_REF=$(git rev-parse --short HEAD) \
--build-arg BUILD_DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ") \
--build-arg OCRD_MODULES="cor-asv-ann dinglehopper docstruct format-converters nmalign ocrd_calamari ocrd_cis ocrd_fileformat ocrd_im6convert ocrd_keraslm ocrd_olahd_client ocrd_olena ocrd_pagetopdf ocrd_repair_in
consistencies ocrd_segment ocrd_tesserocr ocrd_wrap workflow-configuration" \
--build-arg PIP_OPTIONS="-e" \
--build-arg PARALLEL="" \
--build-arg PYTHON="python3" \
--network=host \
-t ocrd/all:medium-cuda .
docker build \
--progress=plain \
--build-arg BASE_IMAGE=ocrd/all:medium-cuda \
--build-arg VCS_REF=$(git rev-parse --short HEAD) \
--build-arg BUILD_DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ") \
--build-arg OCRD_MODULES="cor-asv-ann core dinglehopper docstruct eynollah format-converters nmalign ocrd_anybaseocr ocrd_calamari ocrd_cis ocrd_detectron2 ocrd_doxa ocrd_fileformat ocrd_froc ocrd_im6convert ocrd_
keraslm ocrd_kraken ocrd_olahd_client ocrd_olena ocrd_pagetopdf ocrd_repair_inconsistencies ocrd_segment ocrd_tesserocr ocrd_wrap sbb_binarization workflow-configuration" \
--build-arg PIP_OPTIONS="-e" \
--build-arg PARALLEL="" \
--build-arg PYTHON="python3" \
--network=host \
-t ocrd/all:maximum-cuda .

Which leads to exceeding the 1h time limit...

@kba
Copy link
Member Author

kba commented Jul 1, 2024

``

It looks like make docker-maximum-cuda does not just build that image:

> make -n docker-maximum-cuda NO_UPDATE=1
docker build \
--progress=plain \
--build-arg BASE_IMAGE=ocrd/core-cuda:v2.66.1 \
--build-arg VCS_REF=$(git rev-parse --short HEAD) \
--build-arg BUILD_DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ") \
--build-arg OCRD_MODULES="ocrd_cis ocrd_fileformat ocrd_im6convert ocrd_pagetopdf ocrd_repair_inconsistencies ocrd_tesserocr ocrd_wrap workflow-configuration ocrd_olahd_client" \
--build-arg PIP_OPTIONS="-e" \
--build-arg PARALLEL="" \
--build-arg PYTHON="python3" \
--network=host \
-t ocrd/all:minimum-cuda .
docker build \
--progress=plain \
--build-arg BASE_IMAGE=ocrd/all:minimum-cuda \
--build-arg VCS_REF=$(git rev-parse --short HEAD) \
--build-arg BUILD_DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ") \
--build-arg OCRD_MODULES="cor-asv-ann dinglehopper docstruct format-converters nmalign ocrd_calamari ocrd_cis ocrd_fileformat ocrd_im6convert ocrd_keraslm ocrd_olahd_client ocrd_olena ocrd_pagetopdf ocrd_repair_in
consistencies ocrd_segment ocrd_tesserocr ocrd_wrap workflow-configuration" \
--build-arg PIP_OPTIONS="-e" \
--build-arg PARALLEL="" \
--build-arg PYTHON="python3" \
--network=host \
-t ocrd/all:medium-cuda .
docker build \
--progress=plain \
--build-arg BASE_IMAGE=ocrd/all:medium-cuda \
--build-arg VCS_REF=$(git rev-parse --short HEAD) \
--build-arg BUILD_DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ") \
--build-arg OCRD_MODULES="cor-asv-ann core dinglehopper docstruct eynollah format-converters nmalign ocrd_anybaseocr ocrd_calamari ocrd_cis ocrd_detectron2 ocrd_doxa ocrd_fileformat ocrd_froc ocrd_im6convert ocrd_
keraslm ocrd_kraken ocrd_olahd_client ocrd_olena ocrd_pagetopdf ocrd_repair_inconsistencies ocrd_segment ocrd_tesserocr ocrd_wrap sbb_binarization workflow-configuration" \
--build-arg PIP_OPTIONS="-e" \
--build-arg PARALLEL="" \
--build-arg PYTHON="python3" \
--network=host \
-t ocrd/all:maximum-cuda .

Which leads to exceeding the 1h time limit...

That is the correct behavior as of #436. Upping the resource_class from the implied medium to large fixes the build time. Merged into #441.

@bertsky
Copy link
Collaborator

bertsky commented Jul 1, 2024

Which leads to exceeding the 1h time limit...

That's odd. We used to be well below 1h with the new multi-stage build (core → minimum → medium → maximum), which we switched to recently in #436. In fact, it was just 47min. It now times out during layer export – perhaps in this case, merely the network side happened to be slow?

Or is resource_class: large somehow having an opposite effect here?

@bertsky
Copy link
Collaborator

bertsky commented Jul 1, 2024

Upping the resource_class from the implied medium to large fixes the build time. Merged into #441.

Ah, I did not notice this was the most recent change. Thanks!

@kba kba merged commit a4968a7 into master Jul 1, 2024
1 check passed
@stweil stweil deleted the core-git-dir branch July 17, 2024 12:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants