Auroraboot out of memory for some images #3037
Comments
AFAICT that value is just passed to qemu.
I see, thanks for pointing that out. Does that mean there is no way to prevent loading the full Docker image into memory? If so, this can be solved "as easily" as giving more RAM to my build agents.
We still want to have a look and see if it's possible to avoid loading it into memory. Let's keep this open until we check. In the meantime, giving more RAM is the workaround, I guess.
This is kind of weird; in my local tests with a 3 GB image I can't see anything going over 60 MB of RAM used. IIRC the puller is go-containerregistry, which should stream the image from the source, so it should not consume too much memory. I'm trying to get some stats to check this.
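For reference, here is a minimal sketch (not Auroraboot's actual code; the image name and the untar helper are just examples) of pulling an image with go-containerregistry and extracting it layer by layer. Each layer is exposed as an io.ReadCloser stream, so in principle nothing forces the whole image into memory at once:

```go
package main

import (
	"archive/tar"
	"io"
	"log"
	"os"
	"path/filepath"

	"github.com/google/go-containerregistry/pkg/crane"
)

func main() {
	// Pull the image manifest and layer handles (blobs are fetched lazily).
	img, err := crane.Pull("pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime")
	if err != nil {
		log.Fatal(err)
	}
	layers, err := img.Layers()
	if err != nil {
		log.Fatal(err)
	}
	for i, layer := range layers {
		rc, err := layer.Uncompressed() // streamed, already-decompressed tar
		if err != nil {
			log.Fatal(err)
		}
		if err := untar(rc, "rootfs"); err != nil {
			log.Fatalf("layer %d: %v", i, err)
		}
		rc.Close()
	}
}

// untar copies a tar stream into dir; only regular files and directories are
// handled (whiteouts, symlinks, ownership, etc. are omitted for brevity).
func untar(r io.Reader, dir string) error {
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		target := filepath.Join(dir, hdr.Name)
		switch hdr.Typeflag {
		case tar.TypeDir:
			if err := os.MkdirAll(target, 0o755); err != nil {
				return err
			}
		case tar.TypeReg:
			if err := os.MkdirAll(filepath.Dir(target), 0o755); err != nil {
				return err
			}
			f, err := os.Create(target)
			if err != nil {
				return err
			}
			if _, err := io.Copy(f, tr); err != nil {
				f.Close()
				return err
			}
			f.Close()
		}
	}
}
```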
Might it be related to the size of individual layers? I have a couple of layers that are close to 1.5 GB each.
Yes, I'm trying to test with a really big layer, because that might indeed be the issue :D Using a 3 GB image (pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime) gives me the following max memory when calling the method that dumps the image to a dir:
So that's 191940K, which is around 190 MB. Let me try with a squashed image in which the layers are big enough.
Going to try with localai/localai:master-cublas-cuda11-ffmpeg-core, which has a nice 1.5 GB layer in there.
With that big image I get similar results:
No, I lie. With an image that has bigger layers I do see a lot of memory being used; in fact, we can reproduce this by running the application under systemd-run with a memory limit, and it gets killed. It does not seem to happen with smaller images and smaller layers, so you can either try to give more memory to the runners or try to make the layers smaller by copying things in different entries. I tried a few methods to minimize the memory usage but had no luck, sorry :(
I had some luck with a different approach to downloading the image: download the layers to disk first, then stream those layers and extract them. That allowed for a bigger image, but still not one bigger than the available memory; an image with a compressed 4.5 GB layer would die at about 85%. I'll check this further tomorrow to see if it's a viable alternative.
Just leaving here what I got:
This implements a chunked approach that first downloads the layers to disk and then extracts them. With a compressed layer of 4750 MB and a restricted maximum of 4 GB of RAM, it gets to about 85% of the layer extracted. So it may improve things when the layers are smaller than the available RAM; not a real fix, but an improvement.
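For illustration, here is a rough sketch of that chunked idea (this is not the actual patch; the image name is just an example, and gzip-compressed layers are assumed): each compressed layer blob is streamed to a temporary file first, and only afterwards decompressed and walked from disk:

```go
package main

import (
	"archive/tar"
	"compress/gzip"
	"io"
	"log"
	"os"

	"github.com/google/go-containerregistry/pkg/crane"
)

func main() {
	img, err := crane.Pull("localai/localai:master-cublas-cuda11-ffmpeg-core")
	if err != nil {
		log.Fatal(err)
	}
	layers, err := img.Layers()
	if err != nil {
		log.Fatal(err)
	}
	for _, layer := range layers {
		// Step 1: stream the compressed blob straight to a temp file so it
		// never has to be held (compressed or decompressed) in memory.
		rc, err := layer.Compressed()
		if err != nil {
			log.Fatal(err)
		}
		tmp, err := os.CreateTemp("", "layer-*.tar.gz")
		if err != nil {
			log.Fatal(err)
		}
		if _, err := io.Copy(tmp, rc); err != nil {
			log.Fatal(err)
		}
		rc.Close()

		// Step 2: rewind and read the layer back from disk.
		if _, err := tmp.Seek(0, io.SeekStart); err != nil {
			log.Fatal(err)
		}
		gz, err := gzip.NewReader(tmp) // assumes gzip; OCI layers may also be zstd
		if err != nil {
			log.Fatal(err)
		}
		tr := tar.NewReader(gz)
		for {
			_, err := tr.Next()
			if err == io.EOF {
				break
			}
			if err != nil {
				log.Fatal(err)
			}
			// A real implementation would write each entry to the target
			// directory here; io.Discard keeps the sketch short.
			if _, err := io.Copy(io.Discard, tr); err != nil {
				log.Fatal(err)
			}
		}
		gz.Close()
		tmp.Close()
		os.Remove(tmp.Name())
	}
}
```

This mainly trades memory for disk space and an extra pass over the data, which matches the observation above that it helps but does not remove the limit entirely.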
Sounds nice. The thing is, I am creating an appliance by building a customized air-gapped k3s image, where I basically download some Docker images using skopeo and add them as tgz files to a specific folder. Something similar to this in my Dockerfile (some parts have been omitted):
The layer that adds /data/airgap/images is really big (between 1 and 2 GB, if I remember correctly) because it copies many images using skopeo and a yaml file like this:
The workaround I will use, even though it will make for an "ugly" Dockerfile, is to download and copy each image in an independent layer (by adding multiple RUN and COPY lines, one per image). At least that will reduce the individual layer size. I'll post an update on that.
Labeled it as "enhancement". We have the code that improves it; let's see when we can plan it.
Discussed in #2986
Originally posted by davidnajar November 7, 2024
Hello Community!
I'm having issues when building some large images using Auroraboot. I'm customizing the Rocky OS image by creating a new container that adds extra layers on top of it, using helm and skopeo to download OCI images to the local filesystem. The intention is that later, after installation and during first boot, a systemd service reads these images with skopeo again and pushes them to the local containerd daemon. This works well in air-gapped environments, avoiding the need for an internet connection to set up my edge cluster's default services.
However, I've been adding extra Docker images, and I've reached a point where I always get out-of-memory errors when trying to build the ISO using Auroraboot. The Docker image itself, uncompressed, is about 5.4 GB according to Docker Desktop. The last ISO file I was able to build was around 2.4 GB, but now, after adding a couple more images, I'm not able to build anymore.
While it is not documented, from reading the source code of Auroraboot (I'm not a Go expert) I came to the conclusion that I can limit the memory pressure by setting a value in the system.memory option.
I run this build in a CI agent:
When doing this, what I see at the end is a failure showing the following:
I understand the error might mean that the image is too big to be loaded into memory, but I can't confirm that. However, I was expecting that setting system.memory to some value would let me limit it.
Is there some setting I could be missing?