Auroraboot out of memory for some images #3037

Open
jimmykarily opened this issue Nov 25, 2024 Discussed in #2986 · 13 comments
Labels
enhancement New feature or request

Comments

@jimmykarily
Contributor

Discussed in #2986

Originally posted by davidnajar November 7, 2024
Hello Community!
I'm having issues when building some large images using Auroraboot. I'm customizing the Rocky OS image by creating a new container that adds extra layers on top of it, using helm and skopeo to download OCI images to the local filesystem. The intention is that later on, after installation and during first boot, a systemd service reads these images with skopeo again and pushes them to the local containerd daemon. This works well in airgapped environments, avoiding the need for an internet connection to set up my edge cluster's default services.

However, I've been adding extra Docker images, and I've reached a point where I always get out-of-memory errors when trying to build the ISO using Auroraboot. The Docker image itself is about 5.4 GB uncompressed according to Docker Desktop. The last ISO file I was able to build was around 2.4 GB. But now, after adding a couple more images, I'm not able to build at all.

While it is not documented, reading the source code of Auroraboot (I'm not a Go expert) led me to the conclusion that I can limit the memory pressure by setting a value in the system.memory option.

I run this build in a CI agent:

docker run -i  --rm --net=host --cap-add=NET_ADMIN  \
          --mount 'type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock' \
          --mount 'type=bind,source={{.HOST_VOLUME}}/build/container,target=/build' \
          --workdir /build \
            quay.io/kairos/auroraboot:\
            --debug=true \
            --set "disable_http_server=true" \
            --set "disable_netboot=true" \
            --set "state_dir=/build" \
            --set "disk.vhd=false" \
            --set "system.memory=2500" \   # settings I've been playing with
            --set "system.cores=2" \            # settings I've been playing with
            --set "container_image=docker://previous-custom-rockylinux-docker-image:latest" \
            --cloud-config - < src/cloud-init/cloud-config.yaml

When doing this, what I see at the end is a failure showing the following:

Pulling container image 'previous-custom-rockylinux-docker-image:latest' to '/build/temp-rootfs')
fatal error: runtime: out of memory

I understand that the error might mean the image is too big to be loaded in memory, but I can't confirm that. However, I was expecting that setting system.memory to some value would limit that.
Is there some setting I could be missing?

@jimmykarily
Contributor Author

@davidnajar

I see. Thanks for pointing that out. Does that mean there is no way to prevent loading the full Docker image into memory? If so, this can be solved "as easily" as giving more RAM to my build agents.

@jimmykarily
Contributor Author

We still want to have a look and see if it's possible to avoid loading it into memory. Let's keep this open until we check. In the meantime, giving more RAM is the workaround I guess.

@Itxaka
Member

Itxaka commented Nov 26, 2024

This is kind of weird; in my local tests with a 3 GB image, I can't see anything going over 60 MB of RAM used. IIRC the puller is go-containerregistry and, according to the source, it should stream the image, so it should not consume too much memory, I think. I'm trying to get some stats to check this.
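For context, a minimal sketch of what streaming through go-containerregistry looks like (assuming it really is the puller, as suggested above; the image reference is just an example). Each layer is exposed as an io.ReadCloser, so it can be consumed incrementally instead of being held fully in memory:

package main

import (
	"fmt"
	"io"
	"log"

	"github.com/google/go-containerregistry/pkg/crane"
)

func main() {
	// Pulling resolves the manifest; layer bytes are only fetched when read.
	img, err := crane.Pull("docker.io/library/ubuntu:24.04")
	if err != nil {
		log.Fatal(err)
	}

	layers, err := img.Layers()
	if err != nil {
		log.Fatal(err)
	}

	for i, layer := range layers {
		// Uncompressed returns a streaming reader over the decompressed layer tarball.
		rc, err := layer.Uncompressed()
		if err != nil {
			log.Fatal(err)
		}
		// Real code would untar this stream to disk; io.Discard just shows the
		// layer being consumed without buffering it whole in memory.
		n, err := io.Copy(io.Discard, rc)
		rc.Close()
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("layer %d: streamed %d bytes\n", i, n)
	}
}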

@davidnajar

Might it be related to the size of individual layers? I have a couple of layers that are close to 1.5 GB each.
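Individual layer sizes can be checked with docker history (image name taken from the original report); the SIZE column shows how much each instruction contributed:

docker history previous-custom-rockylinux-docker-image:latest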

@Itxaka
Member

Itxaka commented Nov 26, 2024

Yes, I'm trying to test with a big-ass layer, because that might indeed be the issue :D

Using a 3 GB image, pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime, gives me the following max memory when calling the method that dumps the image to a dir:

37.60user 4.96system 0:53.00elapsed 80%CPU (0avgtext+0avgdata 191940maxresident)k
0inputs+101520outputs (0major+40534minor)pagefaults 0swaps

So that's 191940K, which is around 190 MB (a way to reproduce this kind of measurement is sketched below).

Let me try with a squashed image in which the layers are big enough.
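For reference, the numbers above look like GNU time's default output; a run like the following (the program path is hypothetical) reproduces that kind of measurement, where maxresident is the peak resident set size in kB:

/usr/bin/time go run ./dump-image.go

Note that /usr/bin/time is the standalone GNU time, not the shell built-in, which does not report memory usage.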

@Itxaka
Member

Itxaka commented Nov 26, 2024

Going to try with localai/localai:master-cublas-cuda11-ffmpeg-core, which has a nice 1.5 GB layer in there.

@Itxaka
Member

Itxaka commented Nov 26, 2024

with that big image I get similar results:

50.12user 7.10system 1:35.02elapsed 60%CPU (0avgtext+0avgdata 194288maxresident)k
0inputs+120outputs (1major+39627minor)pagefaults 0swaps

@Itxaka
Member

Itxaka commented Nov 26, 2024

No, I lie. With an image with bigger layers I do get a lot of memory used. In fact, we can reproduce this by calling the application under systemd-run to limit the max memory, and it gets killed: systemd-run --scope -p MemoryMax=4G (a complete invocation is sketched below).

It seems not to happen with smaller images and smaller layers, so you can either try to give more memory to the runners or try to make the layers smaller by copying stuff in separate entries.

I tried a few methods to try to minimize it but had no luck, sorry :(
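For reference, the reproduction mentioned above would look roughly like this (a sketch: ./pull-and-extract stands in for whatever command performs the image pull, run directly on the host so the cgroup limit applies; it may need root or --user depending on the setup):

# cap the process at 4 GB; it is OOM-killed once it exceeds the limit
systemd-run --scope -p MemoryMax=4G ./pull-and-extract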

@Itxaka
Member

Itxaka commented Nov 26, 2024

I had some luck with a different approach to downloading the image: download the layers to disk first, then stream those layers and extract them. That allowed for a bigger image, but not one as big as the available memory. So an image with a compressed 4.5 GB layer would die at about 85%.

I'll check this further tomorrow and see if it's a viable alternative.

@Itxaka
Member

Itxaka commented Nov 27, 2024

Just leaving here what I got:

package main

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"github.com/kairos-io/kairos-sdk/utils"
	"io"
	"log"
	"os"
	"path/filepath"
	"runtime"
	"runtime/debug"
	"sync"
	"time"

	v1 "github.com/google/go-containerregistry/pkg/v1"
)

type ProgressTracker struct {
	mu              sync.Mutex
	totalLayers     int
	completedLayers int
	totalSize       int64
	downloadedSize  int64
}

func (pt *ProgressTracker) StartLayer(layerIndex int, layerSize int64) {
	pt.mu.Lock()
	defer pt.mu.Unlock()
	pt.totalLayers = layerIndex + 1
	fmt.Printf("Starting layer %d (size: %.2f MB)\n", layerIndex+1, float64(layerSize)/(1024*1024))
}

func (pt *ProgressTracker) UpdateDownloadProgress(layerIndex int, downloaded int64, layerSize int64) {
	pt.mu.Lock()
	defer pt.mu.Unlock()
	pt.downloadedSize = downloaded
	percentage := float64(downloaded) / float64(layerSize) * 100
	fmt.Printf("\rLayer %d: Download Progress: %.2f%%", layerIndex+1, percentage)
}

func (pt *ProgressTracker) CompleteLayer(layerIndex int) {
	pt.mu.Lock()
	defer pt.mu.Unlock()
	pt.completedLayers++
	fmt.Printf("\nLayer %d: Download and extraction completed\n", layerIndex+1)
}

func (pt *ProgressTracker) FinalSummary() {
	fmt.Printf("\nImage download and extraction complete\n")
	fmt.Printf("Total layers processed: %d\n", pt.totalLayers)
}

var d = &ProgressTracker{}

func DownloadAndExtract(image v1.Image, target string) error {
	// Get layers
	fmt.Println("Getting image layers")
	layers, err := image.Layers()
	if err != nil {
		return fmt.Errorf("failed to get image layers: %v", err)
	}

	fmt.Printf("Image contains %d layers\n", len(layers))

	// Create destination directory
	if err := os.MkdirAll(target, 0755); err != nil {
		return fmt.Errorf("failed to create destination directory: %v", err)
	}

	// Process layers sequentially to minimize memory usage
	for i, layer := range layers {
		// Get layer size for progress tracking
		layerSize, err := layer.Size()
		if err != nil {
			return fmt.Errorf("failed to get layer size: %v", err)
		}

		d.StartLayer(i, layerSize)

		// Create a temporary file for the layer
		layerFile, err := os.CreateTemp("", fmt.Sprintf("layer_%d_*", i))
		if err != nil {
			return fmt.Errorf("failed to create temp file for layer: %v", err)
		}
		defer os.Remove(layerFile.Name())

		// Stream layer content to disk
		fmt.Println("Downloading layer", i+1)
		if err := downloadLayerToDisk(layer, layerFile, i); err != nil {
			return fmt.Errorf("failed to download layer %d: %v", i, err)
		}

		// Extract layer with minimal memory usage
		fmt.Println("Extracting layer", i+1)
		if err := extractLayerToDisk(layerFile.Name(), target); err != nil {
			return fmt.Errorf("failed to extract layer %d: %v", i, err)
		}

		// Close and reset the file for next iteration
		layerFile.Close()

		d.CompleteLayer(i)
	}

	d.FinalSummary()
	return nil
}

func downloadLayerToDisk(layer v1.Layer, dest *os.File, layerIndex int) error {
	// Get layer's compressed content
	rc, err := layer.Compressed()
	if err != nil {
		return fmt.Errorf("failed to get compressed layer: %v", err)
	}
	defer rc.Close()

	// Get layer size for progress tracking
	layerSize, err := layer.Size()
	if err != nil {
		return fmt.Errorf("failed to get layer size: %v", err)
	}

	// Create a custom writer to track download progress
	progressWriter := &ProgressWriter{
		writer:          dest,
		totalSize:       layerSize,
		progressTracker: d,
		layerIndex:      layerIndex,
	}

	// Buffer to read in chunks
	const chunkSize = 1024 * 1024 // 1 MB

	buffer := make([]byte, chunkSize)
	for {
		// Read a chunk
		bytesRead, readErr := rc.Read(buffer)
		if readErr != nil && readErr != io.EOF {
			return fmt.Errorf("error reading file: %v", readErr)
		}

		// Write the chunk
		if bytesRead > 0 {
			fmt.Printf("Writing chunk %d\n", bytesRead)
			_, writeErr := progressWriter.Write(buffer[:bytesRead])
			if writeErr != nil {
				return fmt.Errorf("error writing file: %v", writeErr)
			}
		}

		// Break the loop if we've reached the end of the file
		if readErr == io.EOF {
			fmt.Println("Download complete")
			break
		}
		// Trigger garbage collection periodically
		runtime.GC()
		debug.FreeOSMemory()
	}

	// Reset file pointer for extraction
	if _, err := dest.Seek(0, 0); err != nil {
		return fmt.Errorf("failed to reset file pointer: %v", err)
	}

	return nil
}

type ProgressWriter struct {
	writer          io.Writer
	totalSize       int64
	currentProgress int64
	progressTracker *ProgressTracker
	layerIndex      int
}

func (pw *ProgressWriter) Write(p []byte) (int, error) {
	n, err := pw.writer.Write(p)
	if err != nil {
		return n, err
	}

	pw.currentProgress += int64(n)
	pw.progressTracker.UpdateDownloadProgress(pw.layerIndex, pw.currentProgress, pw.totalSize)

	return n, nil
}

func extractLayerToDisk(layerPath, destPath string) error {
	// Open the layer file
	layerFile, err := os.Open(layerPath)
	if err != nil {
		return fmt.Errorf("failed to open layer file: %v", err)
	}
	defer layerFile.Close()

	// Create gzip reader
	gzipReader, err := gzip.NewReader(layerFile)
	if err != nil {
		return fmt.Errorf("failed to create gzip reader: %v", err)
	}
	defer gzipReader.Close()

	// Create tar reader
	tarReader := tar.NewReader(gzipReader)

	// Extract files from tar
	for {
		header, err := tarReader.Next()
		if err == io.EOF {
			break // End of archive
		}
		if err != nil {
			return fmt.Errorf("error reading tar header: %v", err)
		}

		// Construct full path
		target := filepath.Join(destPath, header.Name)

		// Handle different file types
		switch header.Typeflag {
		case tar.TypeDir:
			// Create directory
			if err := os.MkdirAll(target, 0755); err != nil {
				return fmt.Errorf("failed to create directory %s: %v", target, err)
			}
		case tar.TypeReg, tar.TypeRegA:
			// Ensure directory exists
			if err := os.MkdirAll(filepath.Dir(target), 0755); err != nil {
				return fmt.Errorf("failed to create parent directory: %v", err)
			}

			// Create file
			outFile, err := os.OpenFile(target, os.O_RDWR|os.O_CREATE|os.O_TRUNC, os.FileMode(header.Mode))
			if err != nil {
				return fmt.Errorf("failed to create file %s: %v", target, err)
			}

			// Copy file contents in chunks
			buf := make([]byte, 32*1024) // 32 KB buffer
			for {
				n, err := tarReader.Read(buf)
				if err != nil && err != io.EOF {
					outFile.Close()
					return fmt.Errorf("failed to read tar file: %v", err)
				}
				if n == 0 {
					break
				}
				if _, err := outFile.Write(buf[:n]); err != nil {
					outFile.Close()
					return fmt.Errorf("failed to write file %s: %v", target, err)
				}
			}
			outFile.Close()

		case tar.TypeSymlink:
			// Create symbolic link
			if err := os.Symlink(header.Linkname, target); err != nil {
				return fmt.Errorf("failed to create symlink %s: %v", target, err)
			}
		}
	}

	return nil
}

func main() {
	startTime := time.Now()
	img, err := utils.GetImage("ubuntu:24.04", "", nil, nil)
	if err != nil {
		log.Fatalf("failed to get image: %v", err)
	}

	if err := DownloadAndExtract(img, "/tmp/test"); err != nil {
		log.Fatalf("Image download and extraction failed: %v", err)
	}

	fmt.Printf("Total download and extraction time: %v\n", time.Since(startTime))
	os.RemoveAll("/tmp/test")
}

This implements a chunked approach: it first downloads the layers to disk and then extracts them. With a compressed layer of 4750 MB and a restricted maximum of 4 GB of RAM, this gets to about 85% of the layer extracted. So it may improve things for layers smaller than the available RAM. Not a real fix, but it may improve things.
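One hedged caveat with the sketch above, since it depends on the environment: os.CreateTemp("", ...) places the layer file under $TMPDIR (usually /tmp), and if /tmp is tmpfs the "download to disk" step still lands in RAM. Spilling next to the extraction target instead might behave better:

		// hypothetical tweak: keep the temporary layer file on the same filesystem
		// as the extraction target, in case /tmp is RAM-backed (tmpfs)
		layerFile, err := os.CreateTemp(target, fmt.Sprintf("layer_%d_*", i))
		if err != nil {
			return fmt.Errorf("failed to create temp file for layer: %v", err)
		}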

@davidnajar
Copy link

davidnajar commented Nov 27, 2024

Sounds nice. The fact is that I am creating an appliance by building a customized airgapped k3s image, where basically I download some Docker images using skopeo and add them as tgz files in a specific folder. Something similar to this is in my Dockerfile (some parts have been omitted):

# syntax=docker/dockerfile:1

ARG BASE_IMAGE

FROM quay.io/skopeo/stable:latest as skopeo-base

COPY --link src/full-airgap/build-container-files/full-airgap-manifest.yaml /full-airgap-manifest.yaml
RUN skopeo sync --src yaml --dest dir full-airgap-manifest.yaml /tmp/images

FROM alpine/helm as helm
WORKDIR /
COPY --link src/full-airgap/build-container-files/import-helm-charts.sh /import-helm-charts.sh
COPY --link src/full-airgap/build-container-files/import-helm-charts.csv /import-helm-charts.csv

RUN ls -la && chmod +x /import-helm-charts.sh \
    && /import-helm-charts.sh

FROM ${BASE_IMAGE}
COPY --link --from=skopeo-base /tmp/images/ /data/airgap/images/
COPY --link --from=helm /tmp/charts/ /data/airgap/charts/

COPY --link src/full-airgap/container-files/ /
RUN ln -s /etc/systemd/system/full-airgap.service /etc/systemd/system/multi-user.target.wants/full-airgap.service 

The layer that adds /data/airgap/images is really big (between 1 and 2 GB if I remember correctly), because it copies many images using skopeo and a YAML file like this:

docker.io:
  images:
    busybox:
      - "1.34"
    bitnami/postgresql:
      - "16.3.0-debian-12-r14"
    dpage/pgadmin4:
      - "8.5"
    grafana/grafana:
      - "11.1.0"
    grafana/promtail:
      - "2.9.3"
    kubernetesui/dashboard-api:
      - "1.7.0"
    kubernetesui/dashboard-auth:
      - "1.1.3"
    kubernetesui/dashboard-metrics-scraper:
      - "1.1.1"
    kubernetesui/dashboard-web:
      - "1.4.0"
    filebrowser/filebrowser:
      - "v2.28.0"
    grafana/loki:
      - "2.6.1"
    kong:
      - "3.6"
ghcr.io:
  images:
    fluxcd/source-controller:
      - "v1.3.0"
    fluxcd/kustomize-controller:
      - "v1.3.0"
    fluxcd/helm-controller:
      - "v1.0.1"
    fluxcd/notification-controller:
      - "v1.3.0"
    weaveworks/wego-app:
      - "v0.38.0"
quay.io:
  images:
    kiwigrid/k8s-sidecar:
      - "1.26.1"
    prometheus-operator/prometheus-config-reloader:
      - "v0.75.1"
    prometheus-operator/prometheus-operator:
      - "v0.75.1"
    prometheus/alertmanager:
      - "v0.27.0"
    prometheus/node-exporter:
      - "v1.8.1"
    prometheus/prometheus:
      - "v2.53.1"
registry.k8s.io:
  images:
    kube-state-metrics/kube-state-metrics:
      - "v2.12.0"
    ingress-nginx/kube-webhook-certgen:
      - "v20221220-controller-v1.5.1-58-g787ea74b6"

The workaround I will use, even though it will result in an "ugly" Dockerfile, is to download and copy all the images in independent layers (by adding multiple RUN and COPY lines, one per image), as sketched below. At least that will reduce the individual layer size. I'll update on that.
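A rough sketch of that workaround (directory names are illustrative; the real ones depend on how skopeo sync lays out the dir destination):

FROM ${BASE_IMAGE}
# one COPY instruction per image keeps each layer small,
# instead of a single multi-gigabyte layer holding everything
COPY --link --from=skopeo-base /tmp/images/busybox:1.34 /data/airgap/images/busybox:1.34
COPY --link --from=skopeo-base /tmp/images/grafana/grafana:11.1.0 /data/airgap/images/grafana/grafana:11.1.0
# ...repeat for every image listed in full-airgap-manifest.yaml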

jimmykarily added the enhancement label on Dec 2, 2024
@jimmykarily
Contributor Author

Labeled it as "enhancement". We have the code that improves it; let's see when we can plan it.
