CNIEnv concurrency issues #3556

Open
apostasie opened this issue Oct 16, 2024 · 6 comments
Labels
area/network bug Something isn't working

Comments

@apostasie
Contributor

apostasie commented Oct 16, 2024

Description

Although #3491 and #3522 have fixed a lot of cases where CNI would fail because of concurrent access, there are still cases where this happens.

The failure shown here happens on container create, but it very likely affects every other place where we manipulate CNIEnv.

We can keep playing whack-a-mole and fix every occurrence piecemeal, but at this point rewriting CNIEnv in a safe way seems like the better approach.

The fundamental problems are:

  • we rely on the CNI implementation, which
    • is not safe with respect to concurrency, as it walks directories without a lock
    • does not write atomically, leaving systems in broken or inconsistent states
  • we do lock in some places, but not everywhere, as locking was an afterthought and not part of the design
  • we have complicated code, with private methods calling public ones, which makes enforcing locking even harder
  • we unnecessarily walk the directory repeatedly during the same flow
  • some of the logic currently in pkg/cmd should really be part of the methods of CNIEnv

Steps to reproduce the issue

FAIL: cmd/nerdctl/network TestNetworkCreate/with_MTU (0.17s)
    network_create_linux_test.go:108: ======================== Pre-test cleanup ========================
    command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test network rm testnetworkcreate-with-mtu-1b256b01
    network_create_linux_test.go:108: ======================== Test setup ========================
    command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test network create testnetworkcreate-with-mtu-1b256b01 --driver bridge --opt com.docker.network.driver.mtu=9216
    network_create_linux_test.go:108: ======================== Test Run ========================
    command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test run --rm --net testnetworkcreate-with-mtu-1b256b01 ghcr.io/stargz-containers/alpine:3.13-org ifconfig eth0
    command.go:112: assertion failed: expect.ExitCode is not result.ExitCode: Expected exit code: 0
        
        Command:  /usr/local/bin/nerdctl --namespace=nerdctl-test run --rm --net testnetworkcreate-with-mtu-1b256b01 ghcr.io/stargz-containers/alpine:3.13-org ifconfig eth0
        ExitCode: 1
        Error:    exit status 1
        Stdout:   
        Stderr:   time="2024-10-16T18:45:12Z" level=fatal msg="failed to verify networking settings: failed to check for default network: error reading /etc/cni/net.d/nerdctl-test/nerdctl-testnetworklsfilter-1-d946011b.conflist: open /etc/cni/net.d/nerdctl-test/nerdctl-testnetworklsfilter-1-d946011b.conflist: no such file or directory"
        
        Env:
        HOSTNAME=dc5da5d26f5d
        MEMORY_PRESSURE_WRITE=c29tZSAyMDAwMDAgMjAwMDAwMAA=
        SYSTEMD_EXEC_PID=80
        container=docker
        HOME=/root
        LANG=C.UTF-8
        MEMORY_PRESSURE_WATCH=/sys/fs/cgroup/system.slice/docker-entrypoint.service/memory.pressure
        INVOCATION_ID=3d1d502413d2454da8a8a340e78b0311
        TERM=xterm
        USER=root
        SHLVL=3
        CGO_ENABLED=0
        _=/usr/local/bin/gotestsum
        PATH=/usr/local/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
        ***
        DOCKER_CONFIG=/tmp/TestNetworkCreatewith_MTU2150649351/001
        NERDCTL_TOML=/tmp/TestNetworkCreatewith_MTU2150649351/001/nerdctl.toml
    case.go:164: ======================== Post-test cleanup ========================
    command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test network rm testnetworkcreate-with-mtu-1b256b01

Describe the results you received and expected

https://github.com/containerd/nerdctl/actions/runs/11371804119/job/31634685012?pr=3555#step:6:1674

What version of nerdctl are you using?

main

Are you using a variant of nerdctl? (e.g., Rancher Desktop)

None

Host information

No response

@apostasie apostasie added the kind/unconfirmed-bug-claim Unconfirmed bug claim label Oct 16, 2024
@AkihiroSuda AkihiroSuda added bug Something isn't working area/network and removed kind/unconfirmed-bug-claim Unconfirmed bug claim labels Oct 17, 2024
@apostasie
Contributor Author

apostasie commented Oct 18, 2024

Interesting variant:

https://github.com/containerd/nerdctl/actions/runs/11397513696/job/31713087686?pr=3535#step:6:496

level=fatal msg="failed to verify networking settings: failed to check for default network: error parsing configuration list: unexpected end of JSON input"

I would say this one ^ is a case of an interrupted write, or of competing writes.

@kbrierly

kbrierly commented Nov 9, 2024

I'm not sure if this applies to this issue, but I thought I should comment in case it does. I retried when 2.0.0 was released, and the problem I had seen in 2.0.0-rc3 is still there. The same compose file works without issue in docker compose.

I am having CNI issues when using multiple networks. Initially one was macvlan and the other was bridge; I have also tried two macvlan networks. It looks like an ordering issue: sometimes the bridge interface tries to come up as a macvlan, or, as below, the macvlan tries to come up as a bridge. I made a basic test compose file with hello-world and it happens there as well. Nothing is currently assigned that IP, and I have tried other unused IPs as well.

This is the result of "nerdctl compose up". home.local is a macvlan network and proxy.home.local is a bridge. Services with a single network start with no issues.

# nerdctl compose up  
INFO[0000] Ensuring image hello-world                   
INFO[0000] Creating container test                      
FATA[0003] failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: time="2024-11-09T11:06:52-06:00" level=fatal msg="failed to call cni.Setup: plugin type=\"bridge\" failed (add): failed to allocate all requested IPs: 10.0.0.171": unknown 
FATA[0003] error while creating container test: exit status 1

The hello-world compose file:

services:
  hello:
    image: hello-world
    container_name: test
    hostname: test
    networks:
      home.local:
        ipv4_address: 10.0.0.171
        mac_address: 02:42:0a:00:01:47
      proxy.home.local:
        ipv4_address: 10.100.100.127

networks:
    home.local:
        name: home.local
        external: true
    proxy.home.local:
        name: proxy.home.local
        external: true

Please let me know if you need further information.
Thanks.

@apostasie
Contributor Author

Thanks @kbrierly

Can you share the exact command line you used to create these networks?

@kbrierly

kbrierly commented Nov 9, 2024

# nerdctl network create -d macvlan --subnet=10.0.0.0/24 --gateway=10.0.0.1 -o parent=bond0 -o macvlan_mode=bridge home.local
# nerdctl network create --subnet=10.100.100.0/24 proxy.home.local

# /opt/cni/bin/macvlan -v
CNI macvlan plugin v1.6.0
CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0, 1.1.0
# /opt/cni/bin/bridge -v
CNI bridge plugin v1.6.0
CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0, 1.1.0

@apostasie
Contributor Author

Thanks.
Will look into this.
Do you mind opening a new issue with these details?

@kbrierly

Created #3663.
