
24.10, 25.04 and devel images fail to launch due to systemd-resolved issue #598

Open
mr-cal opened this issue Jul 19, 2024 · 9 comments
Labels
Bug Something isn't working triaged We will be doing this

Comments

@mr-cal
Collaborator

mr-cal commented Jul 19, 2024

Bug Description

ubuntu.BuilddBaseAlias.ORACULAR and ubuntu.BuilddBaseAlias.DEVEL fail when launching an LXD container because of a problem with systemd-resolved in the buildd images.

Same as canonical/snapcraft#4921

To Reproduce

Run the script below:

part yaml

#! /usr/bin/env python3

from pathlib import Path
import logging

from craft_providers import bases, lxd

logging.basicConfig(level="DEBUG")

provider = lxd.LXDProvider(lxd_project="project1")

provider.ensure_provider_is_available()

alias = bases.BuilddBaseAlias.ORACULAR
my_base = bases.BuilddBase(alias=alias)

with provider.launched_environment(
    project_name="hello-world",
    project_path=Path().absolute(),
    base_configuration=my_base,
    build_base=alias.value,
    instance_name="test-instance",
    allow_unstable=True,
) as instance:
    instance.execute_run(["ls"])

Relevant log output

Failed to setup systemd-resolved.
* Command that failed: 'lxc --project snapcraft exec local:base-instance-snapcraft-buildd-base-v7-c-9d4e2b684569fe719a0d -- env CRAFT_MANAGED_MODE=1 SNAPCRAFT_BUILD_INFO=1 DEBIAN_FRONTEND=noninteractive DEBCONF_NONINTERACTIVE_SEEN=true DEBIAN_PRIORITY=critical systemctl restart systemd-resolved'
* Command exit code: 1
* Command standard error output: b'Job for systemd-resolved.service failed because the control process exited with error code.\nSee "systemctl status systemd-resolved.service" and "journalctl -xeu systemd-resolved.service" for details.\n'

...

systemctl status systemd-resolved.service
× systemd-resolved.service - Network Name Resolution
     Loaded: loaded (/usr/lib/systemd/system/systemd-resolved.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Thu 2024-07-11 14:57:45 UTC; 37s ago
 Invocation: fe747a54d0294da4af991ec064327654
       Docs: man:systemd-resolved.service(8)
             man:org.freedesktop.resolve1(5)
             https://systemd.io/WRITING_NETWORK_CONFIGURATION_MANAGERS
             https://systemd.io/WRITING_RESOLVER_CLIENTS
   Main PID: 99 (code=exited, status=243/CREDENTIALS)
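For reading the status output above: the Main PID line records how the unit's process exited. In systemd's exit-code table (systemd.exec(5)), status 243 is EXIT_CREDENTIALS, which the service manager reports when it fails to set up the unit's credentials before starting the binary, so systemd-resolved itself most likely never ran. A minimal sketch for pulling that status out of captured systemctl status text; MainPidStatus and parse_main_pid are hypothetical helper names, not part of craft-providers:

```python
import re
from typing import NamedTuple, Optional


class MainPidStatus(NamedTuple):
    """Exit information parsed from a `systemctl status` "Main PID" line."""

    pid: int
    code: str
    status: int
    name: Optional[str]  # symbolic name such as "CREDENTIALS", if present


# Matches e.g. "Main PID: 99 (code=exited, status=243/CREDENTIALS)".
# A running service ("Main PID: 461 (systemd-network)") does not match.
_MAIN_PID_RE = re.compile(
    r"Main PID:\s*(?P<pid>\d+)\s*"
    r"\(code=(?P<code>\w+),\s*status=(?P<status>\d+)(?:/(?P<name>\w+))?\)"
)


def parse_main_pid(line: str) -> Optional[MainPidStatus]:
    """Return the parsed exit status, or None if the line has no exit info."""
    match = _MAIN_PID_RE.search(line)
    if match is None:
        return None
    return MainPidStatus(
        pid=int(match["pid"]),
        code=match["code"],
        status=int(match["status"]),
        name=match["name"],
    )
```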
mr-cal added the Bug (Something isn't working) label Jul 19, 2024
lengau added a commit to canonical/charmcraft that referenced this issue Jul 22, 2024
Ubuntu Mantic is now EOL, causing launches to fail. Unfortunately, we
cannot replace this with oracular due to
canonical/craft-providers#598

Fix for Oracular is scheduled in #1748

Spread test failure is fixed in #1744
@mr-cal
Collaborator Author

mr-cal commented Nov 15, 2024

This is also failing in 25.04

mr-cal added the triaged (We will be doing this) label Nov 15, 2024

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/CRAFT-3684.

This message was autogenerated

mr-cal changed the title "Oracular and devel images fails due to systemd-resolved issue in Ubuntu 24.10" to "Oracular, Plucky and devel images fails due to systemd-resolved issue in Ubuntu 24.10" Nov 15, 2024
mr-cal changed the title "Oracular, Plucky and devel images fails due to systemd-resolved issue in Ubuntu 24.10" to "24.10, 25.04 and devel images fail to launch due to systemd-resolved issue" Nov 15, 2024
@dariuszd21
Contributor

dariuszd21 commented Nov 15, 2024

Just a side note: in my 25.04 test, the systemd-resolved service is no longer failing. The container simply does not receive any IPv4 address.
Regular Oracular LXD containers work just fine, so something in the buildd image breaks their connectivity.

In the working container, the systemd-networkd service uses configuration from /run/systemd/network, which I couldn't find in the buildd container:

Nov 15 16:16:29 present-eagle systemd[1]: Started systemd-networkd.service - Network Configuration.
Nov 15 16:16:29 present-eagle systemd-networkd[322]: eth0: Configuring with /run/systemd/network/10-netplan-eth0.network.
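To compare the working and buildd containers quickly, one could list the .network files systemd-networkd would pick up from /run/systemd/network. A small sketch, assuming a Python 3 interpreter is available wherever it runs (inside the instance, or against a copy of the directory); list_network_units is a hypothetical helper name:

```python
from pathlib import Path
from typing import List


def list_network_units(directory: Path = Path("/run/systemd/network")) -> List[str]:
    """Return the sorted names of *.network files in ``directory``.

    A missing directory is treated like an empty one. Note that this only
    inspects one directory; systemd-networkd also reads other locations
    such as /etc/systemd/network.
    """
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.network") if p.is_file())
```

In the working container this would be expected to return ["10-netplan-eth0.network"], based on the log line above; in the buildd container, where the directory is absent, it returns an empty list.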

@mr-cal
Collaborator Author

mr-cal commented Nov 15, 2024

Interesting, is systemd-networkd present in the buildd image?

Here's some context on the last time something similar happened: https://bugs.launchpad.net/cloud-images/+bug/2007419

@dariuszd21
Contributor

dariuszd21 commented Nov 15, 2024

It's there, but it doesn't seem to pick up any configuration (neither the file mentioned above nor the /run/systemd/network directory exists):

root@testcraft-hello-on-amd64-for-amd64-9111675:~# systemctl status systemd-networkd
● systemd-networkd.service - Network Configuration
     Loaded: loaded (/usr/lib/systemd/system/systemd-networkd.service; enabled; preset: enabled)
     Active: active (running) since Fri 2024-11-15 16:34:14 UTC; 7min ago
 Invocation: 9c4699527d9c4bf1a22c001c394e973b
TriggeredBy: ● systemd-networkd.socket
       Docs: man:systemd-networkd.service(8)
             man:org.freedesktop.network1(5)
   Main PID: 461 (systemd-network)
     Status: "Processing requests..."
      Tasks: 1 (limit: 37588)
   FD Store: 0 (limit: 512)
     Memory: 1.5M (peak: 2.1M)
        CPU: 44ms
     CGroup: /system.slice/systemd-networkd.service
             └─461 /usr/lib/systemd/systemd-networkd

Nov 15 16:34:14 testcraft-hello-on-amd64-for-amd64-9111675 systemd[1]: Starting systemd-networkd.service - Network Configuration...
Nov 15 16:34:14 testcraft-hello-on-amd64-for-amd64-9111675 systemd-networkd[461]: Failed to increase receive buffer size for general netlink socket, ignoring: Operation not permitted
Nov 15 16:34:14 testcraft-hello-on-amd64-for-amd64-9111675 systemd-networkd[461]: lo: Link UP
Nov 15 16:34:14 testcraft-hello-on-amd64-for-amd64-9111675 systemd-networkd[461]: lo: Gained carrier
Nov 15 16:34:14 testcraft-hello-on-amd64-for-amd64-9111675 systemd-networkd[461]: eth0: Link UP
Nov 15 16:34:14 testcraft-hello-on-amd64-for-amd64-9111675 systemd-networkd[461]: eth0: Gained carrier
Nov 15 16:34:14 testcraft-hello-on-amd64-for-amd64-9111675 systemd-networkd[461]: eth0: Gained IPv6LL
Nov 15 16:34:14 testcraft-hello-on-amd64-for-amd64-9111675 systemd-networkd[461]: Enumeration completed
Nov 15 16:34:14 testcraft-hello-on-amd64-for-amd64-9111675 systemd[1]: Started systemd-networkd.service - Network Configuration.
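The key difference from the working log is that eth0 here gains carrier and an IPv6 link-local address but never logs a "Configuring with ..." line. A rough sketch that maps each link seen in systemd-networkd journal lines to the .network file applied to it (None when no file was applied). The regexes assume the "systemd-networkd[PID]: LINK: message" format seen in the logs in this thread; link_configs is a hypothetical helper name:

```python
import re
from typing import Dict, Iterable, Optional

# Matches "systemd-networkd[461]: eth0: <message>"; daemon-wide lines such
# as "systemd-networkd[461]: Enumeration completed" do not match.
_LINK_LINE_RE = re.compile(
    r"systemd-networkd\[\d+\]: (?P<link>[\w.-]+): (?P<msg>.*)$"
)
# Matches "Configuring with /path/to/file.network." (trailing dot optional).
_CONFIGURING_RE = re.compile(r"^Configuring with (?P<path>\S+?)\.?$")


def link_configs(journal_lines: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each link in systemd-networkd log lines to its .network file.

    Links that appear in the log but never report "Configuring with ..."
    map to None: networkd saw them but applied no configuration.
    """
    configs: Dict[str, Optional[str]] = {}
    for line in journal_lines:
        match = _LINK_LINE_RE.search(line)
        if match is None:
            continue
        link = match["link"]
        configs.setdefault(link, None)
        configuring = _CONFIGURING_RE.match(match["msg"])
        if configuring is not None:
            configs[link] = configuring["path"]
    return configs
```

Fed the buildd log above, this yields None for both lo and eth0; fed the working container's log, eth0 maps to /run/systemd/network/10-netplan-eth0.network.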

@cmatsuoka
Collaborator

cmatsuoka commented Dec 17, 2024

I tested the Oracular image: it is created with eth0 unconfigured and with no networking configuration tools preinstalled. After installing iproute2 manually and adding an IPv4 address to the interface, it seems to work as expected. Update: networking is configured by systemd-networkd directly, without using the regular ip tools.

Interestingly, the noble image doesn't have those packages installed either, yet it works, so the solution lies elsewhere.

@cmatsuoka
Collaborator

Updated reproducer script:

#! /usr/bin/env python3

from pathlib import Path
import logging

from craft_providers import bases, lxd

logging.basicConfig(level="DEBUG")

provider = lxd.LXDProvider(lxd_project="project1")

provider.ensure_provider_is_available()

my_base = bases.BuilddBase(alias=bases.BuilddBaseAlias.ORACULAR)

with provider.launched_environment(
    project_name="hello-world",
    project_path=Path().absolute(),
    base_configuration=my_base,
    instance_name="test-instance",
    allow_unstable=True,
) as instance:
    instance.execute_run(["ls", "-l", "/"])
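To have the reproducer report whether the instance's links were actually configured, one option is to capture the output of networkctl --no-pager list (networkctl ships with systemd) and check its SETUP column. A sketch of the parsing side only; the helper names are hypothetical, and the column layout is assumed from systemd's usual IDX / LINK / TYPE / OPERATIONAL / SETUP table:

```python
from typing import Dict, List

# SETUP states that mean systemd-networkd has nothing left to do for a link.
_SETTLED_STATES = {"configured", "unmanaged"}


def parse_networkctl_list(output: str) -> Dict[str, str]:
    """Map link name -> SETUP state from `networkctl --no-pager list` output."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with the numeric IDX column; this skips the header
        # row and the "N links listed." footer.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        states[fields[1]] = fields[-1]
    return states


def unsettled_links(output: str) -> List[str]:
    """Return links whose SETUP state is neither configured nor unmanaged."""
    return sorted(
        link
        for link, setup in parse_networkctl_list(output).items()
        if setup not in _SETTLED_STATES
    )
```

In the reproducer, the input could come from something like instance.execute_run(["networkctl", "--no-pager", "list"], capture_output=True, text=True).stdout, assuming execute_run forwards those keyword arguments to subprocess.run.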

@cmatsuoka
Collaborator

cmatsuoka commented Dec 18, 2024

The udev package is missing from the oracular and plucky buildd images, which prevents systemd-networkd from configuring network interfaces. The following patch is a proof of concept that injects the missing packages into the base image and allows the network to be configured properly:

diff --git a/craft_providers/base.py b/craft_providers/base.py
index f2bd0ef..48f6a84 100644
--- a/craft_providers/base.py
+++ b/craft_providers/base.py
@@ -1130,6 +1130,7 @@ class Base(ABC):
         capture_output: bool = True,
         text: bool = False,
         timeout: Optional[float] = None,
+        cwd: Optional[pathlib.PurePath] = None,
         verify_network=False,
     ) -> subprocess.CompletedProcess:
         """Run a command through the executor.
@@ -1153,6 +1154,7 @@ class Base(ABC):
                 check=check,
                 capture_output=capture_output,
                 text=text,
+                cwd=cwd,
                 timeout=timeout,
             )
         except subprocess.CalledProcessError as exc:
diff --git a/craft_providers/bases/ubuntu.py b/craft_providers/bases/ubuntu.py
index 60bcd71..bef25e0 100644
--- a/craft_providers/bases/ubuntu.py
+++ b/craft_providers/bases/ubuntu.py
@@ -186,6 +186,35 @@ class BuilddBase(Base):
                 )
             )
 
+    def _setup_os(self, executor: Executor) -> None:
+        missing_debs = [
+            "libkmod2_31+20240202-2ubuntu7_amd64.deb",
+            "libudev1_255.4-1ubuntu8.4_amd64.deb",
+            "systemd-dev_255.4-1ubuntu8.4_all.deb",
+            "udev_255.4-1ubuntu8.4_amd64.deb",
+        ]
+
+        for deb in missing_debs:
+            executor.push_file(
+                source=pathlib.Path(deb),
+                destination=pathlib.PurePath("/root", deb),
+            )
+        self._execute_run(
+            command=["dpkg", "-i"] + missing_debs,
+            cwd=pathlib.PurePath("/root"),
+            executor=executor,
+            check=True,
+        )
+        self._execute_run(
+            command=["systemctl", "restart", "systemd-udevd"],
+            executor=executor,
+        )
+        self._execute_run(
+            command=["udevadm", "trigger"],
+            executor=executor,
+            check=False,
+        )
+
     def _post_setup_os(self, executor: Executor) -> None:
         """Ubuntu specific post-setup OS tasks."""
         self._disable_automatic_apt(executor=executor)
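The proof of concept pushes hardcoded amd64 .deb files unconditionally. A less intrusive first step could be to check which of those packages dpkg actually reports as installed. This is a sketch of such a check, not what the patch does; the helper names are hypothetical, and the command relies on dpkg-query's --show and --showformat options:

```python
from typing import List, Sequence

# Packages that the proof-of-concept patch installs by hand.
REQUIRED_PACKAGES = ("libkmod2", "libudev1", "systemd-dev", "udev")


def dpkg_query_command(packages: Sequence[str]) -> List[str]:
    """Build a dpkg-query call printing '<package> <want> <error> <state>' lines."""
    return ["dpkg-query", "--showformat=${Package} ${Status}\n", "--show", *packages]


def missing_packages(
    dpkg_output: str, packages: Sequence[str] = REQUIRED_PACKAGES
) -> List[str]:
    """Return the entries of ``packages`` that dpkg does not report as installed."""
    installed = set()
    for line in dpkg_output.splitlines():
        fields = line.split()
        # ${Status} is "<want> <error-flag> <state>", e.g. "install ok installed".
        if len(fields) == 4 and fields[3] == "installed":
            installed.add(fields[0])
    return [pkg for pkg in packages if pkg not in installed]
```

dpkg-query exits non-zero when a requested package is unknown to dpkg, so the command would need to run without check=True, with its stdout passed to missing_packages.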

@cmatsuoka
Collaborator

Development

No branches or pull requests

3 participants