System scope tasks seem to have problems when XDG_RUNTIME_DIR is set #72

Closed
mhjacks opened this issue Dec 7, 2024 · 9 comments · Fixed by #73

@mhjacks
Contributor

mhjacks commented Dec 7, 2024

I ran into some problems with synthesized host records in systemd-resolved, and was using this collection to inject dropins.

Using the latest version, I see consistent hangs at this point:

TASK [fedora.linux_system_roles.systemd : Reload systemd] ******************************************
task path: /home/mjackson/.ansible/collections/ansible_collections/fedora/linux_system_roles/roles/systemd/tasks/main.yml:106
<srv-f41-t3.imladris.lan> ESTABLISH SSH CONNECTION FOR USER: None
<srv-f41-t3.imladris.lan> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o 'ControlPath="/home/mjackson/.ansible/cp/bcb77c72d4"' srv-f41-t3.imladris.lan '/bin/sh -c '"'"'echo ~ && sleep 0'"'"''
<srv-f41-t3.imladris.lan> (0, b'/home/mjackson\n', b'')
<srv-f41-t3.imladris.lan> ESTABLISH SSH CONNECTION FOR USER: None
<srv-f41-t3.imladris.lan> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o 'ControlPath="/home/mjackson/.ansible/cp/bcb77c72d4"' srv-f41-t3.imladris.lan '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo /home/mjackson/.ansible/tmp `"&& mkdir "` echo /home/mjackson/.ansible/tmp/ansible-tmp-1733538959.9283075-90102-130389421889742 `" && echo ansible-tmp-1733538959.9283075-90102-130389421889742="` echo /home/mjackson/.ansible/tmp/ansible-tmp-1733538959.9283075-90102-130389421889742 `" ) && sleep 0'"'"''
<srv-f41-t3.imladris.lan> (0, b'ansible-tmp-1733538959.9283075-90102-130389421889742=/home/mjackson/.ansible/tmp/ansible-tmp-1733538959.9283075-90102-130389421889742\n', b'')
Using module file /usr/lib/python3.13/site-packages/ansible/modules/systemd.py
<srv-f41-t3.imladris.lan> PUT /home/mjackson/.ansible/tmp/ansible-local-898670xmituke/tmpjv3ozbn2 TO /home/mjackson/.ansible/tmp/ansible-tmp-1733538959.9283075-90102-130389421889742/AnsiballZ_systemd.py
<srv-f41-t3.imladris.lan> SSH: EXEC sftp -b - -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o 'ControlPath="/home/mjackson/.ansible/cp/bcb77c72d4"' '[srv-f41-t3.imladris.lan]'
<srv-f41-t3.imladris.lan> (0, b'sftp> put /home/mjackson/.ansible/tmp/ansible-local-898670xmituke/tmpjv3ozbn2 /home/mjackson/.ansible/tmp/ansible-tmp-1733538959.9283075-90102-130389421889742/AnsiballZ_systemd.py\n', b'')
<srv-f41-t3.imladris.lan> ESTABLISH SSH CONNECTION FOR USER: None
<srv-f41-t3.imladris.lan> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o 'ControlPath="/home/mjackson/.ansible/cp/bcb77c72d4"' srv-f41-t3.imladris.lan '/bin/sh -c '"'"'chmod u+x /home/mjackson/.ansible/tmp/ansible-tmp-1733538959.9283075-90102-130389421889742/ /home/mjackson/.ansible/tmp/ansible-tmp-1733538959.9283075-90102-130389421889742/AnsiballZ_systemd.py && sleep 0'"'"''
<srv-f41-t3.imladris.lan> (0, b'', b'')
<srv-f41-t3.imladris.lan> ESTABLISH SSH CONNECTION FOR USER: None
<srv-f41-t3.imladris.lan> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o 'ControlPath="/home/mjackson/.ansible/cp/bcb77c72d4"' -tt srv-f41-t3.imladris.lan '/bin/sh -c '"'"'/usr/bin/python3 /home/mjackson/.ansible/tmp/ansible-tmp-1733538959.9283075-90102-130389421889742/AnsiballZ_systemd.py && sleep 0'"'"''
<srv-f41-t3.imladris.lan> (1, b'\x1b[1;31m==== AUTHENTICATING FOR org.freedesktop.systemd1.reload-daemon ====\r\n\x1b[0mAuthentication is required to reload the systemd state.\r\nMultiple identities can be used for authentication:\r\n 1.  Local Admin (localadmin)\r\n 2.  Martin Jackson (mjackson)\r\nChoose identity to authenticate as (1-2): \r\n{"failed": true, "msg": "failure 1 during daemon-reload: Reload daemon failed: Method call timed out\\n", "invocation": {"module_args": {"daemon_reload": true, "scope": "system", "daemon_reexec": false, "no_block": false, "name": null, "state": null, "enabled": null, "force": null, "masked": null}}}\r\n', b'Shared connection to srv-f41-t3.imladris.lan closed.\r\n')
<srv-f41-t3.imladris.lan> Failed to connect to the host via ssh: Shared connection to srv-f41-t3.imladris.lan closed.
<srv-f41-t3.imladris.lan> ESTABLISH SSH CONNECTION FOR USER: None
<srv-f41-t3.imladris.lan> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o 'ControlPath="/home/mjackson/.ansible/cp/bcb77c72d4"' srv-f41-t3.imladris.lan '/bin/sh -c '"'"'rm -f -r /home/mjackson/.ansible/tmp/ansible-tmp-1733538959.9283075-90102-130389421889742/ > /dev/null 2>&1 && sleep 0'"'"''
<srv-f41-t3.imladris.lan> (0, b'', b'')
failed: [srv-f41-t3.imladris.lan] (item=root) => {
    "ansible_loop_var": "item",
    "changed": false,
    "invocation": {
        "module_args": {
            "daemon_reexec": false,
            "daemon_reload": true,
            "enabled": null,
            "force": null,
            "masked": null,
            "name": null,
            "no_block": false,
            "scope": "system",
            "state": null
        }
    },
    "item": "root",
    "msg": "failure 1 during daemon-reload: Reload daemon failed: Method call timed out\n"
}

PLAY RECAP *****************************************************************************************
srv-f41-t3.imladris.lan    : ok=18   changed=2    unreachable=0    failed=1    skipped=27   rescued=0    ignored=0

This does not seem to happen when XDG_RUNTIME_DIR is not set. (I'm not sure that's the cause, but it looks like it.)

Here are the playbook and template I was running (the unit reload at the end works fine without the extra environment variable setting):

```yaml
---
- name: Test playbook
  hosts: all
  gather_facts: true
  tasks:
    - name: Gather service facts
      ansible.builtin.service_facts:

    - name: Fix systemd-resolved if needed
      become: true
      when:
        - ansible_facts['services']['systemd-resolved.service']['status'] is defined
        - ansible_facts['services']['systemd-resolved.service']['status'] == "enabled"
      block:
        - name: Manage systemd for systemd-resolved
          ansible.builtin.include_role:
            name: fedora.linux_system_roles.systemd
          vars:
            systemd_dropins:
              - systemd-resolved.service.conf.j2

        # dropin_stat is not registered anywhere in this playbook; it must be
        # defined elsewhere for the two conditional tasks below to work.
        - name: Reload systemd units
          ansible.builtin.systemd:
            daemon_reload: true
          when: dropin_stat is changed

        - name: Bounce systemd-resolved
          ansible.builtin.service:
            name: systemd-resolved
            state: restarted
          when: dropin_stat is changed
```

And the template, `systemd-resolved.service.conf.j2`:

```ini
[Service]
Environment=SYSTEMD_RESOLVED_SYNTHESIZE_HOSTNAME=0
```
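One way to test the hypothesis that XDG_RUNTIME_DIR is involved is a minimal playbook that runs the same system-scope reload with and without the variable. This is a sketch, not the role's code: the task names are made up, `/run/user/0` is the value that appears in the later `-vvv` logs for `item=root`, and privilege escalation is deliberately left to the command line so the escalated and non-escalated cases can be compared too.

```yaml
---
# Hypothetical reproduction playbook; not part of the systemd role.
- name: Compare daemon-reload with and without XDG_RUNTIME_DIR
  hosts: all
  gather_facts: false
  tasks:
    - name: Reload systemd without XDG_RUNTIME_DIR
      ansible.builtin.systemd:
        daemon_reload: true
        scope: system

    - name: Reload systemd with XDG_RUNTIME_DIR set
      ansible.builtin.systemd:
        daemon_reload: true
        scope: system
      # Ansible prefixes the module command with this variable, e.g.
      # "XDG_RUNTIME_DIR=/run/user/0 /usr/bin/python3 .../AnsiballZ_systemd.py"
      environment:
        XDG_RUNTIME_DIR: /run/user/0
```

Running it once as `ansible-playbook repro.yml` and once as `ansible-playbook -b repro.yml` (the filename is hypothetical) separates the effect of the variable from the effect of privilege escalation.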
@mhjacks
Contributor Author

mhjacks commented Dec 7, 2024

I've done some more experimentation, removing the setting of XDG_RUNTIME_DIR from the lsr.systemd role. I'm a little surprised to see the polkit authentication; I don't see that in interactive sessions, so I don't understand what's happening. I suspect my IPA domain might have something to do with it too, but not certain there either.

@richm
Contributor

richm commented Dec 7, 2024

> I've done some more experimentation, removing the setting of XDG_RUNTIME_DIR from the lsr.systemd role. I'm a little surprised to see the polkit authentication; I don't see that in interactive sessions, so I don't understand what's happening. I suspect my IPA domain might have something to do with it too, but not certain there either.

What version of the role are you using? What is the platform and version of the managed node? What version of ansible are you using?

@mhjacks
Contributor Author

mhjacks commented Dec 7, 2024

Thanks for responding!

  1. 1.91.0 of fedora.linux_system_roles from Galaxy
  2. Primarily Fedora 41, but I also replicated the behavior on a current CentOS 9-stream node
  3. Packaged Ansible 2.16 from Fedora 41

@mhjacks
Contributor Author

mhjacks commented Dec 7, 2024

I tried again with a freshly installed system (i.e. without all my "local customizations": a stock Fedora 41 netinst install, upgraded to the latest packages as of today, with no special auth configuration; localadmin is the initial user and can sudo by virtue of being in the wheel group), and got a similar result:

TASK [fedora.linux_system_roles.systemd : Reload systemd] ******************************************
task path: /home/mjackson/.ansible/collections/ansible_collections/fedora/linux_system_roles/roles/systemd/tasks/main.yml:106
<192.168.4.172> ESTABLISH SSH CONNECTION FOR USER: localadmin
<192.168.4.172> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="localadmin"' -o ConnectTimeout=10 -o 'ControlPath="/home/mjackson/.ansible/cp/2e23c0c65c"' 192.168.4.172 '/bin/sh -c '"'"'echo ~localadmin && sleep 0'"'"''
<192.168.4.172> (0, b'/home/localadmin\n', b'')
<192.168.4.172> ESTABLISH SSH CONNECTION FOR USER: localadmin
<192.168.4.172> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="localadmin"' -o ConnectTimeout=10 -o 'ControlPath="/home/mjackson/.ansible/cp/2e23c0c65c"' 192.168.4.172 '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo /home/localadmin/.ansible/tmp `"&& mkdir "` echo /home/localadmin/.ansible/tmp/ansible-tmp-1733595418.2672923-140165-107891766592491 `" && echo ansible-tmp-1733595418.2672923-140165-107891766592491="` echo /home/localadmin/.ansible/tmp/ansible-tmp-1733595418.2672923-140165-107891766592491 `" ) && sleep 0'"'"''
<192.168.4.172> (0, b'ansible-tmp-1733595418.2672923-140165-107891766592491=/home/localadmin/.ansible/tmp/ansible-tmp-1733595418.2672923-140165-107891766592491\n', b'')
Using module file /usr/lib/python3.13/site-packages/ansible/modules/systemd.py
<192.168.4.172> PUT /home/mjackson/.ansible/tmp/ansible-local-1399466vjmnydi/tmpuc7qk8a8 TO /home/localadmin/.ansible/tmp/ansible-tmp-1733595418.2672923-140165-107891766592491/AnsiballZ_systemd.py
<192.168.4.172> SSH: EXEC sftp -b - -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="localadmin"' -o ConnectTimeout=10 -o 'ControlPath="/home/mjackson/.ansible/cp/2e23c0c65c"' '[192.168.4.172]'
<192.168.4.172> (0, b'sftp> put /home/mjackson/.ansible/tmp/ansible-local-1399466vjmnydi/tmpuc7qk8a8 /home/localadmin/.ansible/tmp/ansible-tmp-1733595418.2672923-140165-107891766592491/AnsiballZ_systemd.py\n', b'')
<192.168.4.172> ESTABLISH SSH CONNECTION FOR USER: localadmin
<192.168.4.172> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="localadmin"' -o ConnectTimeout=10 -o 'ControlPath="/home/mjackson/.ansible/cp/2e23c0c65c"' 192.168.4.172 '/bin/sh -c '"'"'chmod u+x /home/localadmin/.ansible/tmp/ansible-tmp-1733595418.2672923-140165-107891766592491/ /home/localadmin/.ansible/tmp/ansible-tmp-1733595418.2672923-140165-107891766592491/AnsiballZ_systemd.py && sleep 0'"'"''
<192.168.4.172> (0, b'', b'')
<192.168.4.172> ESTABLISH SSH CONNECTION FOR USER: localadmin
<192.168.4.172> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="localadmin"' -o ConnectTimeout=10 -o 'ControlPath="/home/mjackson/.ansible/cp/2e23c0c65c"' -tt 192.168.4.172 '/bin/sh -c '"'"'XDG_RUNTIME_DIR=/run/user/0 /usr/bin/python3 /home/localadmin/.ansible/tmp/ansible-tmp-1733595418.2672923-140165-107891766592491/AnsiballZ_systemd.py && sleep 0'"'"''
<192.168.4.172> (1, b'\x1b[1;31m==== AUTHENTICATING FOR org.freedesktop.systemd1.reload-daemon ====\r\n\x1b[0mAuthentication is required to reload the systemd state.\r\nAuthenticating as: Local Admin (localadmin)\r\nPassword: \r\n{"failed": true, "msg": "failure 1 during daemon-reload: Reload daemon failed: Method call timed out\\n", "invocation": {"module_args": {"daemon_reload": true, "scope": "system", "daemon_reexec": false, "no_block": false, "name": null, "state": null, "enabled": null, "force": null, "masked": null}}}\r\n', b'Shared connection to 192.168.4.172 closed.\r\n')
<192.168.4.172> Failed to connect to the host via ssh: Shared connection to 192.168.4.172 closed.
<192.168.4.172> ESTABLISH SSH CONNECTION FOR USER: localadmin
<192.168.4.172> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="localadmin"' -o ConnectTimeout=10 -o 'ControlPath="/home/mjackson/.ansible/cp/2e23c0c65c"' 192.168.4.172 '/bin/sh -c '"'"'rm -f -r /home/localadmin/.ansible/tmp/ansible-tmp-1733595418.2672923-140165-107891766592491/ > /dev/null 2>&1 && sleep 0'"'"''
<192.168.4.172> (0, b'', b'')
failed: [192.168.4.172] (item=root) => {
    "ansible_loop_var": "item",
    "changed": false,
    "invocation": {
        "module_args": {
            "daemon_reexec": false,
            "daemon_reload": true,
            "enabled": null,
            "force": null,
            "masked": null,
            "name": null,
            "no_block": false,
            "scope": "system",
            "state": null
        }
    },
    "item": "root",
    "msg": "failure 1 during daemon-reload: Reload daemon failed: Method call timed out\n"
}

@mhjacks
Contributor Author

mhjacks commented Dec 7, 2024

I was able to run this successfully (against an F41 node and a CentOS 10 Stream node) with ansible_user=root (as opposed to connecting as another user and escalating), using a key in root's authorized_keys. Here's the output for the reload task at -vvv:

task path: /home/mjackson/.ansible/collections/ansible_collections/fedora/linux_system_roles/roles/systemd/tasks/main.yml:106
<srv-kea-t.imladris.lan> ESTABLISH SSH CONNECTION FOR USER: root
<srv-kea-t.imladris.lan> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o 'ControlPath="/home/mjackson/.ansible/cp/fd3a0fc429"' srv-kea-t.imladris.lan '/bin/sh -c '"'"'echo ~root && sleep 0'"'"''
<srv-kea-t.imladris.lan> (0, b'/root\n', b'')
<srv-kea-t.imladris.lan> ESTABLISH SSH CONNECTION FOR USER: root
<srv-kea-t.imladris.lan> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o 'ControlPath="/home/mjackson/.ansible/cp/fd3a0fc429"' srv-kea-t.imladris.lan '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir "` echo /root/.ansible/tmp/ansible-tmp-1733606172.0344079-162252-52835113562310 `" && echo ansible-tmp-1733606172.0344079-162252-52835113562310="` echo /root/.ansible/tmp/ansible-tmp-1733606172.0344079-162252-52835113562310 `" ) && sleep 0'"'"''
<srv-kea-t.imladris.lan> (0, b'ansible-tmp-1733606172.0344079-162252-52835113562310=/root/.ansible/tmp/ansible-tmp-1733606172.0344079-162252-52835113562310\n', b'')
Using module file /usr/lib/python3.13/site-packages/ansible/modules/systemd.py
<srv-kea-t.imladris.lan> PUT /home/mjackson/.ansible/tmp/ansible-local-162039sjqi5lfs/tmpzls1ro2f TO /root/.ansible/tmp/ansible-tmp-1733606172.0344079-162252-52835113562310/AnsiballZ_systemd.py
<srv-kea-t.imladris.lan> SSH: EXEC sftp -b - -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o 'ControlPath="/home/mjackson/.ansible/cp/fd3a0fc429"' '[srv-kea-t.imladris.lan]'
<srv-kea-t.imladris.lan> (0, b'sftp> put /home/mjackson/.ansible/tmp/ansible-local-162039sjqi5lfs/tmpzls1ro2f /root/.ansible/tmp/ansible-tmp-1733606172.0344079-162252-52835113562310/AnsiballZ_systemd.py\n', b'')
<srv-kea-t.imladris.lan> ESTABLISH SSH CONNECTION FOR USER: root
<srv-kea-t.imladris.lan> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o 'ControlPath="/home/mjackson/.ansible/cp/fd3a0fc429"' srv-kea-t.imladris.lan '/bin/sh -c '"'"'chmod u+x /root/.ansible/tmp/ansible-tmp-1733606172.0344079-162252-52835113562310/ /root/.ansible/tmp/ansible-tmp-1733606172.0344079-162252-52835113562310/AnsiballZ_systemd.py && sleep 0'"'"''
<srv-kea-t.imladris.lan> (0, b'', b'')
<srv-kea-t.imladris.lan> ESTABLISH SSH CONNECTION FOR USER: root
<srv-kea-t.imladris.lan> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o 'ControlPath="/home/mjackson/.ansible/cp/fd3a0fc429"' -tt srv-kea-t.imladris.lan '/bin/sh -c '"'"'XDG_RUNTIME_DIR=/run/user/0 /usr/bin/python3 /root/.ansible/tmp/ansible-tmp-1733606172.0344079-162252-52835113562310/AnsiballZ_systemd.py && sleep 0'"'"''
<srv-kea-t.imladris.lan> (0, b'\r\n{"name": null, "changed": false, "status": {}, "invocation": {"module_args": {"daemon_reload": true, "scope": "system", "daemon_reexec": false, "no_block": false, "name": null, "state": null, "enabled": null, "force": null, "masked": null}}}\r\n', b'Shared connection to srv-kea-t.imladris.lan closed.\r\n')
<srv-kea-t.imladris.lan> ESTABLISH SSH CONNECTION FOR USER: root
<srv-kea-t.imladris.lan> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o 'ControlPath="/home/mjackson/.ansible/cp/fd3a0fc429"' srv-kea-t.imladris.lan '/bin/sh -c '"'"'rm -f -r /root/.ansible/tmp/ansible-tmp-1733606172.0344079-162252-52835113562310/ > /dev/null 2>&1 && sleep 0'"'"''
<srv-kea-t.imladris.lan> (0, b'', b'')
ok: [srv-kea-t.imladris.lan] => (item=root) => {
    "ansible_loop_var": "item",
    "changed": false,
    "invocation": {
        "module_args": {
            "daemon_reexec": false,
            "daemon_reload": true,
            "enabled": null,
            "force": null,
            "masked": null,
            "name": null,
            "no_block": false,
            "scope": "system",
            "state": null
        }
    },
    "item": "root",
    "name": null,
    "status": {}
}
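The workaround above, connecting directly as root instead of connecting as a regular user and escalating, can be expressed in a YAML inventory. A minimal sketch: the hostnames are taken from the logs in this thread, and it assumes root's `~/.ssh/authorized_keys` on each node already contains the control node's key.

```yaml
# Hypothetical inventory.yml: connect as root, so no privilege escalation
# (and, as in the successful run above, no polkit prompt) is involved.
all:
  hosts:
    srv-kea-t.imladris.lan:
    srv-f41-t3.imladris.lan:
  vars:
    ansible_user: root
```

It would be used as `ansible-playbook -i inventory.yml playbook.yml`. The trade-off is that it requires direct root SSH access, which many sites disable.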

@mhjacks
Contributor Author

mhjacks commented Dec 9, 2024

I have spent some more time on this, and I see that the current Galaxy release of linux-system-roles seems to lag the latest version of this repo a bit. I got what I wanted working by making the following changes (to the lsr collection version, but I think the same changes apply here):

  1. Add `user_id` to the gathered fact subset.
  2. Set `become: true` regardless of user. (For some reason, even when ansible_user_id is root, I got the command timeout when I had initially connected as a non-privileged user.)
  3. Change become_user to `become_user: "{{ item if item != ansible_user_id else omit }}"`

My reasoning is that you could conceivably connect as a non-root user and want to manage both non-root units and root units in the same play. Checking user_id tells you which user you connected as. (There might still be a problem here if you connect as a non-privileged user and expect to escalate to root to manage system units, but become evaluates to false because the user in the dict is root even though you are not necessarily connected as root.) I wonder whether it would be OK to always set become_user: if the Ansible user is unable to become one of the users whose units are being managed, I think that should be a proper error, and in general I think users can become themselves.
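The three changes listed above might look roughly like this on a single task. This is a hedged sketch, not the role's actual code: `__systemd_unit_owners` is a hypothetical list of users whose units are managed (the logs show the role looping over such users, e.g. `item=root`), and the `scope` expression is an assumption modeled on the `scope: system` seen for `item=root`.

```yaml
# Change 1: gather the "user" facts so ansible_user_id is known.
- name: Gather user facts
  ansible.builtin.setup:
    gather_subset:
      - "!all"
      - "!min"
      - user
  become: false  # report the connecting user, not the escalated one

- name: Reload systemd for each unit owner
  ansible.builtin.systemd:
    daemon_reload: true
    # Assumption: root means system scope, anyone else means user scope.
    scope: "{{ 'system' if item == 'root' else 'user' }}"
  # Change 2: always escalate, regardless of the target user.
  become: true
  # Change 3: only switch users when the target differs from the connecting
  # user; omit falls back to the default become_user (normally root).
  become_user: "{{ item if item != ansible_user_id else omit }}"
  loop: "{{ __systemd_unit_owners }}"  # hypothetical variable
```

Note the side effect of `omit` here: when the target is the connecting non-root user, the task escalates to the default become_user rather than staying as that user, which is one reason to prefer setting `become_user` unconditionally.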

I will work on submitting a proper PR. My hope is that this doesn't screw anything up for older versions of RHEL that we have to support.

@richm
Contributor

richm commented Dec 9, 2024

> I have spent some more time with this, and I see that the current galaxy linux-system-roles seems to lag the latest version of this repo by a bit.

It should not, at least for "real" fixes and features. There are a lot of commits related to testing, ci, etc. that might not be in the published role or collection, but all fixes and features in the Galaxy published code should be up-to-date.

> I have gotten the things I want to get working, working, by making the following changes (to the lsr version, but I think the same things will apply here):
>
> 1. Add `user_id` to the fact subset.
>
> 2. Set `become: true` regardless of user. (For some reason even when ansible_user_id is root, when I initially connected as a non-privileged user, I got the command timeout).
>
> 3. become_user -> `become_user: "{{ item if item != ansible_user_id else omit }}"`
>
> My reasoning for this is that it is conceivable that you would connect as a non-root user, and want to manage both non-root units and root units in the same play. By checking user_id you can see what you've connected as. (Though I think there might be a problem here if you're connected as a non-privileged user, expect to escalate to root to manage system units, but become comes out as "false" because the user in the dict is root but you're not necessarily connected as root). I wonder if it would be OK to always set become_user; it would be a proper error I think if the ansible user is unable to become any of the users that want to manage and in general I think users can become themselves.
>
> I will work on submitting a proper PR. My hope is that this doesn't screw anything up for older versions of RHEL that we have to support.

@mhjacks
Contributor Author

mhjacks commented Dec 9, 2024

Thanks for looking at this!

Yeah, the differences were not enormous. Mostly formatting and a couple of things like that. The comments I made here predate the PR.

As an update: the PR sets become: true unconditionally, and likewise sets the become_user to the target user for the unit or dropin unconditionally.

I was able to test this on F41, on CentOS Stream 9, and on a fresh AlmaLinux 8 in my homelab. I am not aware of other potential pitfalls. In general, I think:

  1. becoming a user you already are should work in most situations
  2. it should be safe to become another unprivileged user; if you cannot in fact do that, the role will fail, as I think it should.
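Both expectations above can be checked on a managed node before relying on them. A hedged sketch, assuming facts were gathered without play-level `become` (so `ansible_user_id` is the connecting user) and that the connecting user is allowed to escalate; `__become_check` is a made-up register name.

```yaml
- name: Run id -un as the connecting user and as root
  ansible.builtin.command: id -un
  become: true
  become_user: "{{ item }}"
  loop:
    - "{{ ansible_user_id }}"  # becoming the user you already are
    - root
  register: __become_check  # hypothetical name
  changed_when: false

- name: Check that each iteration ran as the requested user
  ansible.builtin.assert:
    that:
      - item.stdout == item.item
  loop: "{{ __become_check.results }}"
```

If the connecting user cannot become one of the listed users, the first task fails, which matches the "the role will fail, as it should" expectation.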

There seem to be a number of potential gotchas here, though, so I completely understand caution. :)

@richm
Contributor

richm commented Dec 9, 2024

The design philosophy for the system roles is:

  • the user should be able to control the default become behavior; hard-coding become: true defeats that
  • almost everything a system role does requires root (installing packages, managing system services, managing system config files in /etc[1]), so it is expected that users will typically run Ansible as the root user on the managed nodes, or otherwise run it in such a way that the result is as if all tasks had been run with become: true and become_user: root.
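Under that convention, escalation is chosen by whoever writes the playbook rather than hard-coded in the role, for example at the play level. A minimal sketch: `become_user: root` is spelled out only to make the intent explicit (root is Ansible's usual default), and `systemd_dropins` is the role variable from the playbook earlier in this thread.

```yaml
---
- name: Manage systemd-resolved drop-ins with the systemd role
  hosts: all
  # Every task in the play, including the role's tasks, runs as if it
  # had become: true and become_user: root.
  become: true
  become_user: root
  roles:
    - role: fedora.linux_system_roles.systemd
      vars:
        systemd_dropins:
          - systemd-resolved.service.conf.j2
```

The same effect can be had without editing the playbook by passing `-b` (`--become`) to `ansible-playbook`.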

The proposed PR goes against that, but from what you have reported, it seems like the right thing to do.

Note that I borrowed this implementation from the podman system role quadlet support, and that has been used extensively to manage user quadlets, so I'm surprised we haven't seen this issue there.

[1] I realize that there are ways around that for determined/savvy users, but that is far from typical

@richm richm closed this as completed in #73 Dec 9, 2024