
PVE 7.x support #155

Merged 4 commits from feature/pve7 into develop on Oct 19, 2021

Conversation

@zenntrix (Collaborator) commented Sep 7, 2021

No description provided.

@zenntrix requested review from lae and trickert76 on September 7, 2021 18:07
@lae (Owner) left a comment
Per #143

Also, with PVE 7 I think we can sunset support for PVE 5/Debian Stretch, since they're EOL (though I don't think this means as much code cleanup as we had with PVE 4).

Can you remove all the PVE 5 stuff as well? Otherwise it looks fine to me.
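
(For illustration only, not part of this PR: once PVE 5 support is removed, a guard like the hypothetical sketch below could fail early on EOL releases. The task name and the exact check are assumptions.)

# Hypothetical sketch: reject EOL Debian Stretch / PVE 5 hosts up front
- name: Ensure the target host runs a supported Debian release
  ansible.builtin.assert:
    that:
      - ansible_distribution_release in ['buster', 'bullseye']
    fail_msg: "PVE 5 / Debian Stretch is EOL and no longer supported by this role."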

Review threads (outdated, resolved): README.md, library/ceph_volume.py, tasks/pve_add_node.yml
@lae changed the title from "Feature/pve7" to "PVE 7.x support" on Sep 8, 2021
Review thread (resolved): defaults/main.yml
@lae mentioned this pull request on Sep 8, 2021
@lae linked an issue on Sep 8, 2021 that may be closed by this pull request
@zenntrix (Collaborator, Author) commented:

Thanks @trickert76 for making those changes this morning; apologies, I have had a very busy month so far. I will finish this PR off this week so that it can be merged.

@zenntrix requested a review from lae on October 13, 2021 18:36
@zenntrix self-assigned this on Oct 13, 2021
Review thread (resolved): tasks/ceph.yml
@lae (Owner) left a comment

Okay, LGTM and nice job. If anyone can easily do so, could someone test out this final set of changes by doing one of the following?

a) deploy both a Debian Bullseye 3-node cluster and a Debian Buster 3-node cluster on physical hardware with Ceph
b) use vagrant to do the above instead, if anyone already has a vagrant setup using libvirt.

The Vagrantfile would need to be updated to use a bullseye image (debian/bullseye64 probably exists) to test a PVE 7 deployment. I don't think anything else would need to be changed.

I'd do this myself right now if I could, but I recently replaced my workstation setup to use Proxmox as the host OS, and my desktop/development OS is now a VM guest on it that doesn't yet have vagrant/libvirt/networking configured properly to use this role's Vagrantfile (maybe I can figure that out soon). Speaking of which, it would be nice if there were a still-maintained Vagrant provider for Proxmox... (I couldn't find any the last time I looked, which was relatively recently.)

@lae (Owner) commented Oct 17, 2021

Okay, I got around to setting up libvirt/vagrant in my new setup.

The Create Ceph OSDs task is failing for PVE 6.x deployments:

TASK [lae.proxmox : Create Ceph OSDs] ******************************************
failed: [pve-3] (item={'device': '/dev/vdb'}) => {
    "ansible_loop_var": "item",
    "changed": true,
    "cmd": [
        "ceph-volume",
        "--cluster",
        "ceph",
        "lvm",
        "create",
        "--bluestore",
        "--data",
        "/dev/vdb"
    ],
    "delta": "0:00:00.617953",
    "end": "2021-10-17 04:36:38.607766",
    "item": {
        "device": "/dev/vdb"
    },
    "rc": 1,
    "start": "2021-10-17 04:36:37.989813"
}

STDERR:

Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new b85a97b5-af75-4be8-a024-7e3ccc5f722a
 stderr: 2021-10-17 04:36:38.576 7fee3618d700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
 stderr: 2021-10-17 04:36:38.576 7fee3618d700 -1 AuthRegistry(0x7fee30041248) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
 stderr: 2021-10-17 04:36:38.580 7fee3618d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
 stderr: 2021-10-17 04:36:38.580 7fee3618d700 -1 AuthRegistry(0x7fee30041248) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2021-10-17 04:36:38.584 7fee3618d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
 stderr: 2021-10-17 04:36:38.584 7fee3618d700 -1 AuthRegistry(0x7fee300de6f0) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2021-10-17 04:36:38.584 7fee3618d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
 stderr: 2021-10-17 04:36:38.584 7fee3618d700 -1 AuthRegistry(0x7fee3618bf38) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: [errno 2] error connecting to the cluster
-->  RuntimeError: Unable to create a new OSD id


MSG:

non-zero return code
failed: [pve-2] (item={'device': '/dev/vdb'}) => (same missing-keyring errors and "RuntimeError: Unable to create a new OSD id" as pve-3)
failed: [pve-1] (item={'device': '/dev/vdb'}) => (same missing-keyring errors and "RuntimeError: Unable to create a new OSD id" as pve-3)

Haven't looked into it yet, but seems like a serious issue? Going to attempt a PVE 7 deployment in a bit and see if the same occurs.

Also these are the only files that exist at that point in the deployment:

~$ sudo ls /etc/pve/priv/
acme  authkey.key  authorized_keys  ceph.client.admin.keyring  ceph.mon.keyring  known_hosts  lock  pve-root-ca.key  pve-root-ca.srl

@lae (Owner) commented Oct 17, 2021

PVE 7.x deployment first attempt (took a bit longer because I was hitting disk space issues lol):

TASK [lae.proxmox : Install Proxmox VE and related packages] *******************
FAILED - RETRYING: Install Proxmox VE and related packages (2 retries left).
FAILED - RETRYING: Install Proxmox VE and related packages (1 retries left).
FAILED - RETRYING: Install Proxmox VE and related packages (2 retries left).
FAILED - RETRYING: Install Proxmox VE and related packages (2 retries left).
fatal: [pve-2]: FAILED! => {
    "attempts": 2,
    "cache_update_time": 1634449986,
    "cache_updated": false,
    "changed": false,
    "rc": 100
}

STDOUT:

Reading package lists...
Building dependency tree...
Reading state information...
proxmox-ve is already the newest version (7.0-2).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
3 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Setting up ifupdown2 (3.1.0-1+pmx3) ...

network config changes have been detected for ifupdown2 compatibility.
Saved in /etc/network/interfaces.new for hot-apply or next reboot.

Reloading network config on first install
error: /etc/network/interfaces: line3: error processing line 'source-directory /etc/network/interfaces.d'
dpkg: error processing package ifupdown2 (--configure):
 installed ifupdown2 package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of pve-manager:
 pve-manager depends on ifupdown2 (>= 2.0.1-1+pve8) | ifenslave (>= 2.6); however:
  Package ifupdown2 is not configured yet.
  Package ifenslave is not installed.

dpkg: error processing package pve-manager (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of proxmox-ve:
 proxmox-ve depends on pve-manager; however:
  Package pve-manager is not configured yet.

dpkg: error processing package proxmox-ve (--configure):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 ifupdown2
 pve-manager
 proxmox-ve



STDERR:

E: Sub-process /usr/bin/dpkg returned an error code (1)



MSG:

'/usr/bin/apt-get -y -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"       install 'proxmox-ve'' failed: E: Sub-process /usr/bin/dpkg returned an error code (1)

FAILED - RETRYING: Install Proxmox VE and related packages (1 retries left).

It seems like the source-directory stanza needs to be removed from /etc/network/interfaces (probably only for Debian 11/PVE 7):

# Include files from /etc/network/interfaces.d:
source-directory /etc/network/interfaces.d

Dunno what we should do about this one. I assume most people are configuring /etc/network/interfaces by hand in the first place, so this probably just affects the vagrant environment, and we can probably just update the vagrant playbook to remove that line. Or should we remove, or at least warn about, any source-directory lines ourselves?
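
For the vagrant playbook, that could be a one-task change. A minimal sketch, assuming we only strip the stanza on Bullseye (the task placement and release check are assumptions, not code from this PR):

# Sketch: drop the stanza that breaks ifupdown2's post-install script on PVE 7
- name: Remove source-directory stanza from /etc/network/interfaces
  ansible.builtin.lineinfile:
    path: /etc/network/interfaces
    regexp: '^source-directory /etc/network/interfaces\.d'
    state: absent
  when: ansible_distribution_release == 'bullseye'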

Anyway, after manually fixing it for testing, it looks like the Ceph OSD task is still failing for PVE 7, with the same issue:

TASK [lae.proxmox : Create Ceph OSDs] ******************************************
failed: [pve-3] (item={'device': '/dev/vdb'}) => {
    "ansible_loop_var": "item",
    "changed": true,
    "cmd": [
        "ceph-volume",
        "--cluster",
        "ceph",
        "lvm",
        "create",
        "--bluestore",
        "--data",
        "/dev/vdb"
    ],
    "delta": "0:00:00.648927",
    "end": "2021-10-17 06:16:02.012395",
    "item": {
        "device": "/dev/vdb"
    },
    "rc": 1,
    "start": "2021-10-17 06:16:01.363468"
}

STDERR:

Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 4220da61-b282-4c75-8a15-32c976143d4f
 stderr: 2021-10-17T06:16:01.987+0000 7f0aa90c8700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
 stderr: 2021-10-17T06:16:01.987+0000 7f0aa90c8700 -1 AuthRegistry(0x7f0aa405b128) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
 stderr: 2021-10-17T06:16:01.991+0000 7f0aa90c8700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
 stderr: 2021-10-17T06:16:01.991+0000 7f0aa90c8700 -1 AuthRegistry(0x7f0aa405b128) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2021-10-17T06:16:01.991+0000 7f0aa90c8700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
 stderr: 2021-10-17T06:16:01.991+0000 7f0aa90c8700 -1 AuthRegistry(0x7f0aa4060200) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2021-10-17T06:16:01.991+0000 7f0aa90c8700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
 stderr: 2021-10-17T06:16:01.991+0000 7f0aa90c8700 -1 AuthRegistry(0x7f0aa90c70e0) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: [errno 2] RADOS object not found (error connecting to the cluster)
-->  RuntimeError: Unable to create a new OSD id

@lae
Copy link
Owner

lae commented Oct 17, 2021

So the change to the Create Ceph OSDs task is not an equivalent replacement. The pveceph tool does a lot more than just run ceph-volume create, such as initializing all of the authentication keys (including the one reported missing above):

root@pve-2:~# pveceph osd create /dev/vdb
create OSD on /dev/vdb (bluestore)
wiping block device /dev/vdb
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.703624 s, 298 MB/s
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 10ad546f-28e0-4997-b9e5-f3443a18b046
Running command: /sbin/vgcreate --force --yes ceph-5394a541-9be4-43af-8300-39ee776a1adc /dev/vdb
 stdout: Physical volume "/dev/vdb" successfully created.
 stdout: Volume group "ceph-5394a541-9be4-43af-8300-39ee776a1adc" successfully created
Running command: /sbin/lvcreate --yes -l 511 -n osd-block-10ad546f-28e0-4997-b9e5-f3443a18b046 ceph-5394a541-9be4-43af-8300-39ee776a1adc
 stdout: Logical volume "osd-block-10ad546f-28e0-4997-b9e5-f3443a18b046" created.
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1
--> Executable selinuxenabled not in PATH: /sbin:/bin:/usr/sbin:/usr/bin
Running command: /bin/chown -h ceph:ceph /dev/ceph-5394a541-9be4-43af-8300-39ee776a1adc/osd-block-10ad546f-28e0-4997-b9e5-f3443a18b046
Running command: /bin/chown -R ceph:ceph /dev/dm-0
Running command: /bin/ln -s /dev/ceph-5394a541-9be4-43af-8300-39ee776a1adc/osd-block-10ad546f-28e0-4997-b9e5-f3443a18b046 /var/lib/ceph/osd/ceph-1/block
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-1/activate.monmap
 stderr: 2021-10-17T06:50:44.714+0000 7f9102745700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
2021-10-17T06:50:44.714+0000 7f9102745700 -1 AuthRegistry(0x7f90fc05b128) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
 stderr: got monmap epoch 3
Running command: /bin/ceph-authtool /var/lib/ceph/osd/ceph-1/keyring --create-keyring --name osd.1 --add-key AQDDx2thkFDPNBAAHT6iWt66ABUpTaS0icO8vg==
 stdout: creating /var/lib/ceph/osd/ceph-1/keyring
added entity osd.1 auth(key=AQDDx2thkFDPNBAAHT6iWt66ABUpTaS0icO8vg==)
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1/
Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 1 --monmap /var/lib/ceph/osd/ceph-1/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-1/ --osd-uuid 10ad546f-28e0-4997-b9e5-f3443a18b046 --setuser ceph --setgroup ceph
 stderr: 2021-10-17T06:50:45.010+0000 7f3eab65cf00 -1 bluestore(/var/lib/ceph/osd/ceph-1/) _read_fsid unparsable uuid
--> ceph-volume lvm prepare successful for: /dev/vdb
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-5394a541-9be4-43af-8300-39ee776a1adc/osd-block-10ad546f-28e0-4997-b9e5-f3443a18b046 --path /var/lib/ceph/osd/ceph-1 --no-mon-config
Running command: /bin/ln -snf /dev/ceph-5394a541-9be4-43af-8300-39ee776a1adc/osd-block-10ad546f-28e0-4997-b9e5-f3443a18b046 /var/lib/ceph/osd/ceph-1/block
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-1/block
Running command: /bin/chown -R ceph:ceph /dev/dm-0
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
Running command: /bin/systemctl enable ceph-volume@lvm-1-10ad546f-28e0-4997-b9e5-f3443a18b046
 stderr: Created symlink /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-1-10ad546f-28e0-4997-b9e5-f3443a18b046.service -> /lib/systemd/system/ceph-volume@.service.
Running command: /bin/systemctl enable --runtime ceph-osd@1
 stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@1.service -> /lib/systemd/system/ceph-osd@.service.
Running command: /bin/systemctl start ceph-osd@1
--> ceph-volume lvm activate successful for osd ID: 1
--> ceph-volume lvm create successful for: /dev/vdb

The ceph_volume.py module is imported as-is from the ceph-ansible repository (it's an older version, which is why it looks so different), and we're really only using it to simplify idempotency testing. So I think it should be fine to leave the task as it was previously (using the command module); otherwise we'd have to do a lot more to replicate the behaviour above. I'll test that in a bit.
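
For reference, a minimal sketch of the reverted task, assuming the role's pve_ceph_osds list of {device: ...} items (an illustration, not the exact task; idempotency handling is omitted):

# Sketch: call pveceph, which bootstraps the missing auth keys itself,
# rather than invoking ceph-volume directly. A real task would also need
# to skip devices that already host an OSD.
- name: Create Ceph OSDs
  ansible.builtin.command: "pveceph osd create {{ item.device }}"
  loop: "{{ pve_ceph_osds }}"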

@zenntrix (Collaborator, Author) commented:

Right, ok, I shall revert to the 'pveceph osd create' way of doing it. That's my fault for only testing that change on an already-running PVE with new OSDs; all the keys would have already been there!

@lae (Owner) commented Oct 17, 2021

Just confirming that reverting back to the original command task seems to work in finishing up the deployment (i.e. I made the change and re-ran vagrant provision instead of destroying the VMs and starting over).

It sounds like you're going to push that change here? If so, I'll wait. After that, I'm going to attempt to rebase and clean up the commits into smaller ones, then do a final test and merge if that works.

@lae merged commit d024979 into develop on Oct 19, 2021
@lae (Owner) commented Oct 19, 2021

Merged. Thanks all!

@lae deleted the feature/pve7 branch on October 19, 2021 06:58
Merging this pull request may close: PVE 7 support

4 participants