Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

design-proposal: Generating disk serial numbers #305

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

akrejcir
Copy link

@akrejcir akrejcir commented Jul 2, 2024

What this PR does / why we need it:
This design introduces automatic generation of serial numbers for disks.

Issue:
https://issues.redhat.com/browse/CNV-39082

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note:

Added proposal for disk serial number generation.

@kubevirt-bot kubevirt-bot added the dco-signoff: no Indicates the PR's author has not DCO signed all their commits. label Jul 2, 2024
@kubevirt-bot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign vladikr for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@akrejcir akrejcir force-pushed the disk-serial-proposal branch from 2ac6a4a to 0522553 Compare July 2, 2024 12:52
@kubevirt-bot kubevirt-bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. and removed dco-signoff: no Indicates the PR's author has not DCO signed all their commits. labels Jul 2, 2024
@akrejcir
Copy link
Author

akrejcir commented Jul 2, 2024

@aglitke , can someone from the storage team take a look, if this approach is good?

@akrejcir akrejcir force-pushed the disk-serial-proposal branch from 0522553 to 9aaec11 Compare July 4, 2024 15:13
design-proposals/generated-disk-serial.md Outdated Show resolved Hide resolved
design-proposals/generated-disk-serial.md Outdated Show resolved Hide resolved
## Motivation
Some applications running in the VM require the disk serial number to be defined and persistent across reboots.
Generating the serial reduces manual configuration, and in the future it can be part of a `VirtualMachinePreference` for the OS.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reference (doc or implemention) how this is addressed in other virtualization management system, especially oVirt and OpenStack?

Copy link
Author

@akrejcir akrejcir Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check oVirt code.

@lyarwood , do you know how this is done in OpenStack? Or can you point me to docs or source, where I could find it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akrejcir we talked about introducing persistent serials for the longest time but could never implement it due to a mountain of tech debt we had in the codebase. The following design was as far as we got:

https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/local-disk-serial-numbers.html

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

oVirt uses the disk GUID as a serial number. It would be similar if we used the UID of a PVC.

design-proposals/generated-disk-serial.md Show resolved Hide resolved
design-proposals/generated-disk-serial.md Outdated Show resolved Hide resolved
design-proposals/generated-disk-serial.md Show resolved Hide resolved
design-proposals/generated-disk-serial.md Outdated Show resolved Hide resolved
design-proposals/generated-disk-serial.md Outdated Show resolved Hide resolved
design-proposals/generated-disk-serial.md Outdated Show resolved Hide resolved
design-proposals/generated-disk-serial.md Show resolved Hide resolved

One possibility is to store it in an annotation on the PVC object.
The annotation would be created when the VMI is created.
Then it will be read by `virt-handler` when creating libvirt domain XML.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we just persist all serial numbers in the Status of the VirtualMachine?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the status may be emptied by a GitOps controller, when it creates a new version of the VM. I will check.

I will add this possibility to the document for discussion.

Copy link
Member

@lyarwood lyarwood Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be ignoring any updates of Status outside of the /Status subresource IIRC so I don't think a GitOps workflow would overwrite it here. That's why we are looking to move instance type metadata out of the Spec etc.

https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

The PUT and POST verbs on objects MUST ignore the status values, to avoid accidentally overwriting the status in read-modify-write scenarios. A /status subresource MUST be provided to enable system components to update statuses of resources they manage.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It could work. I will add it to the design document.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as for instancetypes. What if the status needs to be recreated? Where is the ID derived from? Can we recreate the same ID if the status needs to be recreated from scratch?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the main issue with this approach.

If we could deterministically generate the serial number, then we don't need to store it in .status.
In this case, the serial number is randomly generated, so it needs to be stored somewhere. If .status is recreated, then a new number is generated.


- Do we want the serials to be generated by default on all VMs,
or do we want a configuration option to enable it?
- Where is a good place for the `generateDiskSerials` field?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spec.template.spec.domain.devices.generateDiskSerials of the VirtualMachine IMHO

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

## Motivation
Some applications running in the VM require the disk serial number to be defined and persistent across reboots.
Generating the serial reduces manual configuration, and in the future it can be part of a `VirtualMachinePreference` for the OS.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akrejcir we talked about introducing persistent serials for the longest time but could never implement it due to a mountain of tech debt we had in the codebase. The following design was as far as we got:

https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/local-disk-serial-numbers.html

design-proposals/generated-disk-serial.md Show resolved Hide resolved
@akrejcir akrejcir force-pushed the disk-serial-proposal branch from 9aaec11 to 735359c Compare July 17, 2024 15:26
This design introduces automatic generation of serial
numbers for disks.

Signed-off-by: Andrej Krejcir <[email protected]>
@akrejcir akrejcir force-pushed the disk-serial-proposal branch from 735359c to d0d0377 Compare July 24, 2024 13:30

# Design

A new boolean option `generateDiskSerials` will be added to the VMI `.spec`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need an option for it? What's the default right now? Is it 123456789, which means nobody cares about it anyway? Can't we just enable it if the ID was not set manually?

Copy link
Author

@akrejcir akrejcir Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default right now is no serial number. The <serial> tag in the libvirt domain XML is not specified.

We could enable it by default for all disks. That is one of the open questions in this design.
I'm unsure if it may confuse guests on existing VMs, if the disk will have a serial that it didn't have before.


One possibility is to store it in an annotation on the PVC object.
The annotation would be created when the VMI is created.
Then it will be read by `virt-handler` when creating libvirt domain XML.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as for instancetypes. What if the status needs to be recreated? Where is the ID derived from? Can we recreate the same ID if the status needs to be recreated from scratch?



### Storing it in the `status` of the VM
Serial numbers will be stored in the `.status` of the VM.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to restore / transfer a VM with the serial number? IIUC you are not allowed to write to the status of a K8S object.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look into it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked the code, and the .status of a VM is not stored into the snapshot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @0xFelix 's point was that we then need a way of persisting the serial in the core spec of a VirtualMachine during a snapshot so that we can then restore the status (and the state of the VM) later.

- It may be an unexpected place to store persistent information that cannot be regenerated if `.status` is wiped.


### Deterministic generation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach the most. Basically a stateless approach without configurables.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be the best choice, if it can be compatible with storage migration.

Then the rest of the document describes details of option: **Annotation on the PVC object**.

### Annotation on the PVC object
The serial number will be stored as an annotation on the PVC object. It would be created when the VMI is created.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will require some component to have PVC write permissions.

so it has to be stored somewhere, or generated deterministically.

This section lists several options how to store serial numbers.
Then the rest of the document describes details of option: **Annotation on the PVC object**.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking the ID can be scoped in two ways:

  • VM/I Internal - serial is only defined/kept within the VM/I bject
  • VM/I external - serial is sourced from an external object - this is the case with the PVC annotation approach

If it's internal, then usually the requirements of a persistent serial number are met, however, it could lead to trouble if a volume is changed on a VMI, or if a volume is used on multiple VMIs.

If it is external, then we also meet the requirement of a persistent serial number, the same volume will get the same serial whenever attached to a VM that's good for vols attached to multi VMs, however, we need to think about cloning, and restore, and what it means for disk serial numbers. Something also requires permissionsto write the PVC.

Just to name a few things.

- Serial is clearly visible to anyone who can read PVC.

#### Disadvantages:
- Does not store serials for other volume types except PVC.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requires additional permissions to write PVCs

- We don't need to store anything.

#### Disadvantages:
- If the storage is migrated to a PVC with different metadata, it would change the serial number.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually we do not "migrate" PVCs (aka modify them in-place).
Instead new PVCs are created. @alicefr @awels what does storage class migration do?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storage class migration requires one to create new PVCs as the target of the migration. So there will be new PVCs. PVCs themselves are mostly immutable, in particular the name is immutable.

metadata:
name: example-pvc
annotations:
kubevirt.io/disk-serial: 123456789 # <-- new annotation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use vm.kubevirt.io/disk-serial

cc @EdDev @alaypatel07 can we have api conventions? Possibly worth to standardize how we prefix annotations accross kubevirt.


## Update/Rollback Compatibility
The feature is backward compatible. Existing VMs remain unaffected unless the new field is set.
Rollback ignores the new field.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check todays state:

If a serial is not specified, between VM start, stop, start, does the serial of the disk inside the VM change?

If it is the same serial, then we might need a migraiton plan.

@kubevirt-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@kubevirt-bot kubevirt-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 18, 2024
@akrejcir
Copy link
Author

/remove-lifecycle stale

@kubevirt-bot kubevirt-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 18, 2024
@xpivarc
Copy link
Member

xpivarc commented Nov 18, 2024

/sig compute
/sig storage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dco-signoff: yes Indicates the PR's author has DCO signed all their commits. sig/compute sig/storage size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants