Network binding plugin: Support compute container resource overhead #303

`design-proposals/network-binding-plugin/network-binding-plugin.md` — 115 additions, 1 deletion

@@ -224,6 +224,56 @@ pod.
- Mount points
- Downward API

##### Compute Container Resource Overhead

For some plugins, additional resource consumption can be expected in the virt-launcher pod's compute container.
For example, a plugin may need to execute an additional binary in the compute container.
Since this binary consumes CPU and memory of its own, that consumption should be accounted for.

Member: Could you please explain what binary can be executed and how it gets there? How can an external plugin rely on the fact that a specific binary is part of the virt-launcher image?

Member Author: This example was derived from the passt use case, in which the passt binary is shipped as part of the virt-launcher compute image. Another example could be that the plugin configures the domain in a way that causes the virt-stack to consume additional compute resources that KubeVirt cannot account for.
Member: I think we should speak of a specific functionality and not about arbitrary binaries; virt-launcher should be well defined. If we know the functionality we are enabling requires additional overhead, KubeVirt needs to know the resources it requires, similar to other aspects accounted for in GetMemoryOverhead. If we want an API for this, it should be based on a functionality, not a binding.

Member Author: The plugin can perform actions that affect KubeVirt in ways it cannot account for, hence the need to externally add resources to the compute container. Another example could be increased resource consumption of the virt-stack resulting from using the plugin.

Suggested solution:

Additional API for compute container resource overhead:

The network binding plugin API in the KubeVirt CR could receive an additional input field specifying the resource overhead for the compute container:
Member: There is a need to explicitly specify whether this overhead is added per plugin type or per plugin usage count (i.e. per the number of interfaces referencing it from the VM). It is also important to specify whether it depends on any other field or setup. The sidecar resources described earlier depend on the existence of a sidecar, but this one may not have such a dependency.

Member Author: Done.


```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    network:
      binding:
        mynetbindingplugin:
          sidecarImage: quay.io/kubevirt/mynetbindingplugin
          computeResourceOverhead:
            requests:
              memory: 200Mi
```

If specified, KubeVirt will add the resource overhead defined in `computeResourceOverhead` to the `compute` container of the virt-launcher pod.
The overhead is added once per unique usage of the plugin, not per NIC using the binding.

For example, assume a plugin is registered with a 200Mi memory overhead for the `compute` container, and two of the VM's interfaces use it.
`virt-controller` will add only 200Mi of memory to the `compute` container.

Contributor: Does passt require memory overhead per interface, or based on whether it is being executed or not?

Member: It is enough for the interface to be defined for the passt binary to run and consume the memory specified. It is not clear what you mean by executed.
Member: Where is this 200Mi coming from? In the example above you used 20Mi.

Member Author: It was a typo. Fixed it in the example.
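For illustration, a VM defining two interfaces that reference the same binding plugin could look like the sketch below (the VM, network, and NetworkAttachmentDefinition names are assumptions). With the registration above, the `compute` container receives a single 200Mi overhead:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-two-nics                          # hypothetical name
spec:
  template:
    spec:
      domain:
        devices:
          interfaces:
            - name: blue
              binding:
                name: mynetbindingplugin     # first interface using the plugin
            - name: red
              binding:
                name: mynetbindingplugin     # second interface using the same plugin
      networks:
        - name: blue
          multus:
            networkName: blue-net            # hypothetical NetworkAttachmentDefinition
        - name: red
          multus:
            networkName: red-net             # hypothetical NetworkAttachmentDefinition
```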


Pros:
- Cluster-wide definition of compute resource overhead per plugin.
- Additional resources could be requested other than CPU and memory.
- The resource overhead specification is visible to cluster admins.

Cons:
- Requires an API change.
- When upgrading KubeVirt or network binding plugin versions, the compute resource overhead specification might require adjustments.

This solution was selected since it gives the cluster admin more control over resource allocation.

For the alternative solutions, please see [Appendix G](#appendix-g-alternatives-to-compute-container-resource-overhead-specification).

#### Configure Pod netns

The CNI plugin has privileged access to the pod network namespace and
@@ -1162,4 +1212,68 @@ Libvirt configuration snippet:
</interface>
```

- The domain itself is created by the `virt-launcher`.

# Appendix G: Alternatives to compute container resource overhead specification

1. Manually setting the VM's resources:

Users could manually provide the additional memory overhead for the network binding plugin, under `spec.domain.resources.requests.memory`:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-cirros
spec:
  template:
    spec:
      domain:
        memory:
          guest: 128Mi
        resources:
          requests:
            memory: 640Mi # 128Mi for the guest + 512Mi for the network binding plugin
```

KubeVirt will create the virt-launcher pod's `compute` container with a memory request composed of the following:
- Guest's memory
- Memory overhead for KubeVirt's components (calculated by virt-controller)
- Memory overhead for the network binding plugin

Pros:
- Already implemented.

Cons:
- Error prone.
- Defined per VM and not cluster-wide.
- Exposes the VM owner to unnecessary implementation details.

2. Mutating webhook:
Contributor: Is this another webhook, in addition to the one that addresses the sidecar resources? It seems to lack a description of what it'll do, a pros/cons section, and how it is going to integrate with KubeVirt (I recall something about adding a label to launcher pods that use network binding plugins).

Member Author: I've added additional details, same as for the sidecar.


For each unique network binding plugin used, the VMI controller will add a label on the virt-launcher pod with the following format:

`kubevirt.io/network-binding-plugin:<plugin-name>`
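Assuming the plugin name is used as the label value, the label on a virt-launcher pod could render as in the following sketch:

```yaml
metadata:
  labels:
    kubevirt.io/network-binding-plugin: mynetbindingplugin
```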

The binding plugin authors will provide a [mutating webhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook) that will intercept
the creation of virt-launcher pods carrying the above label and add the appropriate resource requests/limits
to the pod's `compute` container.

The mutating webhook will be able to identify the plugin's compute container by its name (`compute`) or using the value of
the `kubectl.kubernetes.io/default-container` annotation.
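A minimal sketch of such a webhook registration, assuming the label rendering above and hypothetical webhook, service, and path names:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: mynetbindingplugin-compute-overhead                 # hypothetical
webhooks:
  - name: compute-overhead.mynetbindingplugin.example.com   # hypothetical FQDN
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail
    # Intercept only virt-launcher pods labeled with this plugin's name.
    objectSelector:
      matchExpressions:
        - key: kubevirt.io/network-binding-plugin
          operator: In
          values: ["mynetbindingplugin"]
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    clientConfig:
      service:
        name: mynetbindingplugin-webhook                    # hypothetical Service
        namespace: mynetbindingplugin                       # hypothetical namespace
        path: /mutate-compute-resources                     # hypothetical path
```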

Pros:
- Plugin authors have full control over the compute container's resources.
- No additional API is added to KubeVirt.
- Opens the door for additional changes to the virt-launcher pod without changes to KubeVirt.
- Code changes in KubeVirt are very small.

Cons:
- Plugin authors need to provide and integrate another component.
- Additional point of failure.
- Requires maintaining certificates for the webhook.
- Additional latency when creating VMs with network binding plugins.
- The additional resource specification is less visible to cluster admins.
- The resource specification could collide with the support container resources specified in the KubeVirt CR or with other webhooks.

The requirement to maintain certificates for the webhook could be mitigated using tools such as [cert-manager](https://cert-manager.io/).
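For example, cert-manager's CA injector can populate the webhook's `caBundle` from a Certificate resource via an annotation; the namespace and Certificate names below are illustrative:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: mynetbindingplugin-compute-overhead
  annotations:
    # cert-manager's cainjector fills clientConfig.caBundle from this Certificate.
    cert-manager.io/inject-ca-from: mynetbindingplugin/webhook-cert   # <namespace>/<Certificate name>
```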