From 550a5ce25ae4bf6d5c81c728d533f0773e9c07bf Mon Sep 17 00:00:00 2001
From: Prasanna Kumar Kalever
Date: Tue, 8 Feb 2022 15:54:04 +0530
Subject: [PATCH] Add new Healer operation

Healer Operation is used both for healing the volume and checking for
health of the volume; it is up to the CSI driver whether to implement the
healer mechanism or just live with health checks.

Signed-off-by: Prasanna Kumar Kalever
---
 healer/README.md | 130 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 130 insertions(+)
 create mode 100644 healer/README.md

diff --git a/healer/README.md b/healer/README.md
new file mode 100644
index 0000000..2f9056c
--- /dev/null
+++ b/healer/README.md
@@ -0,0 +1,130 @@
+# CSI-Addons Operation: Healer
+
+## Terminology
+
+| Term | Definition |
+| -------- | ------------------------------------------------------------------------------------- |
+| VolumeID | The identifier of the volume generated by the plugin. |
+| CO | Container Orchestration system that communicates with plugins using CSI service RPCs. |
+| SP | Storage Provider, the vendor of a CSI plugin implementation. |
+| RPC | [Remote Procedure Call](https://en.wikipedia.org/wiki/Remote_procedure_call). |
+
+## Objective
+
+Define a standard that will enable storage providers (SP) to
+perform node-level volume health check and healing operations.
+
+### Goals in MVP
+
+The new extension will define a procedure that
+
+* can be called for existing volumes
+* interacts with the Node-Plugin to check the health condition of the volume
+* makes it possible for the SP to heal the volumes if they are in abnormal
+  condition
+
+### Non-Goals in MVP
+
+* Implementation of healing logic is OPTIONAL and completely SP specific
+
+## Solution Overview
+
+This specification defines an interface along with the minimum operational and
+packaging recommendations for a storage provider (SP) to implement
+health check and heal operations for volumes. The interface declares the
+RPCs that a plugin MUST expose.
+
+## RPC Interface
+
+* **Node Service**: The Node plugin MUST implement this RPC.
+
+```protobuf
+syntax = "proto3";
+package healer;
+
+import "github.com/container-storage-interface/spec/lib/go/csi/csi.proto";
+import "google/protobuf/descriptor.proto";
+
+option go_package = "github.com/csi-addons/spec/lib/go/healer";
+
+// HealerNode holds the RPC method for running heal operations on the
+// active (staged/published) volume.
+service HealerNode {
+  // NodeHealer is a procedure that gets called on the CSI NodePlugin.
+  rpc NodeHealer (NodeHealerRequest)
+    returns (NodeHealerResponse) {}
+}
+```
+
+### NodeHealer
+
+```protobuf
+// NodeHealerRequest contains the information needed to identify the
+// location where the volume is mounted so that local filesystem or
+// block-device operations to heal the volume can be executed.
+message NodeHealerRequest {
+  // The ID of the volume. This field is REQUIRED.
+  string volume_id = 1;
+
+  // The path on which the volume is available. This field is REQUIRED.
+  // This field overrides the general CSI size limit.
+  // SP SHOULD support the maximum path length allowed by the operating
+  // system/filesystem, but, at a minimum, SP MUST accept a max path
+  // length of at least 128 bytes.
+  string volume_path = 2;
+
+  // The path where the volume is staged, if the plugin has the
+  // STAGE_UNSTAGE_VOLUME capability, otherwise empty.
+  // If not empty, it MUST be an absolute path in the root
+  // filesystem of the process serving this request.
+  // This field is OPTIONAL.
+  // This field overrides the general CSI size limit.
+  // SP SHOULD support the maximum path length allowed by the operating
+  // system/filesystem, but, at a minimum, SP MUST accept a max path
+  // length of at least 128 bytes.
+  string staging_target_path = 3;
+
+  // Volume capability describing how the CO intends to use this volume.
+  // This allows SP to determine if the volume is being used as a block
+  // device or mounted file system. For example, if the volume is being
+  // used as a block device the SP MAY choose to skip calling filesystem
+  // operations to heal it. If volume_capability is omitted the SP MAY
+  // determine access_type from given volume_path for the volume and
+  // perform healing. This is an OPTIONAL field.
+  csi.v1.VolumeCapability volume_capability = 4;
+
+  // Secrets required by plugin to complete the healer operation.
+  // This field is OPTIONAL.
+  map<string, string> secrets = 5 [(csi.v1.csi_secret) = true];
+
+  // Volume context as returned by SP in
+  // CreateVolumeResponse.Volume.volume_context.
+  // This field is OPTIONAL and MUST match the volume_context of the
+  // volume identified by `volume_id`.
+  map<string, string> volume_context = 6;
+}
+
+// NodeHealerResponse holds the information about the result of the
+// NodeHealerRequest call.
+message NodeHealerResponse {
+  // Normal volumes are available for use and operating optimally.
+  // An abnormal volume does not meet these criteria.
+  // This field is REQUIRED.
+  bool abnormal = 1;
+
+  // The message describing the condition of the volume.
+  // This field is REQUIRED.
+  string message = 2;
+}
+```
+
+#### NodeHealer Errors
+
+| Condition | gRPC Code | Description | Recovery Behavior |
+| ---------------------------- | ------------------ | ----------- | ----------------- |
+| Missing required field | 3 INVALID_ARGUMENT | Indicates that a required field is missing from the request. | Caller MUST fix the request by adding the missing required field before retrying. |
+| Volume does not exist | 5 NOT_FOUND | Indicates that a volume corresponding to the specified `volume_id` does not exist. | Caller MUST verify that the `volume_id` is correct and that the volume is accessible and has not been deleted before retrying with exponential backoff. |
+| Call not implemented | 12 UNIMPLEMENTED | The invoked RPC is not implemented by the CSI-driver or disabled in the driver's current mode of operation. | Caller MUST NOT retry. |
+| Operation pending for volume | 10 ABORTED | Indicates that there is already an operation pending for the specified `volume_id`. In general the CSI-Addons CO plugin is responsible for ensuring that there is no more than one call "in-flight" per `volume_id` at a given time. However, in some circumstances, the CSI-Addons CO plugin MAY lose state (for example when it crashes and restarts), and MAY issue multiple calls simultaneously for the same `volume_id`. The CSI-driver SHOULD handle this as gracefully as possible, and MAY return this error code to reject secondary calls. | Caller SHOULD ensure that there are no other calls pending for the specified `volume_id`, and then retry with exponential backoff. |
+| Not authenticated | 16 UNAUTHENTICATED | The invoked RPC does not carry secrets that are valid for authentication. | Caller SHALL either fix the secrets provided in the RPC, or otherwise regalvanize said secrets such that they will pass authentication by the Plugin for the attempted RPC, after which point the caller MAY retry the attempted RPC. |
+| Error is Unknown | 2 UNKNOWN | Indicates that an unknown error occurred. | Caller MUST study the logs before retrying. |