Add new Healer operation
The Healer operation is used both for healing a volume and for checking its
health. It is up to the CSI driver whether to implement the healing mechanism
or to rely on health checks alone.

Signed-off-by: Prasanna Kumar Kalever <[email protected]>
Prasanna Kumar Kalever committed Mar 15, 2022
1 parent 80313ad commit 550a5ce
Showing 1 changed file with 130 additions and 0 deletions.
130 changes: 130 additions & 0 deletions healer/README.md
@@ -0,0 +1,130 @@
# CSI-Addons Operation: Healer

## Terminology

| Term | Definition |
| -------- | ------------------------------------------------------------------------------------- |
| VolumeID | The identifier of the volume generated by the plugin. |
| CO | Container Orchestration system that communicates with plugins using CSI service RPCs. |
| SP | Storage Provider, the vendor of a CSI plugin implementation. |
| RPC | [Remote Procedure Call](https://en.wikipedia.org/wiki/Remote_procedure_call). |

## Objective

Define a standard that will enable storage providers (SP) to
perform node-level volume health checks and healing operations.

### Goals in MVP

The new extension will define a procedure that

* can be called for existing volumes
* interacts with the Node-Plugin to check the health condition of the volume
* makes it possible for the SP to heal volumes that are in an abnormal
  condition

### Non-Goals in MVP

* Implementation of healing logic is OPTIONAL and completely SP-specific.

## Solution Overview

This specification defines an interface along with the minimum operational and
packaging recommendations for a storage provider (SP) to implement
health check and heal operations for volumes. The interface declares the
RPCs that a plugin MUST expose.

## RPC Interface

* **Node Service**: The Node plugin MUST implement this RPC.

```protobuf
syntax = "proto3";
package healer;

import "github.com/container-storage-interface/spec/lib/go/csi/csi.proto";
import "google/protobuf/descriptor.proto";

option go_package = "github.com/csi-addons/spec/lib/go/healer";

// HealerNode holds the RPC method for running heal operations on the
// active (staged/published) volume.
service HealerNode {
  // NodeHealer is a procedure that gets called on the CSI NodePlugin.
  rpc NodeHealer (NodeHealerRequest)
  returns (NodeHealerResponse) {}
}
```
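
The commit leaves it to the driver whether `NodeHealer` only reports health or also attempts a repair. The following is a minimal Go sketch of that choice, not the generated gRPC code: `NodeHealerResponse` is a hand-written stand-in for the generated type, and the `Checker` and `Healer` interfaces, `handleNodeHealer`, and `fakeVolume` are hypothetical names introduced only for illustration. Reporting a failed heal through the response message (rather than a gRPC error) is also an assumption.

```go
package main

import "fmt"

// NodeHealerResponse is a hand-written stand-in for the generated type.
type NodeHealerResponse struct {
	Abnormal bool
	Message  string
}

// Checker and Healer are hypothetical SP-side hooks, not part of the spec.
type Checker interface {
	Check(volumePath string) (abnormal bool, message string)
}

type Healer interface {
	Heal(volumePath string) error
}

// handleNodeHealer runs the health check and, when a Healer is supplied,
// attempts a repair and re-checks before reporting the final condition.
func handleNodeHealer(volumePath string, c Checker, h Healer) NodeHealerResponse {
	abnormal, msg := c.Check(volumePath)
	if !abnormal || h == nil {
		// Healthy volume, or a driver that only implements health checks.
		return NodeHealerResponse{Abnormal: abnormal, Message: msg}
	}
	if err := h.Heal(volumePath); err != nil {
		return NodeHealerResponse{
			Abnormal: true,
			Message:  fmt.Sprintf("%s; heal failed: %v", msg, err),
		}
	}
	abnormal, msg = c.Check(volumePath)
	return NodeHealerResponse{Abnormal: abnormal, Message: msg}
}

// fakeVolume is a toy in-memory volume used only to exercise the handler.
type fakeVolume struct{ broken bool }

func (v *fakeVolume) Check(string) (bool, string) {
	if v.broken {
		return true, "volume is degraded"
	}
	return false, "volume is healthy"
}

func (v *fakeVolume) Heal(string) error {
	v.broken = false
	return nil
}

func main() {
	v := &fakeVolume{broken: true}
	fmt.Println(handleNodeHealer("/mnt/vol-1", v, nil)) // health check only
	fmt.Println(handleNodeHealer("/mnt/vol-1", v, v))   // check, heal, re-check
}
```

Passing a nil `Healer` models a driver that lives with health checks alone; passing one models a driver that implements the healing mechanism.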

### NodeHealer

```protobuf
// NodeHealerRequest contains the information needed to identify the
// location where the volume is mounted so that local filesystem or
// block-device operations to heal the volume can be executed.
message NodeHealerRequest {
  // The ID of the volume. This field is REQUIRED.
  string volume_id = 1;

  // The path on which the volume is available. This field is REQUIRED.
  // This field overrides the general CSI size limit.
  // SP SHOULD support the maximum path length allowed by the operating
  // system/filesystem, but, at a minimum, SP MUST accept a max path
  // length of at least 128 bytes.
  string volume_path = 2;

  // The path where the volume is staged, if the plugin has the
  // STAGE_UNSTAGE_VOLUME capability, otherwise empty.
  // If not empty, it MUST be an absolute path in the root
  // filesystem of the process serving this request.
  // This field is OPTIONAL.
  // This field overrides the general CSI size limit.
  // SP SHOULD support the maximum path length allowed by the operating
  // system/filesystem, but, at a minimum, SP MUST accept a max path
  // length of at least 128 bytes.
  string staging_target_path = 3;

  // Volume capability describing how the CO intends to use this volume.
  // This allows SP to determine if volume is being used as a block
  // device or mounted file system. For example - if volume is being
  // used as a block device the SP MAY choose to skip calling filesystem
  // operations to heal the volume. If volume_capability is omitted the
  // SP MAY determine access_type from given volume_path for the volume
  // and perform healing. This is an OPTIONAL field.
  csi.v1.VolumeCapability volume_capability = 4;

  // Secrets required by plugin to complete the healer operation.
  // This field is OPTIONAL.
  map<string, string> secrets = 5 [(csi.v1.csi_secret) = true];

  // Volume context as returned by SP in
  // CreateVolumeResponse.Volume.volume_context.
  // This field is OPTIONAL and MUST match the volume_context of the
  // volume identified by `volume_id`.
  map<string, string> volume_context = 6;
}

// NodeHealerResponse holds the information about the result of the
// NodeHealerRequest call.
message NodeHealerResponse {
  // Normal volumes are available for use and operating optimally.
  // An abnormal volume does not meet these criteria.
  // This field is REQUIRED.
  bool abnormal = 1;

  // The message describing the condition of the volume.
  // This field is REQUIRED.
  string message = 2;
}
```
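
The REQUIRED and OPTIONAL markers in the request comments translate into a small set of checks a plugin might run before touching the volume. The sketch below is one possible reading, not the generated code: `NodeHealerRequest` here is a hand-written stand-in that mirrors only three fields, and `validateRequest` and `ErrInvalidArgument` are hypothetical names. It assumes a violation would be surfaced as gRPC code 3 (`INVALID_ARGUMENT`).

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// NodeHealerRequest is a hand-written stand-in for the generated protobuf
// type; only the fields needed for validation are mirrored here.
type NodeHealerRequest struct {
	VolumeID          string
	VolumePath        string
	StagingTargetPath string
}

// ErrInvalidArgument stands in for gRPC code 3 (INVALID_ARGUMENT).
var ErrInvalidArgument = errors.New("invalid argument")

// validateRequest checks the two REQUIRED fields and the absolute-path
// rule for the OPTIONAL staging_target_path.
func validateRequest(req NodeHealerRequest) error {
	if req.VolumeID == "" {
		return fmt.Errorf("%w: volume_id is required", ErrInvalidArgument)
	}
	if req.VolumePath == "" {
		return fmt.Errorf("%w: volume_path is required", ErrInvalidArgument)
	}
	// An empty staging path is allowed; a non-empty one must be absolute.
	if req.StagingTargetPath != "" && !filepath.IsAbs(req.StagingTargetPath) {
		return fmt.Errorf("%w: staging_target_path must be absolute", ErrInvalidArgument)
	}
	return nil
}

func main() {
	ok := NodeHealerRequest{VolumeID: "vol-1", VolumePath: "/mnt/vol-1"}
	fmt.Println(validateRequest(ok))

	bad := NodeHealerRequest{VolumeID: "vol-1", VolumePath: "/mnt/vol-1",
		StagingTargetPath: "relative/stage"}
	fmt.Println(validateRequest(bad))
}
```

A real plugin would return the error through the gRPC status package rather than a sentinel error; the sentinel keeps the sketch free of external dependencies.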

#### NodeHealer Errors

| Condition | gRPC Code | Description | Recovery Behavior |
| ---------------------------- | ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Missing required field | 3 INVALID_ARGUMENT | Indicates that a required field is missing from the request. | Caller MUST fix the request by adding the missing required field before retrying. |
| Volume does not exist | 5 NOT_FOUND | Indicates that a volume corresponding to the specified `volume_id` does not exist. | Caller MUST verify that the `volume_id` is correct and that the volume is accessible and has not been deleted before retrying with exponential back off. |
| Call not implemented | 12 UNIMPLEMENTED | The invoked RPC is not implemented by the CSI-driver or disabled in the driver's current mode of operation. | Caller MUST NOT retry. |
| Operation pending for volume | 10 ABORTED | Indicates that there is already an operation pending for the specified `volume_id`. In general the CSI-Addons CO plugin is responsible for ensuring that there is no more than one call "in-flight" per `volume_id` at a given time. However, in some circumstances, the CSI-Addons CO plugin MAY lose state (for example when it crashes and restarts), and MAY issue multiple calls simultaneously for the same `volume_id`. The CSI-driver SHOULD handle this as gracefully as possible, and MAY return this error code to reject secondary calls. | Caller SHOULD ensure that there are no other calls pending for the specified `volume_id`, and then retry with exponential back off. |
| Not authenticated | 16 UNAUTHENTICATED | The invoked RPC does not carry secrets that are valid for authentication. | Caller SHALL either fix the secrets provided in the RPC, or otherwise regalvanize said secrets such that they will pass authentication by the Plugin for the attempted RPC, after which point the caller MAY retry the attempted RPC. |
| Unknown error | 2 UNKNOWN | Indicates that an unknown error was generated. | Caller MUST study the logs before retrying. |
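
The recovery column above can be read as a small caller-side policy. The following Go sketch shows one way a caller might encode it, assuming the numeric codes from the table; the `Action` type, `actionFor`, and `backoff` are hypothetical helpers, and the base delay and cap are arbitrary illustration values, since the table does not specify them.

```go
package main

import (
	"fmt"
	"time"
)

// Code mirrors the numeric gRPC status codes listed in the table.
type Code int

const (
	Unknown         Code = 2
	InvalidArgument Code = 3
	NotFound        Code = 5
	Aborted         Code = 10
	Unimplemented   Code = 12
	Unauthenticated Code = 16
)

// Action is a hypothetical classification of what the caller does next.
type Action int

const (
	RetryWithBackoff Action = iota // retry after an exponentially growing delay
	FixThenRetry                   // change the request or secrets first
	NeverRetry                     // the RPC is not offered by this driver
	InspectLogs                    // study the logs before any retry
)

// actionFor maps a status code to the recovery behavior in the table.
// NOT_FOUND also requires verifying the volume_id before the retry.
// Codes that the table does not list fall through to InspectLogs.
func actionFor(c Code) Action {
	switch c {
	case NotFound, Aborted:
		return RetryWithBackoff
	case InvalidArgument, Unauthenticated:
		return FixThenRetry
	case Unimplemented:
		return NeverRetry
	default:
		return InspectLogs
	}
}

// backoff returns base doubled once per attempt (attempt starts at 0),
// capped at limit.
func backoff(attempt int, base, limit time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= limit {
			return limit
		}
	}
	if d > limit {
		return limit
	}
	return d
}

func main() {
	for attempt := 0; attempt < 6; attempt++ {
		fmt.Println(attempt, backoff(attempt, time.Second, 30*time.Second))
	}
	fmt.Println(actionFor(Unimplemented) == NeverRetry)
}
```

Capping the delay keeps a long-lived `ABORTED` condition from pushing retries arbitrarily far apart; whether a cap is appropriate is a caller decision that the table leaves open.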
