diff --git a/docs/proposals/distributedstorage/distributedstorage.md b/docs/proposals/distributedstorage/distributedstorage.md
index e69de29bb..edf259b46 100644
--- a/docs/proposals/distributedstorage/distributedstorage.md
+++ b/docs/proposals/distributedstorage/distributedstorage.md
@@ -0,0 +1,585 @@
+---
+title: Distributed Storage System for Kurator
+authors:
+- "@LiZhenCheng9527" # Authors' GitHub accounts here.
+reviewers:
+approvers:
+
+creation-date: 2023-09-08
+
+---
+
+## Distributed Storage System for Kurator
+
+### Summary
+
+Kurator, as an open-source distributed cloud-native platform, has been pivotal in helping users build their distributed cloud-native infrastructure, thereby facilitating enterprise digital transformation.
+
+To further enhance its functionality, this proposal introduces a unified solution for distributed storage across the multiple clusters in a `Fleet`, through a seamless one-click operation.
+
+By integrating Rook, we aim to provide users with reliable, fast and unified distributed storage, enabling them to easily use block, filesystem and object storage across multiple clusters.
+
+### Motivation
+
+In the current era of data explosion, distributed cloud storage has the advantages of easy scalability, high concurrency, reliability, high availability and high storage efficiency.
+
+Kurator is Huawei Cloud's open-source distributed cloud-native suite, providing users with a one-stop open-source solution for distributed cloud-native scenarios. Since distributed storage is an important part of cloud-native usage scenarios, Kurator needs to provide the corresponding functional support.
+
+#### Goals
+
+Unified distributed cloud storage only requires users to declare the desired API state in one place; Kurator will automatically handle all subsequent operations, including the installation of Rook on the specified clusters, the execution of the specific operations, and the unified aggregation of the status of each operation.
+
+- **Unified distributed cloud storage**
+  - Support three types of storage: block storage, filesystem storage and object storage.
+  - Support scoping different storage types by name, label and annotation.
+  - Support applying these policies to multiple clusters or a single cluster.
+
+#### Non-Goals
+
+- **Support for other distributed storage systems.** Rook only supports the Ceph distributed storage system, so other distributed storage systems are not supported.
+- **Support for erasure coding.** Erasure coding is a data redundancy protection technique that can recover lost data within certain limits, and it is a more cost-effective way to improve the reliability of a storage system than the three-copy approach. However, Kurator currently only supports the three-copy method; related features may be added in the future.
+
+### Proposal
+
+This proposal aims to introduce unified distributed cloud storage for Kurator that supports block, filesystem and object storage. The main objectives of this proposal are as follows:
+
+Custom Resource Definitions (CRDs): Design CRDs to enable unified distributed cloud storage capabilities. These CRDs will provide a structured approach for defining the target clusters and the different storage types used to implement distributed cloud storage.
+
+Cluster Manager Implementation: The Cluster Manager component will be responsible for monitoring the CRDs and performing the defined functions. It will install Rook on the specified clusters and handle potential errors or anomalies to ensure smooth operation. A sketch of this controller flow is shown below.
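+
+The following is a minimal, illustrative sketch of the reconcile flow the Cluster Manager could follow, assuming a controller-runtime based reconciler. The package path `kurator.dev/kurator/pkg/apis/storage/v1alpha1` and the helpers `installRook`, `applyStorageConfig` and `aggregateStatus` are hypothetical placeholders, not the final implementation.
+
+```go
+package storage
+
+import (
+    "context"
+
+    ctrl "sigs.k8s.io/controller-runtime"
+    "sigs.k8s.io/controller-runtime/pkg/client"
+
+    // Hypothetical package generated from the API types defined in this proposal.
+    storagev1alpha1 "kurator.dev/kurator/pkg/apis/storage/v1alpha1"
+)
+
+// DistributedStorageReconciler is a sketch of the Cluster Manager controller.
+type DistributedStorageReconciler struct {
+    client.Client
+}
+
+func (r *DistributedStorageReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
+    storage := &storagev1alpha1.DistributedStorage{}
+    if err := r.Get(ctx, req.NamespacedName, storage); err != nil {
+        // The object may have been deleted; nothing to do in that case.
+        return ctrl.Result{}, client.IgnoreNotFound(err)
+    }
+
+    // 1. Ensure the Rook operator is installed on every target cluster.
+    if err := r.installRook(ctx, storage); err != nil {
+        return ctrl.Result{}, err
+    }
+
+    // 2. Apply the declared block/filesystem/object storage configuration.
+    if err := r.applyStorageConfig(ctx, storage); err != nil {
+        return ctrl.Result{}, err
+    }
+
+    // 3. Aggregate per-cluster status back into the DistributedStorage resource.
+    return ctrl.Result{}, r.aggregateStatus(ctx, storage)
+}
+
+// The helpers below are placeholders; their real implementations are out of scope for this sketch.
+func (r *DistributedStorageReconciler) installRook(ctx context.Context, s *storagev1alpha1.DistributedStorage) error {
+    return nil
+}
+
+func (r *DistributedStorageReconciler) applyStorageConfig(ctx context.Context, s *storagev1alpha1.DistributedStorage) error {
+    return nil
+}
+
+func (r *DistributedStorageReconciler) aggregateStatus(ctx context.Context, s *storagev1alpha1.DistributedStorage) error {
+    return nil
+}
+```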
+
+By integrating these enhancements, Kurator will provide users with a powerful yet streamlined solution for implementing distributed cloud storage, simplifying the overall operational process.
+
+#### User Stories (Optional)
+
+##### Story 1
+
+**User Role**: Operator managing multi-cluster Kubernetes environments
+
+**Feature**: With the enhanced Kurator, Operators can easily configure distributed storage policies for multiple clusters simultaneously.
+
+**Value**: Provides a simplified, automated way to unify the management of distributed storage across multiple clusters. Reduces human error and ensures data continuity and compliance.
+
+**Outcome**: Using this feature, Operators can easily configure distributed storage for all clusters, improving the reliability, availability, storage efficiency and scalability of business systems' storage.
+
+##### Story 2
+
+**User Role**: Operator managing many different services that need different storage types
+
+**Feature**: Operators can apply different labels to the nodes where different services reside in Kubernetes. With the enhanced Kurator, Operators can use these labels to easily specify the appropriate storage type for the corresponding service.
+
+**Outcome**: Using this feature, Operators can provide different types of storage for different services in all clusters, improving business system storage efficiency and overall service performance.
+
+#### Notes/Constraints/Caveats (Optional)
+
+**Resource Requirements**: To configure the Ceph storage cluster, at least one of these local storage options is required:
+
+- Raw devices (no partitions or formatted filesystems)
+- Raw partitions (no formatted filesystem)
+- LVM Logical Volumes (no formatted filesystem)
+- Persistent Volumes available from a storage class in block mode
+
+#### Risks and Mitigations
+
+**Version Compatibility**: It is recommended that users run a recent version of Kubernetes, as older versions are not tested with Rook. Rook supports Kubernetes v1.19 or higher.
+
+### Design Details
+
+In this section, we dive into the detailed API designs for the unified distributed cloud storage feature.
+
+These APIs are designed to facilitate Kurator's integration with Rook to enable the required functionality.
+
+In contrast to Rook, we may need to adapt the unified distributed cloud storage APIs to reflect our own strategy and decisions.
+
+##### Unified Distributed Storage System API
+
+Here is the preliminary design for the Unified Distributed Storage System API:
+
+```go
+// DistributedStorage is the schema for the DistributedStorage API.
+type DistributedStorage struct {
+    metav1.TypeMeta   `json:",inline"`
+    metav1.ObjectMeta `json:"metadata"`
+    Spec              DistributedStorageSpec `json:"spec"`
+
+    // ClusterStatus is the current status of the distributed storage deployed in this cluster.
+    // +kubebuilder:pruning:PreserveUnknownFields
+    // +optional
+    // +nullable
+    ClusterStatus rookv1.ClusterStatus `json:"status,omitempty"`
+}
+
+type DistributedStorageSpec struct {
+    // ClusterName is the name of the cluster where the storage is deployed.
+    // +optional
+    ClusterName string `json:"clusterName,omitempty"`
+
+    // ClusterKind is the kind of the cluster referenced by ClusterName, as recorded in Kurator.
+    // +optional
+    ClusterKind string `json:"clusterKind,omitempty"`
+
+    // ClusterSpec describes the desired state of the storage cluster.
+    // +optional
+    ClusterSpec StorageClusterSpec `json:"spec"`
+}
+
+// Note: partly copied from https://github.com/rook/rook/blob/release-1.10/pkg/apis/ceph.rook.io/v1/types.go
+type StorageClusterSpec struct {
+    // The path on the host where config and data can be persisted. Must be set.
+    // If the storage cluster is deleted, the configuration files under this path should be cleaned up.
+    // +kubebuilder:validation:Pattern=`^/(\S+)`
+    // +optional
+    DataDirHostPath string `json:"dataDirHostPath,omitempty"`
+
+    // The annotations-related configuration to add/set on each Pod-related object.
+    // +nullable
+    // +optional
+    Annotations rookv1.AnnotationsSpec `json:"annotations,omitempty"`
+
+    // The labels-related configuration to add/set on each Pod-related object.
+    // +kubebuilder:pruning:PreserveUnknownFields
+    // +nullable
+    // +optional
+    Labels rookv1.LabelsSpec `json:"labels,omitempty"`
+
+    // Mon configures the Ceph monitor daemons, which maintain the map of the cluster state.
+    // +optional
+    // +nullable
+    Mon MonSpec `json:"mon,omitempty"`
+
+    // Mgr configures the Ceph manager daemons, which are responsible for the connection between the cluster and the operator.
+    // +optional
+    // +nullable
+    Mgr MgrSpec `json:"mgr,omitempty"`
+
+    // The placement-related configuration to pass to Kubernetes (affinity, node selector, tolerations).
+    // +kubebuilder:pruning:PreserveUnknownFields
+    // +nullable
+    // +optional
+    Placement PlacementSpec `json:"placement,omitempty"`
+
+    // A spec for the available storage in the cluster and how it should be used.
+    // +optional
+    // +nullable
+    Storage StorageScopeSpec `json:"storage,omitempty"`
+}
+```
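+
+For illustration, a `DistributedStorage` object targeting a single attached cluster could look like the sketch below. The import path `kurator.dev/kurator/pkg/apis/storage/v1alpha1` is a hypothetical location for the generated types, and the cluster name and kind are placeholders.
+
+```go
+package storage
+
+import (
+    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+
+    // Hypothetical package containing the types defined above.
+    storagev1alpha1 "kurator.dev/kurator/pkg/apis/storage/v1alpha1"
+)
+
+// exampleStorage declares distributed storage for one attached cluster,
+// consuming all nodes and persisting Ceph configuration under /var/lib/rook.
+var exampleStorage = storagev1alpha1.DistributedStorage{
+    ObjectMeta: metav1.ObjectMeta{
+        Name:      "demo-storage", // placeholder name
+        Namespace: "default",
+    },
+    Spec: storagev1alpha1.DistributedStorageSpec{
+        ClusterName: "member1",         // placeholder cluster name
+        ClusterKind: "AttachedCluster", // placeholder cluster kind
+        ClusterSpec: storagev1alpha1.StorageClusterSpec{
+            DataDirHostPath: "/var/lib/rook",
+            Storage: storagev1alpha1.StorageScopeSpec{
+                UseAllNodes: true,
+            },
+        },
+    },
+}
+```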
+
+Here are the details about `Mon` and `Mgr`:
+
+```go
+// Note: partly copied from https://github.com/rook/rook/blob/release-1.10/pkg/apis/ceph.rook.io/v1/types.go
+
+type MonSpec struct {
+    // Count is the number of Ceph monitors.
+    // The default is three, and an odd number is preferred.
+    // +kubebuilder:validation:Minimum=0
+    // +kubebuilder:validation:Maximum=9
+    // +optional
+    Count int `json:"count,omitempty"`
+}
+
+type MgrSpec struct {
+    // Count is the number of managers to run.
+    // The default is two: one active and one on standby.
+    // +kubebuilder:validation:Minimum=0
+    // +kubebuilder:validation:Maximum=2
+    // +optional
+    Count int `json:"count,omitempty"`
+
+    // Modules is the list of Ceph manager modules to enable/disable.
+    // +optional
+    // +nullable
+    Modules []rookv1.Module `json:"modules,omitempty"`
+}
+```
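+
+As a small illustration of the defaults described above (an odd monitor count for quorum, one active and one standby manager), the controller could fall back to values like these when the user leaves the counts unset. The package path is the same hypothetical one used in the earlier sketches.
+
+```go
+package storage
+
+import (
+    // Hypothetical package containing the types defined above.
+    storagev1alpha1 "kurator.dev/kurator/pkg/apis/storage/v1alpha1"
+)
+
+// defaultDaemonSpecs returns the mon/mgr settings described above:
+// three monitors (an odd number, for quorum) and two managers (one active, one standby).
+func defaultDaemonSpecs() (storagev1alpha1.MonSpec, storagev1alpha1.MgrSpec) {
+    return storagev1alpha1.MonSpec{Count: 3}, storagev1alpha1.MgrSpec{Count: 2}
+}
+```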
+
+Here are the details about `PlacementSpec`:
+
+```go
+// Note: partly copied from https://github.com/rook/rook/blob/release-1.10/pkg/apis/ceph.rook.io/v1/types.go
+
+// PlacementSpec is the placement for the core Ceph daemons that are part of the CephCluster CRD.
+type PlacementSpec map[KeyType]Placement
+
+// Placement is the placement for an object.
+type Placement struct {
+    // NodeAffinity is a group of node affinity scheduling rules.
+    // +optional
+    NodeAffinity *corev1.NodeAffinity `json:"nodeAffinity,omitempty"`
+
+    // PodAffinity is a group of inter-pod affinity scheduling rules.
+    // +optional
+    PodAffinity *corev1.PodAffinity `json:"podAffinity,omitempty"`
+
+    // PodAntiAffinity is a group of inter-pod anti-affinity scheduling rules.
+    // +optional
+    PodAntiAffinity *corev1.PodAntiAffinity `json:"podAntiAffinity,omitempty"`
+
+    // Tolerations allow the pods to tolerate any taint that matches
+    // the triple <key,value,effect> using the matching operator.
+    // +optional
+    Tolerations []corev1.Toleration `json:"tolerations,omitempty"`
+
+    // TopologySpreadConstraints specifies how to spread matching pods among the given topology.
+    // +optional
+    TopologySpreadConstraints []corev1.TopologySpreadConstraint `json:"topologySpreadConstraints,omitempty"`
+}
+```
+
+Here are the details about `StorageScopeSpec`:
+
+```go
+// Note: partly copied from https://github.com/rook/rook/blob/release-1.10/pkg/apis/ceph.rook.io/v1/types.go
+type StorageScopeSpec struct {
+    // +nullable
+    // +optional
+    Nodes []Node `json:"nodes,omitempty"`
+
+    // +optional
+    UseAllNodes bool `json:"useAllNodes,omitempty"`
+
+    // +optional
+    OnlyApplyOSDPlacement bool `json:"onlyApplyOSDPlacement,omitempty"`
+
+    Selection `json:",inline"`
+
+    // +nullable
+    // +optional
+    StorageClassDeviceSets []StorageClassDeviceSet `json:"storageClassDeviceSets,omitempty"`
+
+    // OSDStore is the backend storage type used for creating the OSDs.
+    // The default OSDStore type is bluestore, which directly manages bare devices.
+    // +optional
+    Store rookv1.OSDStore `json:"store,omitempty"`
+}
+
+// Each individual node can specify configuration to override the cluster-level settings and defaults.
+// If a node does not specify any configuration, it inherits the cluster-level settings.
+type Node struct {
+    // Name should match the node's kubernetes.io/hostname label.
+    // +optional
+    Name string `json:"name,omitempty"`
+
+    // The CPU and RAM requests/limits for the devices.
+    // +kubebuilder:pruning:PreserveUnknownFields
+    // +nullable
+    // +optional
+    Resources corev1.ResourceRequirements `json:"resources,omitempty"`
+
+    // +kubebuilder:pruning:PreserveUnknownFields
+    // +nullable
+    // +optional
+    Selection `json:",inline"`
+}
+
+// Below are the settings for a host-based cluster. This type of cluster can specify devices for OSDs,
+// both at the cluster and individual node level, to select which storage resources are included in the cluster.
+type Selection struct {
+    // Whether to consume all the storage devices found on a machine,
+    // indicating whether all devices found on nodes in the cluster should be automatically consumed by OSDs.
+    // Not recommended unless you have a very controlled environment where you will not risk formatting devices with existing data.
+    // +optional
+    UseAllDevices *bool `json:"useAllDevices,omitempty"`
+
+    // A regular expression to allow more fine-grained selection of devices on nodes across the cluster,
+    // i.e. of the devices and partitions to be consumed by OSDs.
+    // This field uses golang regular expression syntax, e.g. `sdb`, `^sd.` or `^sd[a-d]`.
+    // +optional
+    DeviceFilter string `json:"deviceFilter,omitempty"`
+
+    // A regular expression to allow more fine-grained selection of devices by path name,
+    // i.e. of the device paths whose devices and partitions are to be consumed by OSDs.
+    // +optional
+    DevicePathFilter string `json:"devicePathFilter,omitempty"`
+
+    // List of devices to use as storage devices:
+    // the individual device names belonging to this node to include in the storage cluster,
+    // e.g. `sda` or `/dev/disk/by-id/ata-XXXX`.
+    // +kubebuilder:pruning:PreserveUnknownFields
+    // +nullable
+    // +optional
+    Devices []rookv1.Device `json:"devices,omitempty"`
+}
+```
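+
+As an illustration, a `StorageScopeSpec` restricting OSDs to specific devices on two named nodes might look like the following sketch. The node names and device names are placeholders, and the package paths are the same hypothetical ones used above.
+
+```go
+package storage
+
+import (
+    rookv1 "github.com/rook/rook/pkg/apis/ceph.rook.io/v1"
+
+    // Hypothetical package containing the types defined above.
+    storagev1alpha1 "kurator.dev/kurator/pkg/apis/storage/v1alpha1"
+)
+
+// exampleStorageScope consumes only matching devices on two named nodes
+// instead of every device on every node.
+var exampleStorageScope = storagev1alpha1.StorageScopeSpec{
+    UseAllNodes: false,
+    Nodes: []storagev1alpha1.Node{
+        {
+            // Must match the node's kubernetes.io/hostname label.
+            Name: "node-1", // placeholder node name
+            Selection: storagev1alpha1.Selection{
+                DeviceFilter: "^sd[b-c]", // golang regular expression, as described above
+            },
+        },
+        {
+            Name: "node-2", // placeholder node name
+            Selection: storagev1alpha1.Selection{
+                Devices: []rookv1.Device{{Name: "sdb"}},
+            },
+        },
+    },
+}
+```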
+
+After determining which clusters distributed storage is deployed to, and how the devices on the nodes of those clusters are selected, it is also necessary to specify the types of storage provided on those clusters.
+
+In a Ceph cluster, the mon, mgr, osd, rgw and mds daemons are needed to support distributed storage. The rgw is the RADOS gateway, a daemon needed for object storage. The mds is the metadata server, a daemon needed for filesystem storage.
+
+Whether it is block, filesystem or object storage, all of them need a pool. Pools are logical partitions of the data stored in Ceph, similar to namespaces in Kubernetes. Here are the details about `PoolSpec`:
+
+```go
+// Note: partly copied from https://github.com/rook/rook/blob/release-1.10/pkg/apis/ceph.rook.io/v1/types.go
+
+// PoolSpec represents the spec of a Ceph pool.
+type PoolSpec struct {
+    // The failure domain: osd/host/(region or zone if available) - technically also any type in the CRUSH map.
+    // Rook automatically spreads data across failure domains based on the nodes' region and AZ, so this does not need to be adjusted manually.
+    // +optional
+    FailureDomain string `json:"failureDomain,omitempty"`
+
+    // Number of copies per object in a replicated storage pool, including the object itself (required for the replicated pool type).
+    // +kubebuilder:validation:Minimum=0
+    Replicated uint `json:"replicated,omitempty"`
+}
+```
+
+Using block storage in a Ceph cluster requires a `CephBlockPool`. Here are the details about `CephBlockPool`:
+
+```go
+// Note: partly copied from https://github.com/rook/rook/blob/release-1.10/pkg/apis/ceph.rook.io/v1/types.go
+
+// CephBlockPool represents a Ceph storage pool.
+// +kubebuilder:printcolumn:name="Phase",type=string,JSONPath=`.status.phase`
+// +kubebuilder:subresource:status
+type CephBlockPool struct {
+    metav1.TypeMeta   `json:",inline"`
+    metav1.ObjectMeta `json:"metadata"`
+    Spec              BlockPoolSpec `json:"spec"`
+    // +kubebuilder:pruning:PreserveUnknownFields
+    Status *rookv1.CephBlockPoolStatus `json:"status,omitempty"`
+}
+
+// BlockPoolSpec allows a block pool to be created with a non-default name.
+type BlockPoolSpec struct {
+    // The desired name of the pool if different from the CephBlockPool CR name.
+    // +kubebuilder:validation:Enum=device_health_metrics;.nfs;.mgr
+    // +optional
+    Name string `json:"name,omitempty"`
+    // The core pool configuration.
+    PoolSpec `json:",inline"`
+}
+```
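+
+For example, a three-replica block pool could be declared as in the sketch below; the name and namespace are placeholders, and the failure domain is left empty so that it is picked automatically as described above.
+
+```go
+package storage
+
+import (
+    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+
+    // Hypothetical package containing the types defined above.
+    storagev1alpha1 "kurator.dev/kurator/pkg/apis/storage/v1alpha1"
+)
+
+// exampleBlockPool defines a block pool that keeps three copies of every object.
+var exampleBlockPool = storagev1alpha1.CephBlockPool{
+    ObjectMeta: metav1.ObjectMeta{
+        Name:      "replicapool", // placeholder name
+        Namespace: "rook-ceph",   // placeholder namespace
+    },
+    Spec: storagev1alpha1.BlockPoolSpec{
+        PoolSpec: storagev1alpha1.PoolSpec{
+            Replicated: 3,
+        },
+    },
+}
+```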
+
+Using filesystem storage in a Ceph cluster requires a metadata server (mds), which is a daemon in the Ceph cluster. Here are the details about `CephFilesystem` and `MetadataServerSpec`:
+
+```go
+// Note: partly copied from https://github.com/rook/rook/blob/release-1.10/pkg/apis/ceph.rook.io/v1/types.go
+
+// CephFilesystem represents a Ceph filesystem.
+type CephFilesystem struct {
+    metav1.TypeMeta   `json:",inline"`
+    metav1.ObjectMeta `json:"metadata"`
+    Spec              FilesystemSpec `json:"spec"`
+    // +kubebuilder:pruning:PreserveUnknownFields
+    Status *rookv1.CephFilesystemStatus `json:"status,omitempty"`
+}
+
+type FilesystemSpec struct {
+    // The metadata pool settings. Metadata is the information describing the attributes, location, etc. of the data.
+    // +nullable
+    MetadataPool PoolSpec `json:"metadataPool"`
+
+    // The data pool settings, with an optional predefined pool name. The data pools store the actual data.
+    // +nullable
+    DataPools []NamedPoolSpec `json:"dataPools"`
+
+    // Whether or not the filesystem retains the information in its pools when it is deleted. The default is true.
+    // +optional
+    PreservePoolsOnDelete bool `json:"preservePoolsOnDelete,omitempty"`
+
+    // The mds pod info.
+    MetadataServer MetadataServerSpec `json:"metadataServer"`
+}
+
+// MetadataServerSpec represents the specification of a Ceph metadata server.
+type MetadataServerSpec struct {
+    // The number of metadata servers that are active. The remaining servers in the cluster will be in standby mode. The default is 1.
+    // +kubebuilder:validation:Minimum=1
+    // +kubebuilder:validation:Maximum=10
+    ActiveCount int32 `json:"activeCount"`
+
+    // Whether each active MDS instance will have an active standby with a warm metadata cache for faster failover.
+    // If false, standbys will still be available, but will not have a warm metadata cache.
+    // +optional
+    ActiveStandby bool `json:"activeStandby,omitempty"`
+
+    // The affinity to place the mds pods (default is to place on all available nodes).
+    // +kubebuilder:pruning:PreserveUnknownFields
+    // +nullable
+    // +optional
+    Placement Placement `json:"placement,omitempty"`
+
+    // The annotations-related configuration to add/set on each Pod-related object.
+    // +kubebuilder:pruning:PreserveUnknownFields
+    // +nullable
+    // +optional
+    Annotations Annotations `json:"annotations,omitempty"`
+
+    // The labels-related configuration to add/set on each Pod-related object.
+    // +kubebuilder:pruning:PreserveUnknownFields
+    // +nullable
+    // +optional
+    Labels Labels `json:"labels,omitempty"`
+}
+```
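+
+A filesystem with three-way replicated metadata and data pools and one active MDS with a warm standby could be declared as in the following sketch. `NamedPoolSpec` is assumed to mirror Rook's type (an optional pool name plus an embedded `PoolSpec`), and the names are placeholders.
+
+```go
+package storage
+
+import (
+    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+
+    // Hypothetical package containing the types defined above.
+    storagev1alpha1 "kurator.dev/kurator/pkg/apis/storage/v1alpha1"
+)
+
+// exampleFilesystem keeps three copies of metadata and data, runs one active MDS
+// and keeps a standby MDS with a warm metadata cache for faster failover.
+var exampleFilesystem = storagev1alpha1.CephFilesystem{
+    ObjectMeta: metav1.ObjectMeta{
+        Name:      "demo-fs",   // placeholder name
+        Namespace: "rook-ceph", // placeholder namespace
+    },
+    Spec: storagev1alpha1.FilesystemSpec{
+        MetadataPool: storagev1alpha1.PoolSpec{Replicated: 3},
+        DataPools: []storagev1alpha1.NamedPoolSpec{
+            {PoolSpec: storagev1alpha1.PoolSpec{Replicated: 3}},
+        },
+        MetadataServer: storagev1alpha1.MetadataServerSpec{
+            ActiveCount:   1,
+            ActiveStandby: true,
+        },
+    },
+}
+```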
+
+Ceph provides object storage via RGW, the RADOS gateway, a daemon needed for object storage in a Ceph cluster. Here are the details about `ObjectStore` and `GatewaySpec`:
+
+```go
+// Note: partly copied from https://github.com/rook/rook/blob/release-1.10/pkg/apis/ceph.rook.io/v1/types.go
+
+// ObjectStore represents a Ceph object store gateway in Kurator.
+type ObjectStore struct {
+    metav1.TypeMeta   `json:",inline"`
+    metav1.ObjectMeta `json:"metadata"`
+    Spec              ObjectStoreSpec `json:"spec"`
+    // +kubebuilder:pruning:PreserveUnknownFields
+    Status *rookv1.ObjectStoreStatus `json:"status,omitempty"`
+}
+
+type ObjectStoreSpec struct {
+    // The metadata pool settings. Metadata is the information describing the attributes, location, etc. of the data.
+    // +optional
+    // +nullable
+    MetadataPool PoolSpec `json:"metadataPool,omitempty"`
+
+    // The data pool settings, with an optional predefined pool name. The data pool stores the actual data.
+    // +optional
+    // +nullable
+    DataPool PoolSpec `json:"dataPool,omitempty"`
+
+    // Whether or not the object store retains the information in its pools when it is deleted. The default is true.
+    // +optional
+    PreservePoolsOnDelete bool `json:"preservePoolsOnDelete,omitempty"`
+
+    // The rgw pod info.
+    // +optional
+    // +nullable
+    Gateway GatewaySpec `json:"gateway"`
+}
+
+type GatewaySpec struct {
+    // The port the rgw service will be listening on (http).
+    // +optional
+    Port int32 `json:"port,omitempty"`
+
+    // The port the rgw service will be listening on (https).
+    // +kubebuilder:validation:Minimum=0
+    // +kubebuilder:validation:Maximum=65535
+    // +nullable
+    // +optional
+    SecurePort int32 `json:"securePort,omitempty"`
+
+    // The number of pods in the rgw replicaset.
+    // +nullable
+    // +optional
+    Instances int32 `json:"instances,omitempty"`
+
+    // The name of the secret that stores the SSL certificate for secure rgw connections.
+    // +nullable
+    // +optional
+    SSLCertificateRef string `json:"sslCertificateRef,omitempty"`
+
+    // The affinity to place the rgw pods (default is to place on any available node).
+    // +kubebuilder:pruning:PreserveUnknownFields
+    // +nullable
+    // +optional
+    Placement Placement `json:"placement,omitempty"`
+
+    // The annotations-related configuration to add/set on each Pod-related object.
+    // +kubebuilder:pruning:PreserveUnknownFields
+    // +nullable
+    // +optional
+    Annotations Annotations `json:"annotations,omitempty"`
+
+    // The labels-related configuration to add/set on each Pod-related object.
+    // +kubebuilder:pruning:PreserveUnknownFields
+    // +nullable
+    // +optional
+    Labels Labels `json:"labels,omitempty"`
+}
+```
+
+#### Test Plan
+
+End-to-End Tests: Comprehensive E2E testing should be performed to ensure that block, filesystem and object storage operate seamlessly across clusters.
+
+Integration Tests: Integration tests should be designed to ensure Kurator's integration with Rook functions as expected.
+
+Unit Tests: Unit tests should cover the core functionalities and edge cases.
+
+Isolation Testing: The distributed storage functionalities should be tested in isolation as well as in conjunction with other components to ensure compatibility and performance.
+
+### Alternatives
+
+The main alternatives considered were other distributed storage systems, such as OpenEBS or Longhorn. However, using Ceph requires Rook, and Ceph is more advanced than the other storage systems thanks to its decentralized design and the CRUSH algorithm. Therefore, Rook should be used in Kurator.
\ No newline at end of file