Using ECS as Tier 2
Since version 0.7, Pravega fully supports Dell EMC ECS object storage (ECS for short) as Tier 2, using the open-source Extended S3 client. This is in addition to the other Tier 2 options Pravega supports today: HDFS and Filesystem (NFS).
Unlike file systems, which manage data in a hierarchy of files and directories, and unlike block storage, object storage organizes data as independent objects with globally unique identifiers. Each object includes the data itself along with a variable amount of metadata. Objects are organized into buckets.
- The data and management APIs are usually exposed as REST APIs over HTTP. This allows objects to be accessed using standard web-based tools and techniques.
- The benefits include a scalable architecture, high availability, and superior cost effectiveness.
- Given their distributed architecture, most object storage systems provide higher throughput and availability at the cost of strong consistency guarantees and low latency. Object storage is therefore best suited to workloads that store large objects.
- Amazon S3, Swift, Dell EMC ECS, and Azure Blob Storage are examples of popular object storage systems.
Dell EMC ECS object storage is an enterprise-grade converged infrastructure appliance that can be deployed on premises in many configurations. ECS provides enhanced access control suitable for enterprise environments. Data can be accessed using multiple protocols, including Extended S3, Swift, HDFS, and NFS, although Extended S3 is the preferred and recommended protocol. Amazon S3, by contrast, is a cloud storage service, so its data resides in the public cloud. More details are available in the ECS Overview and Architecture document.
Most object storage systems treat objects as immutable: an object cannot be partially modified or appended to once it has been created. Any change requires overwriting the whole object, which is highly inefficient. ECS offers the ability to modify a range of bytes in an existing object through the Extended S3 API.
- Pravega segments are append-only, meaning each write is appended to the end of the existing segment. This is accomplished by using PUT requests with Range headers. More details about the API are available at http://relweb-asd.lss.emc.com/ecs/release/docs/3.4/API/S3ObjectOperations_createOrUpdateObject_7916bd6f789d0ae0ff39961c0e660d00_ba672412ac371bb6cf4e69291344510e_detail.html
- Pravega uses the open-source ECS client to access data. More details can be found in the documentation for the Java SDK for storing objects in ECS.
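To make the ranged-PUT idea concrete, here is a minimal Python sketch of how an append might be expressed as a PUT whose Range header starts at the current end of the object. It only builds the request and never sends it. The URL, the helper names `range_header` and `build_append_request`, and the choice to address the append by explicit offset are assumptions for illustration; request signing and authentication are omitted, and this is not the code Pravega or the ECS client actually runs.

```python
from urllib.request import Request


def range_header(offset: int, length: int) -> str:
    """Return a Range header value covering `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return f"bytes={offset}-{offset + length - 1}"


def build_append_request(object_url: str, current_length: int, payload: bytes) -> Request:
    """Build (but do not send) a PUT that writes `payload` at the end of the object."""
    # Assumption: writing a range that starts at the current object length extends the object.
    headers = {"Range": range_header(current_length, len(payload))}
    return Request(object_url, data=payload, method="PUT", headers=headers)


# Hypothetical object URL; a real request would also need S3 authentication headers.
req = build_append_request(
    "http://ecs.example.invalid:9020/shared/example/segment-0", 1024, b"event-bytes"
)
print(req.get_method(), req.get_header("Range"))
```

The point of the sketch is that the writer only needs to track the current segment length: each append targets the byte range immediately after it, so the existing object content is never rewritten.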
In addition to the usual prerequisites for Pravega, the following requirements must be satisfied.
- ECS should be properly provisioned and set up before it can be used by Pravega.
- It is recommended that ECS be fronted by a load balancer (either hardware or software), and the endpoint must be accessible from Kubernetes. (See the note on load balancers below for working without a pre-configured load balancer.)
- An access key and secret key should be created.
- A bucket should be created.
- Pravega should have full read and write access to this bucket.
- Retention policy?.
- Each deployment must use a unique prefix.
First, create an access key and secret key in ECS for use with Pravega.
Next, create a file with the secret definition containing your access key and secret key.
apiVersion: v1
kind: Secret
metadata:
  name: ecs-credentials
type: Opaque
stringData:
  ACCESS_KEY_ID: <access-key>
  SECRET_KEY: 0123456789
Assuming the file is named ecs-credentials.yaml, create the secret:
$ kubectl create -f ecs-credentials.yaml
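Kubernetes stores Secret values base64-encoded under `data`; `stringData` is a convenience field that accepts plain text and is encoded on write. The following Python sketch generates an equivalent manifest with the values already encoded, which can be handy when the secret is produced by a script. The function name `build_ecs_secret` and the sample key values are hypothetical; only the key names `ACCESS_KEY_ID` and `SECRET_KEY` and the secret name come from the example above.

```python
import base64
import json


def build_ecs_secret(access_key: str, secret_key: str, name: str = "ecs-credentials") -> dict:
    """Return a Kubernetes Secret manifest with base64-encoded ECS credentials."""

    def encode(value: str) -> str:
        return base64.b64encode(value.encode("utf-8")).decode("ascii")

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        # `data` holds base64 text, unlike `stringData`, which holds plain text.
        "data": {
            "ACCESS_KEY_ID": encode(access_key),
            "SECRET_KEY": encode(secret_key),
        },
    }


# Placeholder credentials for illustration only.
manifest = build_ecs_secret("<access-key>", "0123456789")
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be saved to a file and passed to `kubectl create -f` in the same way.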
Follow the instructions to [deploy Pravega](https://github.com/pravega/pravega-operator/blob/master/doc/manual-installation.md#install-the-pravega-cluster-manually) and configure the Tier 2 block in your PravegaCluster manifest with your ECS connection details and a reference to the secret above.
Refer to the Config URI format described below for additional properties that can be set through the config URI.
spec:
  tier2:
    ecs:
      configUri: http://10.247.10.52:9020?namespace=pravega
      bucket: "shared"
      prefix: "example"
      credentials: ecs-credentials
Additional configuration values can be specified in the config file for the segment store.
Config | Optional? | Value | Description |
---|---|---|---|
smallObjectSizeLimitForConcat | Yes | Positive integer. Default and recommended value: 1048576 (1 MB). | Size of an ECS object, in bytes, above which it is no longer considered a small object. This value is used to optimize transaction performance when transaction segments are small. For small transaction segments, ExtendedS3Storage implements concat by reading the complete source segment and appending it to the target instead of using multipart upload. |
During operation, Pravega publishes the following metrics.
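
The effect of `smallObjectSizeLimitForConcat` can be pictured as a simple size check made before each concat. The Python sketch below illustrates that decision only; the function name, the strategy labels, and the treatment of a source exactly at the limit as "small" are assumptions based on the description in the table, not the actual ExtendedS3Storage code.

```python
# Documented default: 1048576 bytes (1 MB).
DEFAULT_SMALL_OBJECT_SIZE_LIMIT = 1048576


def choose_concat_strategy(source_length: int,
                           small_object_size_limit: int = DEFAULT_SMALL_OBJECT_SIZE_LIMIT) -> str:
    """Pick how a source segment of `source_length` bytes would be concatenated."""
    if source_length < 0 or small_object_size_limit <= 0:
        raise ValueError("lengths must be non-negative and the limit must be positive")
    if source_length <= small_object_size_limit:
        # Small source: read it completely and append it to the target.
        return "read-and-append"
    # Above the limit the source is no longer considered small.
    return "multipart-upload"


for size in (4096, 1048576, 5 * 1048576):
    print(size, choose_concat_strategy(size))
```

Raising the limit moves more concat operations onto the read-and-append path, which is the path intended for small transaction segments.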
Metric Name | Type | Description |
---|---|---|
segmentstore.storage.read_latency_ms | Histogram | Read latency |
segmentstore.storage.write_latency_ms | Histogram | Write latency |
segmentstore.storage.create_latency_ms | Histogram | Create latency |
segmentstore.storage.delete_latency_ms | Histogram | Delete latency |
segmentstore.storage.concat_latency_ms | Histogram | Concat latency |
segmentstore.storage.read_bytes | Counter | Number of bytes read |
segmentstore.storage.write_bytes | Counter | Number of bytes written |
segmentstore.storage.concat_bytes | Counter | Number of bytes concatenated |
segmentstore.storage.create_count | Counter | Number of create operations |
segmentstore.storage.delete_count | Counter | Number of delete operations |
segmentstore.storage.concat_count | Counter | Number of concat operations |
segmentstore.storage.large_concat_count | Counter | Number of concat operations performed without using optimization |
Example: http://10.1.83.51:9020?identity=<access-key>&secretKey=<secret-key>
Config URI options
- smartClient - true or false (default true).
- useVHost - true or false (default false).
- vdcs - each instance defines a single VDC, given as a comma-separated list of hosts, with the first host used as the name. Multiple instances are allowed. NOTE: You can also specify a single host in the URI, before the query; it is used as a single one-node VDC if no vdcs are specified in the query.
Example: https://10.1.100.11:9021?vdcs=10.1.100.11,10.1.100.12&vdcs=10.2.100.11,10.2.100.12&vdcs=10.3.100.11,10.3.100.12&geoPinningEnabled=true&geoReadRetryFailover=true&identity=<access-key>&secretKey=<secret-key>
If ECS is fronted by an external load balancer, there is no need to use VDCs or the smart client.
Pravega currently does not support multi-site deployments.
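To show how the repeated `vdcs` parameter and the single-host fallback described above can be read, here is a small Python sketch that parses a config URI with the standard library. It is an illustration of the URI format only: the function name `parse_config_uri` and the shape of the returned dictionary are assumptions, and the real parsing is done inside the ECS client.

```python
from urllib.parse import parse_qs, urlsplit


def parse_config_uri(config_uri: str) -> dict:
    """Split a config URI into its endpoint, VDC list, and remaining options."""
    parts = urlsplit(config_uri)
    query = parse_qs(parts.query)  # repeated keys such as vdcs become lists

    vdcs = []
    for instance in query.get("vdcs", []):
        hosts = [host for host in instance.split(",") if host]
        if hosts:
            # The first host in each instance doubles as the VDC name.
            vdcs.append({"name": hosts[0], "hosts": hosts})

    # With no vdcs in the query, the host before the query is a one-node VDC.
    if not vdcs and parts.hostname:
        vdcs.append({"name": parts.hostname, "hosts": [parts.hostname]})

    options = {key: values[-1] for key, values in query.items() if key != "vdcs"}
    return {"scheme": parts.scheme, "port": parts.port, "vdcs": vdcs, "options": options}


parsed = parse_config_uri(
    "https://10.1.100.11:9021?vdcs=10.1.100.11,10.1.100.12"
    "&vdcs=10.2.100.11,10.2.100.12&vdcs=10.3.100.11,10.3.100.12"
    "&geoPinningEnabled=true&geoReadRetryFailover=true"
)
for vdc in parsed["vdcs"]:
    print(vdc["name"], vdc["hosts"])
```

Running the same function on a URI without any `vdcs` parameter yields a single VDC named after the host in the URI, matching the note above.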