
Using ECS as Tier 2

Sachin Jayant Joshi edited this page Mar 30, 2020 · 9 revisions

DellEMC ECS Support

Since version 0.7, Pravega fully supports DellEMC ECS object storage (ECS for short) as Tier 2 using the open-source Extended S3 Client. This is in addition to the other Tier 2 options Pravega supports today: HDFS and Filesystem (NFS).

Background

What is Object Storage?

Unlike file systems, which manage data in a hierarchy of files and directories, and unlike block-oriented storage, object storage organizes data as independent objects with globally unique identifiers. Each object includes the data itself along with a variable amount of metadata. Objects are organized into buckets.

  • The data and management APIs are often provided using the REST paradigm over the HTTP protocol. This allows objects to be accessed using standard web-based tools and techniques.
  • The benefits include technical aspects such as a scalable architecture, high availability, and superior cost effectiveness.
  • Given the distributed architecture, most object storage systems provide higher throughput and availability at the cost of strong consistency guarantees and low latency. Object storage is therefore best suited to workloads that store large objects.
  • Amazon S3, Swift, DellEMC ECS, and Azure Blob Storage are some examples of popular object storage systems.
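The object model described above can be sketched in a few lines of Python. This is only an in-memory illustration, not any vendor's API: the `Bucket` and `StoredObject` names, and the use of a UUID as the unique identifier, are assumptions made for the example.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: payload bytes plus a variable amount of metadata."""
    object_id: str
    data: bytes
    metadata: dict = field(default_factory=dict)


class Bucket:
    """A flat collection of objects addressed by key (no directory tree)."""

    def __init__(self, name: str):
        self.name = name
        self._objects = {}

    def put(self, key: str, data: bytes, **metadata) -> StoredObject:
        # Hypothetical choice: a random UUID stands in for the globally unique identifier.
        obj = StoredObject(object_id=str(uuid.uuid4()), data=data, metadata=dict(metadata))
        self._objects[key] = obj
        return obj

    def get(self, key: str) -> StoredObject:
        return self._objects[key]

    def keys(self) -> list:
        return sorted(self._objects)


bucket = Bucket("shared")
bucket.put("example/segment-0", b"hello", content_type="application/octet-stream")
print(bucket.keys())
```

Note that the slash in `example/segment-0` is just part of the key; the bucket itself has no notion of directories.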

Difference between ECS and S3

Dell EMC Object Storage is an enterprise-grade converged infrastructure appliance that can be deployed on premises in many configurations. ECS provides enhanced access control suitable for enterprise environments. The data can be accessed using multiple protocols, including Extended S3, Swift, HDFS, and NFS, although Extended S3 is the preferred and recommended protocol. Amazon S3, by contrast, is a cloud storage service, so its data resides in the public cloud. Please find more details about ECS Overview and Architecture here.

What is the Extended S3 API? How does it differ from the S3 API?

Most object storage systems treat objects as immutable, which means that an object cannot be partially modified or appended to once it has been created. To change it, the whole object has to be overwritten, which is highly inefficient. ECS offers the ability to modify a range of bytes of an existing object through the Extended S3 API.
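To make the cost difference concrete, here is a minimal in-memory sketch that compares the two models by the number of bytes that would have to be sent to the store. It does not call ECS or any S3 client; both function names are hypothetical and exist only for this illustration.

```python
def overwrite_whole_object(obj: bytes, offset: int, patch: bytes):
    """Immutable-object model: build a new object and upload all of it."""
    new_obj = obj[:offset] + patch + obj[offset + len(patch):]
    return new_obj, len(new_obj)  # every byte of the object is re-sent


def update_byte_range(obj: bytearray, offset: int, patch: bytes):
    """Range-update model: only the patched bytes travel to the store."""
    obj[offset:offset + len(patch)] = patch
    return obj, len(patch)


existing = b"0123456789"

rewritten, sent_full = overwrite_whole_object(existing, 2, b"AB")
updated, sent_range = update_byte_range(bytearray(existing), 2, b"AB")

# Appending is a range update that starts at the current end of the object.
appended, sent_append = update_byte_range(bytearray(existing), len(existing), b"XYZ")

print(sent_full, sent_range, sent_append)
```

For a stream segment that grows by small appends, the first model re-sends the entire segment on every write, while the second sends only the new bytes.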

How does Pravega access ECS using Extended S3 API?

How to configure

Prerequisites

In addition to the usual prerequisites for Pravega, the following requirements must be satisfied.

  • ECS should be properly provisioned and set up before it can be used by Pravega.
  • It is recommended that ECS be fronted by a load balancer (either hardware or software), and the endpoint must be accessible from Kubernetes. (See the section on load balancing below for working without a pre-configured load balancer.)
  • An access key and secret key should be created.
  • A bucket should be created.
    • Pravega should have full read/write access to this bucket.
    • Retention policy?.
  • Each deployment must use a unique prefix.

Configuration steps.

Step 1. Configure bucket.

Step 2. Create secret.

First, create an access key and secret key in ECS to use with Pravega.

Next, create a file with the secret definition containing your access key and secret key.

apiVersion: v1
kind: Secret
metadata:
  name: ecs-credentials
type: Opaque
stringData:
  ACCESS_KEY_ID: "<access-key>"
  SECRET_KEY: "0123456789"

Assuming the file is named ecs-credentials.yaml, create the secret:

$ kubectl create -f ecs-credentials.yaml
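If you prefer to generate the secret definition rather than write it by hand, the sketch below builds the same manifest as a JSON document, which kubectl accepts in place of YAML. The `build_ecs_secret` helper is hypothetical, and the placeholder values must be replaced with your own keys.

```python
import json


def build_ecs_secret(access_key: str, secret_key: str, name: str = "ecs-credentials") -> dict:
    """Return a Secret manifest equivalent to the hand-written definition."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        # stringData values must be strings; Kubernetes encodes them on creation.
        "stringData": {"ACCESS_KEY_ID": access_key, "SECRET_KEY": secret_key},
    }


manifest = build_ecs_secret("<access-key>", "<secret-key>")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and passed to `kubectl create -f` in the same way as the YAML file.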

Step 3. Deploy Pravega with ECS as Tier 2.

Follow the instructions to [deploy Pravega](https://github.com/pravega/pravega-operator/blob/master/doc/manual-installation.md#install-the-pravega-cluster-manually) and configure the Tier 2 block in your PravegaCluster manifest with your ECS connection details and a reference to the secret above.

Please refer to Config URI format for additional properties that can be configured using Config URI.

spec:
  tier2:
    ecs:
      configUri: http://10.247.10.52:9020?namespace=pravega
      bucket: "shared"
      prefix: "example"
      credentials: ecs-credentials
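The `configUri` value is an ordinary URL whose query string carries the options. As a rough sketch, it can be assembled with Python's standard library as shown here; `build_config_uri` is a hypothetical helper, and the only option shown is the `namespace` option from the manifest above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_config_uri(endpoint: str, **options: str) -> str:
    """Join an ECS endpoint and its options into a Config URI string."""
    parts = urlsplit(endpoint)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"endpoint must be an http(s) URL, got {endpoint!r}")
    query = urlencode(options)
    return f"{endpoint}?{query}" if query else endpoint


uri = build_config_uri("http://10.247.10.52:9020", namespace="pravega")
print(uri)

# Parsing the result back recovers the endpoint and the options.
parsed = urlsplit(uri)
print(parsed.hostname, parsed.port, parse_qs(parsed.query))
```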

Additional configuration values

Additional configuration values can be specified in the segment store configuration file.

| Config | Optional? | Value | Description |
|---|---|---|---|
| smallObjectSizeLimitForConcat | Yes | Positive integer. Default: 1 MB. Recommended: 1048576 (1 MB). | Size of ECS objects, in bytes, above which an object is no longer considered small. This value is used to optimize transaction performance when transaction segments are small. For small transaction segments, ExtendedS3Storage implements concat by reading the complete source segment and appending it to the target instead of using multipart upload. |
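The decision that smallObjectSizeLimitForConcat controls can be sketched as follows. This is an illustration of the rule as described here, not the ExtendedS3Storage source: the function name and the strategy labels are hypothetical, and treating a segment exactly at the limit as small is an assumption.

```python
DEFAULT_SMALL_OBJECT_SIZE_LIMIT = 1048576  # 1 MB, the documented default


def choose_concat_strategy(source_length: int,
                           limit: int = DEFAULT_SMALL_OBJECT_SIZE_LIMIT) -> str:
    """Pick how a source segment of source_length bytes is concatenated."""
    if limit <= 0:
        raise ValueError("smallObjectSizeLimitForConcat must be a positive integer")
    if source_length < 0:
        raise ValueError("source_length must not be negative")
    # Assumption: a segment exactly at the limit still counts as small.
    if source_length <= limit:
        return "read-and-append"   # read the whole source, append it to the target
    return "multipart-upload"      # large source: use multipart upload


print(choose_concat_strategy(4096))
print(choose_concat_strategy(8 * 1048576))
```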

Metrics published

During operation, Pravega publishes the following metrics.

| Metric Name | Type | Description |
|---|---|---|
| segmentstore.storage.read_latency_ms | Histogram | Read latency |
| segmentstore.storage.write_latency_ms | Histogram | Write latency |
| segmentstore.storage.create_latency_ms | Histogram | Create latency |
| segmentstore.storage.delete_latency_ms | Histogram | Delete latency |
| segmentstore.storage.concat_latency_ms | Histogram | Concat latency |
| segmentstore.storage.read_bytes | Counter | Number of bytes read |
| segmentstore.storage.write_bytes | Counter | Number of bytes written |
| segmentstore.storage.concat_bytes | Counter | Number of bytes concatenated |
| segmentstore.storage.create_count | Counter | Number of create operations |
| segmentstore.storage.delete_count | Counter | Number of delete operations |
| segmentstore.storage.concat_count | Counter | Number of concat operations |
| segmentstore.storage.large_concat_count | Counter | Number of concat operations performed without using optimization |
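One possible use of these counters is to estimate how often concat operations miss the small-object optimization. The sketch below only does the arithmetic on two counter values; the helper name is hypothetical and the sample numbers are made up for illustration.

```python
def large_concat_ratio(concat_count: int, large_concat_count: int) -> float:
    """Fraction of concat operations performed without the optimization."""
    if concat_count == 0:
        return 0.0  # no concat operations recorded yet
    return large_concat_count / concat_count


# Made-up sample readings of concat_count and large_concat_count.
ratio = large_concat_ratio(concat_count=200, large_concat_count=50)
print(f"{ratio:.0%} of concats were performed without the optimization")
```

A consistently high ratio may suggest that transaction segments are usually larger than the configured smallObjectSizeLimitForConcat.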

Advanced deployment options

Load balancing with S3 client

Using Smart Client

Example http://10.1.83.51:9020?identity=<access-key>&secretKey=<secret-key>

Config URI options

  • smartClient - true or false, defaults to true.

Using VDCs

Config URI options

  • useVHost - true or false (default false).
  • vdcs - each instance is a single VDC, represented as a comma-separated list of hosts, with the first host used as the name. Multiple instances are allowed. NOTE: You can also specify a single host in the URI, before the query, and it will be used as a single one-node VDC if no vdcs are specified in the query.

Example https://10.1.100.11:9021?vdcs=10.1.100.11,10.1.100.12&vdcs=10.2.100.11,10.2.100.12&vdcs=10.3.100.11,10.3.100.12&geoPinningEnabled=true&geoReadRetryFailover=true&identity=<access-key>&secretKey=<secret-key>
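To show how the repeated vdcs option reads, the following sketch groups the hosts of such a URI by VDC, naming each VDC after its first host and falling back to the URI host when no vdcs are given. It illustrates the rules described above using only the standard library; it is not the Extended S3 Client's own parser, and `parse_vdcs` is a hypothetical name.

```python
from urllib.parse import parse_qs, urlsplit


def parse_vdcs(config_uri: str) -> dict:
    """Map each VDC name (its first host) to the list of hosts in that VDC."""
    parts = urlsplit(config_uri)
    query = parse_qs(parts.query)
    vdcs = {}
    for instance in query.get("vdcs", []):
        hosts = [h.strip() for h in instance.split(",") if h.strip()]
        if hosts:
            vdcs[hosts[0]] = hosts
    # Fallback: a single host before the query acts as a one-node VDC.
    if not vdcs and parts.hostname:
        vdcs[parts.hostname] = [parts.hostname]
    return vdcs


uri = ("https://10.1.100.11:9021?vdcs=10.1.100.11,10.1.100.12"
       "&vdcs=10.2.100.11,10.2.100.12&vdcs=10.3.100.11,10.3.100.12"
       "&geoPinningEnabled=true&geoReadRetryFailover=true"
       "&identity=<access-key>&secretKey=<secret-key>")

for name, hosts in parse_vdcs(uri).items():
    print(name, hosts)
```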

Using External Load Balancer

If ECS is configured to use an external load balancer, there is no need to use VDCs or the smart client.

Multi-site support.

Pravega currently does not support multi-site deployments.

References
