Skip to content

flydt/estuary-ng

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Estuary

Estuary is a lustre S3 copytool. This means it can interact with the HSM of an existing lustre and send the file contents to S3 via curl.

This copytool is an updated version of the lustre-s3-copytool by ComputeCanada.

ng ---- only a postfix to difference with none "-ng" version

Features

change from none "-ng" version

remove LZ4 support

bucket and file system path mapping 1:1

remove checksum when restore

Building

Estuary requires the following development packages:

sudo dnf install -y libcurl-devel libxml2-devel openssl-devel libconfig-devel libbsd-devel

and Lustre development headers, see the LustreSetupGuide for some info.

A recent version of CMake (>3.24) is also required (this is newer than in most distros), there is a script here to fetch it if needed.

Then do:

git clone https://git.ichec.ie/performance/storage/estuary.git
mkdir build; cd build
cmake ../estuary
make

the binary estuary_s3copytool will be in the bin directory of build.

Running

Command line arguments

To see the full list of command line arguments run ./estuary_s3copytool --help:

# ./estuary_s3copytool --help
 Usage: estuary_s3copytool [options]... <mode> <lustre_mount_point>
The Lustre HSM S3 copy tool can be used as a daemon or as a command line tool
The Lustre HSM daemon acts on action requests from Lustre
to copy files to and from an HSM archive system.
   --daemon            Daemon mode, run in background
 Options:
The Lustre HSM tool performs administrator-type actions
on a Lustre HSM archive.
   --abort-on-error          Abort operation on major error
   -A, --archive <#>         Archive number (repeatable)
   -c, --config <path>       Path to the config file
   --dry-run                 Don't run, just show what would be done
   -q, --quiet               Produce less verbose output
   -u, --update-interval <s> Interval between progress reports sent
                             to Coordinator
   -v, --verbose             Produce more verbose output

Config file

Copy the file config.cfg to your runtime directory and fill it out as required (the path of the config file can also be passed as a runtime parameter).

Parameter Type Description
access_key String AWS access key.
secret_key String AWS Secret key.
host String Hostname of the S3 endpoint.
bucket_count Int The number of buckets used to spread the indexing load. With radosgw, PUT operation will slow down proportionally to the number of objects in the same bucket. If a bucket_count > 2 is used, the bucket_prefix will be appended an ID.
bucket_prefix String This prefix will prepended to each bucketID. For example, if the bucket_prefix is hsm, then each bucket will named hsm_0, hsm_1, hsm_2 ...
ssl Bool If the S3 endpoint should use SSL.
chunk_size Int This represent the size of the largest object stored. A large file in Lustre will be stripped in multiple objects if the file size > chunk_size. Because compression is used, this parameter need to be set according to the available memory. Each thread will use twice the chunk_size. For incompressible data, each object will take a few extra bytes.

If you want a local S3 test server there are notes in the Developer Guide for using Minio.

Lustre HSM

Enable HSM on the MDS server

# lctl set_param mdt.lustre-MDT0000.hsm.max_requests=10
# lctl set_param mdt.lustre-MDT0000.hsm_control=enabled

Start the copytool on a DTN node

# ./estuary_s3copytool /lustre/
1456506103.926649 copytool_d[31507]: mount_point=/lustre/
1456506103.932785 copytool_d[31507]: waiting for message from kernel

You can use lfs hsm_state to get the current status of a file

Move a file to HSM and remove it from Lustre

# lfs hsm_state test
test: (0x00000000)
# lfs hsm_archive test
# lfs hsm_state
test: (0x00000009) exists archived, archive_id:1
# lfs hsm_release test
# lfs hsm_state test
test: (0x0000000d) released exists archived, archive_id:1

Restore the file implicitly

# md5sum test
33e3e3bdb7f6f847e06ae2a8abad0b85  test
# lfs hsm_state test
test: (0x00000009) exists archived, archive_id:1

Remove the file from S3

# lfs hsm_remove test
# lfs hsm_state test
test: (0x00000000), archive_id:1

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published