Filesystem - Large File Handling (Caching) #6411

Status: Open · 2 of 7 tasks · Tracked by #1327
Labels: PO issue (created by Product Owners [PLEASE use osparc-issue repo])
mrnicegyu11 opened this issue Sep 20, 2024 · 4 comments

mrnicegyu11 (Member) commented Sep 20, 2024

Event Horizon

  1. a:dynamic-sidecar (1 of 2): GitHK, matusdrobuliak66, sanderegg
  2. a:autoscaling: sanderegg

MartinKippenberger

  1. a:infra+ops efs-guardian: matusdrobuliak66
  2. matusdrobuliak66

mguidon (Member) commented Nov 5, 2024

  • Measure during the next sprint using AWS dashboards/graphs
  • Alternative: Lustre

matusdrobuliak66 (Contributor) commented Nov 24, 2024

General notes

| Type | Overview | Examples |
| --- | --- | --- |
| Client-Server | Central server stores files, accessed by clients over the network. | NFS, SMB |
| Peer-to-Peer | Nodes share files directly without a centralized server. | IPFS, BitTorrent |
| Object-Based | Files broken into objects with separate metadata for scalability. | Amazon S3, Ceph, MinIO |
| Clustered | Multiple nodes form a cluster to share resources and distribute workloads. | GFS, HDFS, Lustre |
| Parallel | Multiple servers handle data access simultaneously to improve performance. | IBM GPFS, BeeGFS |
| Cloud-Based | File systems designed to leverage cloud infrastructure. | Azure Blob Storage, Google Cloud Storage |
| Block-Based | Files divided into fixed-size blocks and distributed across nodes. | GlusterFS, MooseFS |
| Metadata-Based | Separates metadata from data storage for faster access and management. | XtreemFS |
| Hybrid | Combines features of multiple DFS types for flexibility. | Red Hat Ceph Storage |
| Specialized | Designed for specific workloads or industries. | QFS, OrangeFS |

  • XFS is not a distributed file system, but it is an excellent choice as the underlying local file system for distributed solutions; if you are setting up a distributed file system like GlusterFS, XFS is one of the best local file systems to use.
  • GlusterFS can provide NFS-like shares and scale by adding more nodes.

What do we want to achieve?

  • Speed up loading times for large projects by caching data from S3.
  • Mount a data folder across multiple EC2 instances.
  • Enable user billing for this feature.

Options looked at

  • Amazon EFS
  • Amazon FSx for Lustre
  • Regatta storage (pointed out by Dustin)
  • Gluster (open source)
  • Ceph
  • EBS (keeping EBS up)

EFS

  • 2 options:
    • Throughput mode - Elastic (the one we use now)
    • Throughput mode - Provisioned

Elastic

  • 💸 Storage 0.30$ (GB-Month)
  • 💸 Reads 0.03$ (per GB transferred)
  • 💸 Writes 0.06$ (per GB transferred)
  • Automatic backups are currently enabled -> probably not needed
  • Currently we use around 1% of the available IOPS
    (screenshot attached)

Provisioned

  • 20 TB per Month

    • 🏃‍♀ Default Throughput: 1024 MB/s
    • 💸 Storage: 20,480.00 GB (Standard Storage) × 0.30 USD/GB = 6,144.00 USD
  • 10 TB per Month

    • 🏃‍♀ Default Throughput: 512 MB/s
    • 💸 Storage: 10,240.00 GB (Standard Storage) × 0.30 USD/GB = 3,072.00 USD
    • Total Monthly Cost: 3,072.00 USD
  • 5 TB per Month

    • 🏃‍♀ Default Throughput: 256 MB/s
    • 💸 Storage: 5,120.00 GB (Standard Storage) × 0.30 USD/GB = 1,536.00 USD + Provisioned Throughput cost (if applicable)
    • Example of Provisioned Throughput cost: an additional 256 MB/s of throughput = 256.00 MB/s-month × 6.00 USD = 💸 1,536.00 USD (a small cost sketch follows this list)
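
As a back-of-the-envelope check of the EFS figures above, here is a minimal sketch of the cost arithmetic (assuming only the list prices quoted in this comment, i.e. 0.30 USD per GB-month of standard storage and 6.00 USD per MB/s-month of additionally provisioned throughput; not an official AWS calculator):

```python
# Minimal EFS Provisioned cost sketch based on the prices quoted above
# (assumptions, not an official AWS calculator).
STORAGE_USD_PER_GB_MONTH = 0.30
PROVISIONED_USD_PER_MBPS_MONTH = 6.00

def efs_provisioned_monthly_cost(storage_gb: float,
                                 extra_throughput_mbps: float = 0.0) -> float:
    """Monthly cost: standard storage plus optional extra provisioned throughput."""
    return (storage_gb * STORAGE_USD_PER_GB_MONTH
            + extra_throughput_mbps * PROVISIONED_USD_PER_MBPS_MONTH)

# Reproduces the figures above:
assert round(efs_provisioned_monthly_cost(20_480)) == 6_144
assert round(efs_provisioned_monthly_cost(10_240)) == 3_072
assert round(efs_provisioned_monthly_cost(5_120, extra_throughput_mbps=256)) == 3_072
```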

Notes

  • We currently do not use the Lifecycle Management feature, which would move data to the Infrequent Access class at 0.025$ (GB-Month). Files in the Standard class can be accessed with single-digit-millisecond latency, while files in the Infrequent Access class have double-digit-millisecond latency, so this is probably not useful for our use case.

PROS:

  • easy to set up
  • POSIX-compliant
  • Good for metadata-heavy operations (NFS type)

CONS:

  • does not support user quotas (but we already have a custom solution implemented)
  • to enable user billing based on throughput, a custom solution needs to be implemented (see the section at the end of this review)
  • cost? 💸
  • What happens if we hit the limits of provisioned speed/storage?

Amazon FSx for Lustre

  • https://aws.amazon.com/fsx/lustre/pricing/
  • Costs include storage and throughput together, with default IOPS. Additionally, data transferred "in" to and "out" from Amazon FSx across AZs or VPC peering connections in the same Region is charged at $0.01/GB in each direction.
    (pricing screenshot attached)
  • 20 TB per Month (21,600 GB, because capacity must be a multiple of 2,400 GB)
    • 🏃‍♀ Throughput: 500 MB/s
    • 💸 Storage: 21,600 GB (Standard Storage) × 0.340 USD/GB = 7,344 USD
  • 20 TB per Month (Scratch SSD) (21,600 GB, because capacity must be a multiple of 2,400 GB)
    • 🏃‍♀ Throughput: 200 MB/s
    • 💸 Storage: 21,600 GB (Scratch SSD) × 0.140 USD/GB = 3,024 USD
    • 🔴 I am not sure whether a scratch file system is a good idea
      (screenshot attached)
  • Additional costs 💸: provisioned metadata IOPS at 0.055$ per IOPS-month
    • e.g. buying 1,500 additional IOPS: 1,500 × 0.055 = 82.5$ (see the cost sketch after this list)
      (screenshot attached)
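
The same kind of back-of-the-envelope arithmetic for FSx for Lustre, as a minimal sketch (assuming the prices quoted above, 0.340 USD/GB-month for the persistent 500 MB/s tier, 0.140 USD/GB-month for Scratch SSD and 0.055 USD per metadata IOPS-month, plus the 2,400 GB capacity increments; treat the numbers as assumptions, not an official quote):

```python
import math

# Minimal FSx-for-Lustre cost sketch based on the figures above.
CAPACITY_STEP_GB = 2_400            # capacity must be a multiple of 2,400 GB
METADATA_IOPS_USD_PER_MONTH = 0.055

def fsx_monthly_cost(requested_gb: float,
                     usd_per_gb_month: float,
                     extra_metadata_iops: int = 0) -> float:
    """Round capacity up to the next 2,400 GB step, add optional metadata IOPS."""
    capacity_gb = math.ceil(requested_gb / CAPACITY_STEP_GB) * CAPACITY_STEP_GB
    return (capacity_gb * usd_per_gb_month
            + extra_metadata_iops * METADATA_IOPS_USD_PER_MONTH)

# Reproduces the figures above:
assert round(fsx_monthly_cost(20_000, 0.340)) == 7_344   # persistent, 500 MB/s
assert round(fsx_monthly_cost(20_000, 0.140)) == 3_024   # Scratch SSD
assert round(fsx_monthly_cost(0, 0.0, extra_metadata_iops=1_500), 2) == 82.5
```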

PROS:

  • AWS managed
  • does support user quotas

CONS:

  • to enable user billing based on throughput, a custom solution needs to be implemented (see the section at the end of this review)
  • cost? 💸
  • What happens if we hit the limits of provisioned speed/storage?

Regatta storage

  • https://regattastorage.com/ (pointed out by Dustin)
  • Fixed costs
    • 25$ (per Month)
  • Storage
    • 0.20$ (GB-Month)
  • Throughput
    • 0.05$ (per GB transferred)

CONS:

  • not many reviews
  • 3rd-party vendor
  • not much customization (monitoring? user quotas?)
  • caching logic is handled by them (data is evicted after 1 hour), not useful for our use case

Gluster (open source)

  • https://www.gluster.org/install/

  • AWS Installation: https://docs.gluster.org/en/v3/Install-Guide/Setup_aws/

  • gp3 EBS costs: 💸 0.08$/GB-month + 0.04$ per MB/s-month above the 125 MB/s baseline + 0.005$ per IOPS-month above the 3,000 IOPS baseline. We currently boost EBS by at most an additional 875 MB/s and 13,000 IOPS. (Q: we should analyse how much is really needed.)

  • EC2:

    • 4x m5.large (2 vCPUs, 8 GB RAM, 0.096$ hourly × 24 × 30 ≈ 70$ monthly)
      • 💸 70 * 4 = 280$ monthly
    • 8x m5.large (2 vCPUs, 8GB RAM)
      • 💸 70 * 8 = 560$ monthly
  • EBS:

    • 1 TB disk space:
      • 🏃‍♀ Throughput: 1000 MB/s & 16000 IOPS
      • 💸Cost Breakdown:
        • 0.08 * 1000 = 80 USD
        • 875 * 0.04 = 35 USD
        • 13,000 * 0.005 = 65 USD
      • Total Monthly Cost: 180 USD
    • 💸 20 TB of usable disk space (we need at least 30 TB raw for fault tolerance)
      • 💸 180 × 30 = 5,400$

TOTAL ESTIMATE ~ 6,000$ (see the rough breakdown sketched below)
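
As a small sketch of how the ~6,000$ figure is composed (assuming the per-node and per-TB monthly costs derived in the bullets above, roughly 70 USD per m5.large node and 180 USD per TB of boosted gp3 EBS; inter-AZ traffic and snapshots are ignored):

```python
# Rough total for a GlusterFS-on-EBS setup, using the monthly costs derived
# in the bullets above (assumptions, not an official quote).
def gluster_monthly_estimate(n_nodes: int = 8, raw_capacity_tb: int = 30,
                             usd_per_node: float = 70.0,
                             usd_per_tb: float = 180.0) -> float:
    """EC2 nodes plus EBS bricks; ignores inter-AZ traffic and snapshots."""
    return n_nodes * usd_per_node + raw_capacity_tb * usd_per_tb

# 8 × 70 + 30 × 180 = 5,960 USD, i.e. roughly the ~6,000 USD estimate above
assert gluster_monthly_estimate() == 5_960.0
```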

CONS:

  • 🔴 maintenance!
  • setup effort
  • What happens if we hit the limits of provisioned speed/storage?

Ceph (using inhouse Ceph cluster)

  • Q: probably an issue that we would need to move data between the cloud and the in-house cluster?
  • hard to maintain

EBS (keeping EBS up)

  • gp3 EBS costs: 💸 0.08$/GB-month + 0.04$ per MB/s-month above the 125 MB/s baseline + 0.005$ per IOPS-month above the 3,000 IOPS baseline. We currently boost EBS by at most an additional 875 MB/s and 13,000 IOPS. (Q: we should analyse how much is really needed.) A small cost sketch follows after this list.
  • 1 TB disk space example:
    • 🏃‍♀ Throughput: 1000 MB/s & 16000 IOPS
    • 💸Cost Breakdown:
      • 0.08 * 1000 = 80 USD
      • 875 * 0.04 = 35 USD
      • 13,000 * 0.005 = 65 USD
    • Total Monthly Cost: 180 USD
      • Keeping the volume up for only 10 days: 60 USD
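
A minimal sketch of the gp3 arithmetic above, including prorating by how long the volume is kept up (assuming the 0.08/0.04/0.005 USD rates and the 125 MB/s and 3,000 IOPS baselines quoted above; not an official AWS calculator):

```python
# Minimal gp3 EBS cost sketch based on the rates above (assumptions only).
def gp3_monthly_cost(size_gb: float, throughput_mbps: float, iops: int,
                     days_kept: float = 30.0) -> float:
    """Monthly gp3 cost, prorated if the volume is only kept up part of the month."""
    storage = size_gb * 0.08
    extra_throughput = max(0.0, throughput_mbps - 125) * 0.04
    extra_iops = max(0, iops - 3_000) * 0.005
    return (storage + extra_throughput + extra_iops) * days_kept / 30.0

# Reproduces the figures above:
assert round(gp3_monthly_cost(1_000, 1_000, 16_000), 2) == 180.00
assert round(gp3_monthly_cost(1_000, 1_000, 16_000, days_kept=10), 2) == 60.00
```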

PROS:

  • 💸 the predictable pricing model eliminates the need for upfront provisioning, and there is no need to deal with hitting the limits of a provisioned distributed file system
  • probably no need to monitor throughput (the pricing model can be based on the runtime of the EBS volume)

CONS:

  • Useful for caching the user workspace, but not if we want a single general solution that also covers mounting a folder on multiple EC2 instances.

Additional notes:

  • To enable user billing based on throughput:
    • We probably need to create a custom CloudWatch metric and push data (e.g., Lustre I/O statistics) from the EC2 instance to CloudWatch. Lustre exposes statistics via lctl; for EFS, nfsstat, df, or iotop might work (needs to be investigated!). A rough sketch follows below.
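
A minimal sketch of what pushing such a custom metric could look like, assuming boto3 with credentials/region already configured on the instance and that per-user read/write byte counters have already been parsed from lctl (Lustre) or nfsstat (EFS); the namespace and metric names below are made up for illustration:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

def push_throughput_metric(user_id: str, read_bytes: int, write_bytes: int) -> None:
    """Publish per-user filesystem throughput as a custom CloudWatch metric."""
    cloudwatch.put_metric_data(
        Namespace="osparc/filesystem",  # hypothetical namespace
        MetricData=[
            {
                "MetricName": "ReadBytes",
                "Dimensions": [{"Name": "UserID", "Value": user_id}],
                "Value": read_bytes,
                "Unit": "Bytes",
            },
            {
                "MetricName": "WriteBytes",
                "Dimensions": [{"Name": "UserID", "Value": user_id}],
                "Value": write_bytes,
                "Unit": "Bytes",
            },
        ],
    )

# Example: counters parsed periodically from lctl/nfsstat would be pushed like
# push_throughput_metric("user-123", read_bytes=1_048_576, write_bytes=524_288)
```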

matusdrobuliak66 (Contributor) commented Nov 25, 2024

Conclusion (discussed by Matus and Manuel on 25.11.2024)

  • Moving away from the idea of using a distributed file system for caching and instead relying solely on pure EBS volumes at the user level seems to be a more reasonable approach.
    • From the analysis above, you can see that it is much cheaper than the other solutions.
    • It involves fewer issues to address:
      • For instance, how to handle distributed performance/limits, such as what happens if we reach the storage limit?
      • There is no need to measure throughput.
    • We can build a better and more manageable pricing model around it (e.g., credits per hour of running the cached EBS).
    • As a long-term strategy, we aim to provide users with the option to create their own computers with multiple services (Enhancement: Lock a EC2 machine for a specific project #5669). With this approach, mounting an EBS volume will automatically allow it to be shared across all services.

NEXT Steps:

NOTE: We might use the current EFS infrastructure to store VIP models.

matusdrobuliak66 (Contributor) commented Nov 26, 2024

Update 26.11.2024

  • We identified an issue that will also affect the keep-the-EBS-volume-up logic: the dynamic sidecar always changes permissions on all files at start-up. Until now this was typically done on an empty volume, but if the volume already contains many files and the permission change does not finish within 1 minute, the Docker health check kills the sidecar and prevents it from starting.
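
For illustration only (this is not the sidecar's actual code): one possible mitigation is to change ownership only for entries whose uid/gid actually differ from the target, so a volume whose files are already correct is traversed but not rewritten; the target ids and path below are hypothetical:

```python
import os

TARGET_UID = 8004  # hypothetical uid the sidecar expects, not the real value
TARGET_GID = 8004  # hypothetical gid

def fix_ownership(root: str) -> int:
    """Chown only entries whose owner or group differs from the target.

    Skipping already-correct entries avoids rewriting every inode on a
    pre-populated volume, which is what pushes start-up past the 1-minute
    Docker health-check window.
    """
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for entry in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            st = os.lstat(entry)
            if st.st_uid != TARGET_UID or st.st_gid != TARGET_GID:
                os.lchown(entry, TARGET_UID, TARGET_GID)
                changed += 1
    return changed

if __name__ == "__main__":
    print(f"changed ownership of {fix_ownership('/path/to/workspace')} entries")
```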
