NVIDIA GPU Direct Storage with Dell PowerScale

Overview

This project demonstrates the integration of NVIDIA GPU Direct Storage (GDS) with Dell PowerScale using NFS over RDMA. It showcases how to set up and optimize a high-performance computing environment for data-intensive workloads in AI and analytics.

The repository includes:

Configuration guide for Dell PowerScale and NVIDIA GDS
Benchmarking scripts using NVIDIA's gdsio utility
A data loading pipeline using NVIDIA DALI for efficient GPU-accelerated data preprocessing

Prerequisites

Dell PowerScale
Server equiped with NVIDIA GPUs and Mellanox ConnectX-6 NICs
NVIDIA GPU drivers
CUDA Toolkit
NVIDIA GDS software stack

Installation

Clone this repository:

git clone https://github.com/DellGEOS/gds-powerscale-project.git
cd gds-powerscale-project

Install required Python packages:
```
pip install nvidia-dali
```
Follow the NVIDIA GDS Installation Guide to install NVIDIA drivers, nvidia-fs, and the CUDA toolkit.

Configuration

Configure PowerScale for GDS:

isi compression settings modify --enabled=0
isi dedupe inline settings modify --mode=disabled

Enable NFS over RDMA on PowerScale subnet and network pool.

Mount the PowerScale NFS share using RDMA:

mount -o rdma,vers=3 <PowerScale_IP>:/ifs/RDMA-Test /mnt/RDMA

Refer to the full blog post in this repository for detailed configuration steps and best practices.

Usage

The main components of this project are:

GDS configuration and setup
Benchmarking scripts
DALI data loader for efficient data preprocessing

Each component has its own usage instructions detailed in the respective sections below.

Benchmarking

Use the gdsio utility to benchmark your GDS setup:

# Write benchmark
sudo ./gdsio -f /mnt/RDMA/testfile -d 0 -m 0 -s 10G -i 1M -w 10 -x 0 -I 1

# Read benchmark
sudo ./gdsio -f /mnt/RDMA/testfile -d 0 -m 0 -s 10G -i 1M -w 10 -x 0 -I 0

Refer to the blog post for interpretation of results and comparison with CPU-only transfers.

DALI Data Loader

The dali_loader.py script demonstrates how to use NVIDIA DALI for efficient data loading and preprocessing. To run the script:

python dali_loader.py --data_dir /path/to/your/images --batch_size 32 --image_size 224 --shuffle

This script loads images from the specified directory, applies preprocessing, and prepares them for model training.

Best Practices

The repository includes an enhanced version of the DALI loader script that incorporates several best practices for data scientists working with large datasets. Key improvements include:

Configurability through command-line arguments
Robust error handling and logging
Option for dataset shuffling
Additional data augmentation techniques

Refer to the "Best Practices for Data Scientists" section in the blog post for more details.

Troubleshooting

For common issues and their solutions, please refer to the NVIDIA GDS Troubleshooting Guide.

If you encounter any problems specific to this project, please open an issue in the GitHub repository.

Contributing

Contributions to this project are welcome! Please fork the repository and submit a pull request with your improvements.

License

[Specify your license here, e.g., MIT, Apache 2.0, etc.]

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
PowerScale Data Loader		PowerScale Data Loader
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NVIDIA GPU Direct Storage with Dell PowerScale

Overview

Table of Contents

Prerequisites

Installation

Configuration

Usage

Benchmarking

DALI Data Loader

Best Practices

Troubleshooting

Contributing

License

About

Releases

Packages

DellGEOS/GDS

Folders and files

Latest commit

History

Repository files navigation

NVIDIA GPU Direct Storage with Dell PowerScale

Overview

Table of Contents

Prerequisites

Installation

Configuration

Usage

Benchmarking

DALI Data Loader

Best Practices

Troubleshooting

Contributing

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages