This project demonstrates the integration of NVIDIA GPUDirect Storage (GDS) with Dell PowerScale using NFS over RDMA. It shows how to set up and optimize a high-performance computing environment for data-intensive AI and analytics workloads.
The repository includes:
- Configuration guide for Dell PowerScale and NVIDIA GDS
- Benchmarking scripts using NVIDIA's gdsio utility
- A data loading pipeline using NVIDIA DALI for efficient GPU-accelerated data preprocessing

Contents:
- Prerequisites
- Installation
- Configuration
- Usage
- Benchmarking
- DALI Data Loader
- Best Practices
- Troubleshooting
- Contributing
- License

Prerequisites:
- Dell PowerScale
- Server equipped with NVIDIA GPUs and Mellanox ConnectX-6 NICs
- NVIDIA GPU drivers
- CUDA Toolkit
- NVIDIA GDS software stack

Installation:
- Clone this repository:
    git clone https://github.com/DellGEOS/gds-powerscale-project.git
    cd gds-powerscale-project
- Install required Python packages:
    pip install nvidia-dali
- Follow the NVIDIA GDS Installation Guide to install NVIDIA drivers, nvidia-fs, and the CUDA toolkit.

Configuration:
- Configure PowerScale for GDS:
    isi compression settings modify --enabled=0
    isi dedupe inline settings modify --mode=disabled
- Enable NFS over RDMA on the PowerScale subnet and network pool.
- Mount the PowerScale NFS share using RDMA:
    mount -o rdma,vers=3 <PowerScale_IP>:/ifs/RDMA-Test /mnt/RDMA
Refer to the full blog post in this repository for detailed configuration steps and best practices.
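
After mounting, it can be useful to confirm that the share really negotiated RDMA and did not end up on TCP. The following is a minimal sketch, not part of this repository, that scans text in the `/proc/mounts` format for NFS entries whose options include `proto=rdma` (or the bare `rdma` option used in the mount command above). The function name and the sample mount lines are hypothetical, and the exact option string reported by the kernel is an assumption that may vary by distribution.

```python
def find_rdma_nfs_mounts(mounts_text):
    """Return mount points of NFS file systems whose options indicate RDMA.

    `mounts_text` is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    rdma_mounts = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type not in ("nfs", "nfs4"):
            continue
        opts = options.split(",")
        # Assumption: the kernel reports RDMA transport as "proto=rdma".
        if "proto=rdma" in opts or "rdma" in opts:
            rdma_mounts.append(mount_point)
    return rdma_mounts


if __name__ == "__main__":
    # Hypothetical sample lines; replace with open("/proc/mounts").read().
    sample = (
        "192.0.2.10:/ifs/RDMA-Test /mnt/RDMA nfs "
        "rw,relatime,vers=3,proto=rdma,port=20049 0 0\n"
        "192.0.2.10:/ifs/data /mnt/tcp nfs rw,relatime,vers=3,proto=tcp 0 0\n"
    )
    print("RDMA NFS mounts:", find_rdma_nfs_mounts(sample))
```

On a live system, pass the contents of `/proc/mounts` to the function and check that `/mnt/RDMA` appears in the result.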
The main components of this project are:
- GDS configuration and setup
- Benchmarking scripts
- DALI data loader for efficient data preprocessing
Each component has its own usage instructions detailed in the respective sections below.
Use the gdsio utility to benchmark your GDS setup:
# Write benchmark
sudo ./gdsio -f /mnt/RDMA/testfile -d 0 -m 0 -s 10G -i 1M -w 10 -x 0 -I 1
# Read benchmark
sudo ./gdsio -f /mnt/RDMA/testfile -d 0 -m 0 -s 10G -i 1M -w 10 -x 0 -I 0
Refer to the blog post for interpretation of results and comparison with CPU-only transfers.
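
To compare GDS against CPU-only transfers, the two commands above can be generated for both transfer modes from one place. The sketch below is an illustration only, not one of the repository's benchmarking scripts. It reuses the flag values shown above and assumes, based on gdsio's help text, that `-x 1` selects the CPU-only path; whether `-m 0` is appropriate for CPU-only runs has not been verified here. The function and constant names are hypothetical.

```python
import shlex

# Values for -x and -I. 0/1 for -I and 0 for -x come from the commands above;
# -x 1 as "CPU only" is an assumption taken from gdsio's help output.
XFER_GPU_DIRECT = 0
XFER_CPU_ONLY = 1
IO_READ = 0
IO_WRITE = 1


def build_gdsio_command(path, io_type, xfer_type=XFER_GPU_DIRECT, gpu=0,
                        size="10G", io_size="1M", workers=10, gdsio="./gdsio"):
    """Build a gdsio argument list that mirrors the commands shown above."""
    return [
        gdsio, "-f", path, "-d", str(gpu), "-m", "0",
        "-s", size, "-i", io_size, "-w", str(workers),
        "-x", str(xfer_type), "-I", str(io_type),
    ]


def benchmark_matrix(path):
    """Return named commands: write before read, for each transfer mode."""
    runs = {}
    for xfer_name, xfer in (("gpu_direct", XFER_GPU_DIRECT),
                            ("cpu_only", XFER_CPU_ONLY)):
        for io_name, io in (("write", IO_WRITE), ("read", IO_READ)):
            runs[f"{xfer_name}_{io_name}"] = build_gdsio_command(path, io, xfer)
    return runs


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for name, cmd in benchmark_matrix("/mnt/RDMA/testfile").items():
        print(f"{name}: sudo {shlex.join(cmd)}")
```

Printing the commands keeps the sketch free of side effects; each argument list can be passed to `subprocess.run` once the flags have been checked against the installed gdsio version.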
The dali_loader.py script demonstrates how to use NVIDIA DALI for efficient data loading and preprocessing. To run the script:
python dali_loader.py --data_dir /path/to/your/images --batch_size 32 --image_size 224 --shuffle
This script loads images from the specified directory, applies preprocessing, and prepares them for model training.
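
For orientation, a loader with this command-line interface might look roughly like the sketch below. It is not the actual dali_loader.py: the argument names come from the command above, but the pipeline body, the default values, the normalization constants, the thread count, and the device id are assumptions. It also assumes that `--data_dir` contains one subdirectory per class, which is how DALI's `fn.readers.file` derives labels. DALI is imported lazily so that argument parsing works on machines without it.

```python
import argparse


def parse_args(argv=None):
    """Parse the options used in the command line shown above."""
    parser = argparse.ArgumentParser(description="DALI image loader sketch")
    parser.add_argument("--data_dir", required=True,
                        help="image root, one subdirectory per class")
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--image_size", type=int, default=224)
    parser.add_argument("--shuffle", action="store_true")
    return parser.parse_args(argv)


def build_pipeline(args, device_id=0, num_threads=4):
    """Build a DALI pipeline: read, decode on GPU, resize, normalize."""
    # Imported here so the module loads even when DALI is not installed.
    from nvidia.dali import pipeline_def, fn, types

    @pipeline_def(batch_size=args.batch_size, num_threads=num_threads,
                  device_id=device_id)
    def image_pipeline():
        encoded, labels = fn.readers.file(file_root=args.data_dir,
                                          random_shuffle=args.shuffle,
                                          name="Reader")
        # "mixed" decodes on the GPU from encoded bytes held in host memory.
        images = fn.decoders.image(encoded, device="mixed",
                                   output_type=types.RGB)
        images = fn.resize(images, resize_x=args.image_size,
                           resize_y=args.image_size)
        # Assumed ImageNet-style normalization; adjust for your dataset.
        images = fn.crop_mirror_normalize(
            images, dtype=types.FLOAT, output_layout="CHW",
            mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
            std=[0.229 * 255, 0.224 * 255, 0.225 * 255])
        return images, labels

    pipe = image_pipeline()
    pipe.build()
    return pipe


def main(argv=None):
    """Parse arguments, build the pipeline, and fetch one batch."""
    args = parse_args(argv)
    pipe = build_pipeline(args)
    images, labels = pipe.run()
    print(f"Loaded one batch of {len(labels)} samples")
```

Calling `main()` from an `if __name__ == "__main__":` guard turns the sketch into a script that accepts the same options as the command above.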
The repository includes an enhanced version of the DALI loader script that incorporates several best practices for data scientists working with large datasets. Key improvements include:
- Configurability through command-line arguments
- Robust error handling and logging
- Option for dataset shuffling
- Additional data augmentation techniques
Refer to the "Best Practices for Data Scientists" section in the blog post for more details.
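
As one illustration of the error handling and logging mentioned above, a loader can validate its input directory before building any pipeline, so that a wrong path fails early with a clear message. This is a hedged sketch and not the repository's code: the function name, logger name, and list of image extensions are hypothetical.

```python
import logging
from pathlib import Path

logger = logging.getLogger("dali_loader")

# Assumed set of image extensions; extend as needed for your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images(data_dir):
    """Return the number of image files under data_dir, or raise if unusable."""
    root = Path(data_dir)
    if not root.is_dir():
        logger.error("Data directory does not exist: %s", root)
        raise FileNotFoundError(f"Data directory does not exist: {root}")

    # Walk the tree recursively and count files with a known image suffix.
    count = sum(
        1 for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    if count == 0:
        logger.error("No image files found under %s", root)
        raise ValueError(f"No image files found under {root}")

    logger.info("Found %d image files under %s", count, root)
    return count


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    try:
        count_images("/path/to/your/images")
    except (FileNotFoundError, ValueError) as exc:
        logger.warning("Validation failed: %s", exc)
```

Running the check before constructing the pipeline means a mistyped `--data_dir` is reported by the script's own log message, not by an error raised deep inside the data loader.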
For common issues and their solutions, please refer to the NVIDIA GDS Troubleshooting Guide.
If you encounter any problems specific to this project, please open an issue in the GitHub repository.
Contributions to this project are welcome! Please fork the repository and submit a pull request with your improvements.
[Specify your license here, e.g., MIT, Apache 2.0, etc.]