This is an addition to the GPU_vs_CPU demonstration created on July 28, 2022.
The goal of that demonstration was to showcase the speed gains from:
- switching from the pandas library to the cuDF library
- switching from the scikit-learn library to the cuML library
While the gains are significant, it is not a 'fair' comparison for the following reasons:
- the pandas library is limited to a single CPU thread
- the scikit-learn library uses one thread by default (and only the default configurations were tested)
Therefore, this experiment (GPU_vs_CPU_v2) was created to address those shortcomings.
In this experiment, we use the Dask framework, which allows us to utilize all CPU cores.
Dask also supports the RAPIDS backend, which allows us to use GPUs; both cluster setups are sketched below.
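Concretely, the two clusters are created roughly like this (a minimal sketch, not the repo's exact code; it assumes dask.distributed and the dask-cuda package from RAPIDS are installed):

```python
from dask.distributed import Client, LocalCluster

# CPU cluster: by default spans all available cores/threads on the machine
cpu_cluster = LocalCluster()
cpu_client = Client(cpu_cluster)

# GPU cluster: one worker per visible GPU (a single GPU here, controlled
# via CUDA_VISIBLE_DEVICES; see the setup steps below)
from dask_cuda import LocalCUDACluster

gpu_cluster = LocalCUDACluster()
gpu_client = Client(gpu_cluster)
```

In the repo, the CPU and GPU runs live in separate notebooks, so only one of these clusters is active at a time.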
For our test, we compare the performance of Dask-CPU against Dask-GPU in three main areas (a sketch of the CPU and GPU patterns follows the list):
- ETL
  - CPU: the Dask DataFrame API on a Dask CPU cluster (all cores/threads)
  - GPU: the Dask DataFrame API on a Dask GPU cluster (single GPU)
- Dimensionality reduction
  - CPU: scikit-learn dispatched through joblib's Dask backend on the CPU cluster
  - GPU: cuML on the Dask GPU cluster
- ML model training
  - CPU: scikit-learn dispatched through joblib's Dask backend on the CPU cluster
  - GPU: cuML on the Dask GPU cluster
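For the scikit-learn/cuML steps, the two patterns look roughly like this (a hedged sketch, assuming an in-memory feature matrix; X and X_gpu are placeholder names, not variables from the repo):

```python
import joblib
from sklearn.decomposition import PCA

# CPU: scikit-learn work is dispatched onto the Dask CPU cluster through
# joblib's "dask" backend (registered when dask.distributed is installed)
with joblib.parallel_backend("dask"):
    pca_cpu = PCA(n_components=2).fit(X)        # X: a NumPy array (assumed)

# GPU: cuML mirrors the scikit-learn API but executes on the GPU
from cuml.decomposition import PCA as cuPCA

pca_gpu = cuPCA(n_components=2).fit(X_gpu)      # X_gpu: cuDF/CuPy data (assumed)
```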
This experiment was run on a custom-built desktop machine.
We selected a GPU and a CPU in roughly the same price range.
(Name links point to the newegg.com product page; price links point to a screenshot of the hardware price as of Nov 20, 2022.)
- CPU: AMD Ryzen Threadripper 3960X Processor, 24 cores (48 threads) @ 3.8 GHz | $1,399.99
- GPU #2: ZOTAC GAMING GeForce RTX 3090 Trinity OC (PCIe 4.0) | $1,249.99
- Motherboard: ASUS ROG Zenith II Extreme Alpha TRX40
- RAM: 2x G.SKILL Trident Z Neo Series (2 x 32GB) (128GB total) @ 2666 MT/s
- SSD: 3x Kingston FURY Renegade PCIe 4.0 NVMe M.2 SSD 2TB (6TB total)
- GPU #0: ASUS ROG Strix GeForce RTX 3090 (PCIe 4.0) (NOT USED IN EXPERIMENT)
- GPU #1: ZOTAC GAMING GeForce RTX 3060 Ti Twin Edge OC (PCIe 4.0) (NOT USED IN EXPERIMENT)
- Cooling (CPU): CORSAIR iCUE H150i ELITE CAPELLIX Liquid CPU Cooler
- PSU: EVGA SuperNOVA 1600 P+, 80+ Platinum 1600W, Fully Modular
- Linux Mint 20.3
- Miniconda
- Mamba
- 'gvc-2' conda environment; all installed libraries are listed in the gvc-2.yaml file.
Note: The selection of dimensionality reduction and machine learning procedures was limited to procedures available in ALL of the libraries used (Dask, scikit-learn, and cuML). A sketch of a representative ETL step follows the list.
- ETL
  - Read CSV (single file)
  - Write CSV (single file)
  - Read CSV (multiple files)
  - Write CSV (multiple files)
  - Describe dataframe
  - Set index on dataframe
  - Concatenate multiple dataframes
  - Groupby aggregation (mean)
  - Fit label encoder
  - Encode data
  - Scale data
  - Split data
- Dimensionality reduction
  - PCA
  - TruncatedSVD
- Machine learning
  - OLS linear regression
  - k-means clustering
  - Gradient boosting
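As an illustration, a representative ETL step in the CPU notebook looks roughly like this (a sketch, not the repo's exact code; the GPU notebook uses dask_cudf.read_csv, whose API is otherwise identical):

```python
import dask.dataframe as dd

# Lazy, partitioned read of the CSV ("Read CSV (single file)")
df = dd.read_csv("sample_data/sample_data_20m.csv")

# "Describe dataframe" and "Groupby aggregation (mean)"; .compute() triggers
# execution on whichever Dask cluster is currently active
summary = df.describe().compute()
means = df.groupby(df.columns[0]).mean().compute()
```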
- For ETL, we see a more than 70% reduction in time
- For Dimensionality Reduction, we see a more than 83% reduction in time
- For Machine Learning, we see a more than 91% reduction in time
- End-to-end, we see a more than 74% reduction in time
- Clone repo
git clone https://github.com/jonathancosme/GPU_vs_CPU_v2.git
- Navigate into repo directory
cd GPU_vs_CPU_v2
- Create conda environment
mamba env create -n gvc-2 -f gvc-2.yaml
- Activate environment
conda activate gvc-2
- Set the environment variable selecting the GPU to use in the experiment.
In my case, I wanted to use the GPU in PCI slot #2, so I set CUDA_VISIBLE_DEVICES=2.
Set CUDA_VISIBLE_DEVICES={PCI slot # of the GPU you want to use}.
If you only have ONE GPU on your machine, you can skip this step.
export CUDA_DEVICE_ORDER=PCI_BUS_ID && export CUDA_VISIBLE_DEVICES=2
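To confirm that only the intended GPU is visible to CUDA, you can run a quick check from a Python session (this assumes numba, which the RAPIDS libraries install as a dependency):

```python
from numba import cuda

print(cuda.gpus)  # should list only the device selected via CUDA_VISIBLE_DEVICES
```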
- Start JupyterLab
jupyter lab
- Download the data.
The dataset is a fake dataset generated by me, containing 10 columns and 20 million rows.
It is a binary classification problem for detecting counterfeit drugs, available for public download via Google Drive: sample_data_20m.csv
- Move the downloaded CSV file into the folder GPU_vs_CPU_v2/sample_data
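A quick sanity check (a hypothetical snippet, not part of the notebooks) can confirm the file landed in the right place before running anything:

```python
from pathlib import Path

csv_path = Path("sample_data/sample_data_20m.csv")
assert csv_path.exists(), "put sample_data_20m.csv in GPU_vs_CPU_v2/sample_data"
```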
- In the JupyterLab UI, open CPU_demo_v2.ipynb or GPU_demo_v2.ipynb
- From the JupyterLab menu, select Run > Run All Cells