This is an addition to the GPU_vs_CPU demonstration created on July 28, 2022.
The goal of that demonstration was to showcase the speed gains from:
- switching from the pandas library to the cuDF library
- switching from the scikit-learn library to the cuML library
While the gains are significant, it is not a 'fair' comparison for the following reasons:
- the pandas library is limited to a single CPU thread
- the scikit-learn library uses one thread by default (and only the default configurations were tested)
Therefore, this experiment (GPU_vs_CPU_v2) was created to address those shortcomings.
In this experiment, we use the Dask framework, which allows us to utilize all CPU cores.
Dask also supports the RAPIDS backend, which allows us to use GPUs; both cluster setups are sketched below.
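Concretely, the two clusters are created roughly like this (a minimal sketch, not the repo's exact code; it assumes dask.distributed and the dask-cuda package from RAPIDS are installed):

```python
from dask.distributed import Client, LocalCluster

# CPU cluster: by default spans all available cores/threads on the machine
cpu_cluster = LocalCluster()
cpu_client = Client(cpu_cluster)

# GPU cluster: one worker per visible GPU (a single GPU here, controlled
# via CUDA_VISIBLE_DEVICES; see the setup steps below)
from dask_cuda import LocalCUDACluster

gpu_cluster = LocalCUDACluster()
gpu_client = Client(gpu_cluster)
```

In the repo, the CPU and GPU runs live in separate notebooks, so only one of these clusters is active at a time.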
For our test, we compare the performance of Dask-CPU against Dask-GPU in three main areas (a sketch of the CPU and GPU patterns follows the list):
- ETL
  - CPU: the Dask DataFrame API on a Dask CPU cluster (all cores/threads)
  - GPU: the Dask DataFrame API on a Dask GPU cluster (single GPU)
- Dimensionality reduction
  - CPU: scikit-learn dispatched through joblib's Dask backend on the CPU cluster
  - GPU: cuML on the Dask GPU cluster
- ML model training
  - CPU: scikit-learn dispatched through joblib's Dask backend on the CPU cluster
  - GPU: cuML on the Dask GPU cluster
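For the scikit-learn/cuML steps, the two patterns look roughly like this (a hedged sketch, assuming an in-memory feature matrix; X and X_gpu are placeholder names, not variables from the repo):

```python
import joblib
from sklearn.decomposition import PCA

# CPU: scikit-learn work is dispatched onto the Dask CPU cluster through
# joblib's "dask" backend (registered when dask.distributed is installed)
with joblib.parallel_backend("dask"):
    pca_cpu = PCA(n_components=2).fit(X)        # X: a NumPy array (assumed)

# GPU: cuML mirrors the scikit-learn API but executes on the GPU
from cuml.decomposition import PCA as cuPCA

pca_gpu = cuPCA(n_components=2).fit(X_gpu)      # X_gpu: cuDF/CuPy data (assumed)
```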
This experiment was run on a custom-built desktop machine.
We selected a GPU and a CPU in roughly the same price range.
(Name links point to the newegg.com product page; price links point to a screenshot of the hardware price as of Nov 20, 2022.)
- CPU: AMD Ryzen Threadripper 3960X Processor, 24 cores (48 threads) @ 3.8 GHz | $1,399.99
- GPU #2: ZOTAC GAMING GeForce RTX 3090 Trinity OC (PCIe 4.0) | $1,249.99
- Motherboard: ASUS ROG Zenith II Extreme Alpha TRX40
- RAM: 2x G.SKILL Trident Z Neo Series (2 x 32GB) (128GB total) @ 2666 MT/s
- SSD: 3x Kingston FURY Renegade PCIe 4.0 NVMe M.2 SSD 2TB (6TB total)
- GPU #0: ASUS ROG Strix GeForce RTX 3090 (PCIe 4.0) (NOT USED IN EXPERIMENT)
- GPU #1: ZOTAC GAMING GeForce RTX 3060 Ti Twin Edge OC (PCIe 4.0) (NOT USED IN EXPERIMENT)
- Cooling (CPU): CORSAIR iCUE H150i ELITE CAPELLIX Liquid CPU Cooler
- PSU: EVGA SuperNOVA 1600 P+, 80+ Platinum 1600W, Fully Modular
- Linux Mint 20.3
- Miniconda
- Mamba
- 'gvc-2' conda environment; all installed libraries are listed in the gvc-2.yaml file.
Note: The selection of dimensionality reduction and machine learning procedures was limited to procedures available in ALL of the libraries used (Dask, scikit-learn, and cuML). A sketch of a representative ETL step follows the list.
- ETL
  - Read CSV (single file)
  - Write CSV (single file)
  - Read CSV (multiple files)
  - Write CSV (multiple files)
  - Describe dataframe
  - Set index on dataframe
  - Concatenate multiple dataframes
  - Groupby aggregation (mean)
  - Fit label encoder
  - Encode data
  - Scale data
  - Split data
- Dimensionality reduction
  - PCA
  - TruncatedSVD
- Machine learning
  - OLS linear regression
  - k-means clustering
  - Gradient boosting
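As an illustration, a representative ETL step in the CPU notebook looks roughly like this (a sketch, not the repo's exact code; the GPU notebook uses dask_cudf.read_csv, whose API is otherwise identical):

```python
import dask.dataframe as dd

# Lazy, partitioned read of the CSV ("Read CSV (single file)")
df = dd.read_csv("sample_data/sample_data_20m.csv")

# "Describe dataframe" and "Groupby aggregation (mean)"; .compute() triggers
# execution on whichever Dask cluster is currently active
summary = df.describe().compute()
means = df.groupby(df.columns[0]).mean().compute()
```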
- For ETL, we see a more than 70% reduction in time
- For Dimensionality Reduction, we see a more than 83% reduction in time
- For Machine Learning, we see a more than 91% reduction in time
- End-to-end, we see a more than 74% reduction in time
- Clone repo
git clone https://github.com/jonathancosme/GPU_vs_CPU_v2.git
- Navigate into repo directory
cd GPU_vs_CPU_v2
- Create conda environment
mamba env create -n gvc-2 -f gvc-2.yaml
- Activate environment
conda activate gvc-2
- Set the environment variable selecting the GPU to use in the experiment.
In my case, I wanted to use the GPU in PCI slot #2, so I set CUDA_VISIBLE_DEVICES=2.
Set CUDA_VISIBLE_DEVICES={PCI slot # of the GPU you want to use}.
If you only have ONE GPU on your machine, you can skip this step.
export CUDA_DEVICE_ORDER=PCI_BUS_ID && export CUDA_VISIBLE_DEVICES=2
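To confirm that only the intended GPU is visible to CUDA, you can run a quick check from a Python session (this assumes numba, which the RAPIDS libraries install as a dependency):

```python
from numba import cuda

print(cuda.gpus)  # should list only the device selected via CUDA_VISIBLE_DEVICES
```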
- Start JupyterLab
jupyter lab
- Download the data.
The dataset is a fake dataset generated by me, containing 10 columns and 20 million rows.
It is a binary classification problem for detecting counterfeit drugs, available for public download via Google Drive: sample_data_20m.csv
- Move the downloaded CSV file into the folder GPU_vs_CPU_v2/sample_data
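A quick sanity check (a hypothetical snippet, not part of the notebooks) can confirm the file landed in the right place before running anything:

```python
from pathlib import Path

csv_path = Path("sample_data/sample_data_20m.csv")
assert csv_path.exists(), "put sample_data_20m.csv in GPU_vs_CPU_v2/sample_data"
```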
- In the JupyterLab UI, open CPU_demo_v2.ipynb or GPU_demo_v2.ipynb
- From the JupyterLab menu, select Run > Run All Cells