% conclusion.tex
We present FlashR, a matrix-oriented programming framework that automatically
executes machine learning algorithms in parallel and out of core.
FlashR scales to large datasets by utilizing commodity SSDs.
Although R is considered slow and unable to scale to large datasets,
we demonstrate that with sufficient system-level optimizations, FlashR
achieves high performance and scalability
for many machine learning algorithms. R implementations executed in FlashR
outperform H$_2$O and Spark MLlib on all algorithms by a factor of $3$--$20$.
FlashR easily scales to datasets with billions of data points while using
a negligible amount of memory, and completes all algorithms within a
reasonable amount of time. With FlashR, machine learning
researchers can prototype algorithms in a familiar programming environment,
while still getting efficient and scalable implementations.
We believe FlashR provides new opportunities for developing large-scale
machine learning algorithms.
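To make the external-memory style of execution described above concrete, the following sketch computes a column mean over a matrix stored on disk, streaming one fixed-size chunk of rows into memory at a time. This is an illustrative example in Python using NumPy memory-mapped files, not FlashR's actual API (which is R-based); the file path, chunk size, and matrix dimensions are arbitrary choices for the demonstration.

```python
import os
import tempfile
import numpy as np

# Create a matrix on disk; this stands in for a dataset too large for RAM.
rows, cols = 10_000, 8
path = os.path.join(tempfile.mkdtemp(), "data.npy")
data = np.lib.format.open_memmap(path, mode="w+", dtype=np.float64,
                                 shape=(rows, cols))
data[:] = 1.0  # fill with a known value so the result is easy to check
data.flush()

def column_mean(path, chunk_rows=1024):
    """Out-of-core column mean: stream row chunks from disk, accumulating
    partial sums, so peak memory use is O(chunk_rows * cols)."""
    m = np.load(path, mmap_mode="r")       # memory-mapped, not loaded eagerly
    total = np.zeros(m.shape[1])
    for start in range(0, m.shape[0], chunk_rows):
        chunk = np.asarray(m[start:start + chunk_rows])  # one chunk in RAM
        total += chunk.sum(axis=0)
    return total / m.shape[0]

print(column_mean(path))  # each column mean is 1.0
```

The same pattern — decomposing a matrix computation into chunk-sized partial results — is what allows an external-memory framework to keep memory consumption small while the I/O system streams data to the CPU.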
Even though current I/O technologies, such as solid-state drives (SSDs),
are an order of magnitude slower than DRAM, the external-memory execution
of many algorithms in FlashR approaches the performance of their in-memory
execution. As the number of features grows, and as other factors, such as the
number of clusters in clustering algorithms, increase, we expect FlashR on
SSDs to match in-memory performance. We demonstrate that an I/O throughput
of 10 GB/s saturates the CPU for many algorithms, even on a large parallel
NUMA machine.