Spark-for-Big-Data

Udacity Course

This repository demonstrates how to use Spark to work with big data and build machine learning models at scale.

Goals

  • Practice processing and cleaning datasets to get comfortable with Spark’s SQL and DataFrame APIs (Spark SQL, PySpark).
  • Debug jobs and optimize for data skew when running on a cluster.
  • Use Spark’s Machine Learning Library (MLlib) to train machine learning models at scale.
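To make the data-skew goal concrete, here is a minimal, Spark-free sketch of key salting, one common way to spread a hot key over more partitions. It is an illustration only, not code from this repository: the partition count, the salt count, the toy keys, and the helper names (`partition_of`, `partition_sizes`, `salt_keys`) are all assumptions, and a CRC32-based function stands in for a real hash partitioner.

```python
from collections import Counter
from zlib import crc32

NUM_PARTITIONS = 4  # hypothetical number of partitions
NUM_SALTS = 4       # hypothetical number of salt values per key


def partition_of(key: str) -> int:
    """Deterministic stand-in for a hash partitioner (not Spark's own)."""
    return crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_sizes(keys):
    """Count how many records land in each partition."""
    counts = Counter(partition_of(k) for k in keys)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


def salt_keys(keys):
    """Append a round-robin salt so each key is split into NUM_SALTS sub-keys."""
    return [f"{k}#{i % NUM_SALTS}" for i, k in enumerate(keys)]


# Toy skewed dataset: one "hot" key dominates the records.
keys = ["hot"] * 1000 + ["a", "b", "c", "d"] * 25

plain = partition_sizes(keys)              # every "hot" record shares one partition
salted = partition_sizes(salt_keys(keys))  # "hot" records are split across sub-keys

print("records per partition without salting:", plain)
print("records per partition with salting:   ", salted)
```

In a real Spark job the same idea would usually be expressed by adding a salt column to the DataFrame, aggregating on the salted key first, and then aggregating again on the original key to combine the partial results.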