PySpark was set up for this course using one of the following methods:
- Ubuntu + Spark + Python on VirtualBox
- Amazon EC2 with Python and Spark
- Databricks Notebook System
- AWS EMR Notebook (Not Free)
Machine learning techniques implemented using PySpark:
- Linear Regression
- Logistic Regression
- Tree Methods
  - Decision Trees
  - Random Forests
  - Gradient Boosted Trees
- K-means Clustering
- Recommender Systems
- Natural Language Processing
- Spark Streaming via Twitter