This project compares the performance of MapReduce and Spark on a Hadoop cluster, running both on the same sufficiently large dataset: the Poker Hand dataset, available at https://archive.ics.uci.edu/ml/datasets/Poker+Hand

MapReduceCode:

- Contains the MapReduce code written in Java.

SparkAppCode:

- Contains the Spark application, written in Scala, which can be run on a cluster.
  Before running it, add the relevant `hdfs` or `s3` paths for the testing and training data.

- The app writes the predicted classes for the test data to a local `.txt` file on the master node.

AccuracyTest:

- Use `accuracyTest.java` to check the accuracy of the predicted classes.
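
For orientation, an accuracy check of this kind can be sketched as follows. This is not the contents of `accuracyTest.java`, only a minimal illustration under stated assumptions: the predictions file holds one predicted class per line, the labelled test file is comma-separated with the true class in the last field, and both files list rows in the same order. The class name `AccuracySketch` and the command-line arguments are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class AccuracySketch {

    // Returns the fraction of rows whose predicted class equals the true class.
    public static double accuracy(List<String> predicted, List<String> labelledRows) {
        if (predicted.isEmpty() || predicted.size() != labelledRows.size()) {
            throw new IllegalArgumentException("Inputs must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < predicted.size(); i++) {
            String row = labelledRows.get(i);
            // Assumption: the true class is the last comma-separated field of each row.
            String actual = row.substring(row.lastIndexOf(',') + 1).trim();
            // Parse both sides as numbers so that "1.0" and "1" count as the same class.
            double p = Double.parseDouble(predicted.get(i).trim());
            double a = Double.parseDouble(actual);
            if (p == a) {
                correct++;
            }
        }
        return (double) correct / predicted.size();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: predictions file, args[1]: labelled test data (both hypothetical paths).
        List<String> predicted = Files.readAllLines(Paths.get(args[0]));
        List<String> labelled = Files.readAllLines(Paths.get(args[1]));
        System.out.printf("Accuracy: %.4f%n", accuracy(predicted, labelled));
    }
}
```

If the real output format differs (for example, extra columns or a header line in the predictions file), the parsing step would need to be adjusted accordingly.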