XGBoost stands for eXtreme Gradient Boosting. It is a highly optimized, distributed implementation of gradient-boosted decision trees, designed to handle large and complex datasets. In this project, we use XGBoost4J-Spark to build simple classification and regression models.
Requirements:
On Ubuntu, you can use scripts/setup.sh to set up the prerequisites.
sbt clean assembly
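For orientation, here is a minimal sketch of a build definition that would produce a JAR named like the one used in the commands in this README. The project name, version, and Scala series are inferred from the path target/scala-2.12/XGBoost-Spark-assembly-0.1.jar; every dependency and plugin version is an assumption and may differ from what this repository actually uses.

```scala
// build.sbt (sketch only; all version numbers are assumptions)
name := "XGBoost-Spark"      // inferred from the JAR file name
version := "0.1"             // inferred from the JAR file name
scalaVersion := "2.12.17"    // any 2.12.x release matches target/scala-2.12

libraryDependencies ++= Seq(
  // Spark is supplied by spark-submit at run time, so it is kept out of the fat JAR.
  "org.apache.spark" %% "spark-sql"   % "3.3.2" % "provided",
  "org.apache.spark" %% "spark-mllib" % "3.3.2" % "provided",
  // XGBoost4J-Spark is bundled into the assembly.
  "ml.dmlc" %% "xgboost4j-spark" % "1.7.6"
)

// The assembly task comes from the sbt-assembly plugin, which would be
// declared in project/plugins.sbt, for example:
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.1")
```

With sbt-assembly's default naming, this configuration would write the fat JAR to target/scala-2.12/XGBoost-Spark-assembly-0.1.jar. Depending on the dependency set, a merge strategy for duplicate files may also be required.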
- Classification
spark-submit --class edu.missouri.XGBoost.ClassifierPipeline target/scala-2.12/XGBoost-Spark-assembly-0.1.jar
- Regression
spark-submit --class edu.missouri.XGBoost.RegressionPipeline target/scala-2.12/XGBoost-Spark-assembly-0.1.jar
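To give a rough idea of what a classification pipeline built on XGBoost4J-Spark can look like, here is a minimal sketch. It is not the code of edu.missouri.XGBoost.ClassifierPipeline: the object name ClassifierSketch, the toy in-memory dataset, the column names, and all parameter values are assumptions chosen for illustration.

```scala
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of this repository.
object ClassifierSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ClassifierSketch").getOrCreate()
    import spark.implicits._

    // Toy data invented for illustration: two numeric features and a binary label.
    val data = Seq(
      (1.0, 0.5, 0.0),
      (1.2, 0.7, 0.0),
      (3.1, 2.9, 1.0),
      (2.8, 3.3, 1.0)
    ).toDF("f1", "f2", "label")

    // XGBoost4J-Spark expects the features packed into a single vector column.
    val assembled = new VectorAssembler()
      .setInputCols(Array("f1", "f2"))
      .setOutputCol("features")
      .transform(data)

    // Booster and training parameters; the values are placeholders, not tuned settings.
    val params: Map[String, Any] = Map(
      "objective" -> "binary:logistic",
      "eta" -> 0.1,
      "max_depth" -> 3,
      "num_round" -> 10,
      "num_workers" -> 1
    )

    val classifier = new XGBoostClassifier(params)
      .setFeaturesCol("features")
      .setLabelCol("label")

    // Train, then score the same rows to show the output columns.
    val model = classifier.fit(assembled)
    model.transform(assembled).select("label", "prediction").show()

    spark.stop()
  }
}
```

A regression pipeline would follow the same shape, with XGBoostRegressor from the same package and a regression objective such as "reg:squarederror" in place of the classifier and its objective.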