spark-sklearn-airbnb-predict

Code example to predict prices of Airbnb vacation rentals, using scikit-learn on Spark.

The Jupyter notebook in this repo contains examples that run regression estimators on the Inside Airbnb listings dataset for San Francisco. The target variable is the price of the listing. To speed up the hyperparameter search, the notebook also shows how to use the spark-sklearn package to distribute GridSearchCV across the nodes of a Spark cluster. Because the search finishes much sooner, a larger parameter grid can be explored in the same amount of time, which can lead to better results.
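To see why a grid search distributes well, it helps to count the work involved. The sketch below uses only the Python standard library and is not taken from the notebook: the parameter names and values are hypothetical, and the idea that each (parameter combination, fold) pair is fitted as a separate task is an assumption about how the search is split up.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one dict per combination of hyperparameter values."""
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical grid for a tree-ensemble regressor; the values are illustrative only.
param_grid = {
    "n_estimators": [100, 200, 400],
    "max_depth": [4, 8, 16],
    "min_samples_leaf": [1, 5],
}
n_folds = 5  # assumed number of cross-validation folds

candidates = list(expand_grid(param_grid))

# Each (candidate, fold) pair is a fit that does not depend on any other fit,
# so the pairs can be handed to different workers.
tasks = [(params, fold) for params in candidates for fold in range(n_folds)]

print("{} candidates x {} folds = {} independent fits".format(
    len(candidates), n_folds, len(tasks)))
```

Since none of the fits depends on another, adding workers shortens the search until the number of workers reaches the number of fits.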

To run the scikit-learn examples (without Spark), the following packages are required:

Python 2
Pandas
NumPy
scikit-learn (0.17 or later)

These can be installed on the MapR Sandbox.
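A quick way to confirm that the packages listed above are present is to import each one and read its version. This is a minimal sketch rather than part of the repo: the helper names version_tuple and installed_version are made up for illustration, and only scikit-learn is checked against a minimum because it is the only package with a stated version.

```python
import importlib
import re


def version_tuple(version):
    """Turn a version string such as '0.17.1' into a tuple of ints, (0, 17, 1)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


# Import names of the packages listed above, mapped to a minimum version if any.
REQUIRED = {"pandas": None, "numpy": None, "sklearn": (0, 17)}

for name, minimum in sorted(REQUIRED.items()):
    found = installed_version(name)
    if found is None:
        status = "not found"
    elif minimum is not None and version_tuple(found) < minimum:
        status = "too old ({})".format(found)
    else:
        status = "ok ({})".format(found)
    print("{}: {}".format(name, status))
```

When running with Spark, the same check would need to pass on every machine in the cluster, not only on the one that submits the job.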

To run the scikit-learn examples with Spark, the following packages are required on each machine:

All of the above packages
Spark (1.5 or later)
spark-sklearn (follow the installation instructions in that project's README)

You can run this on a MapR cluster by following one of these methods:

Use the MapR Sandbox, which comes with Spark pre-installed. You must install Pandas, NumPy and scikit-learn.
If you have multiple machines available, use the MapR Community Edition and install the mapr-spark package on each machine, following the Spark on YARN documentation.

Run the script with:

MASTER=yarn-client /opt/mapr/spark/spark-1.5.2/bin/spark-submit --num-executors=4 --executor-cores=8 python_scikit_airbnb.py

(setting num-executors and executor-cores to suit your environment)
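As a rough guide for choosing these two values, the arithmetic below estimates how many rounds of model fits a search would need. It is only a sketch: the search size (18 parameter combinations, 5 folds) is hypothetical, and it assumes each executor core runs one fit at a time, which is Spark's behavior when spark.task.cpus is left at its default of 1.

```python
# Values from the spark-submit command above.
num_executors = 4
executor_cores = 8

# Hypothetical search size: 18 parameter combinations, 5-fold cross-validation.
n_candidates = 18
n_folds = 5

slots = num_executors * executor_cores   # fits that can run at the same time
fits = n_candidates * n_folds            # total fits the search needs
waves = -(-fits // slots)                # ceiling division, integer-only

print("{} fits on {} slots: about {} waves".format(fits, slots, waves))
```

Under these assumptions, raising either flag beyond the point where slots equals the number of fits would not shorten the search further.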

and of course... have fun!
