California Housing Prediction

Tools used:

  • Pandas
  • NumPy
  • Seaborn
  • Matplotlib
  • Scikit-learn

This classic project walks through an end-to-end machine learning workflow, following the Hands-On Machine Learning book by Aurélien Géron. We perform data analysis and feature engineering, build data pipelines, and fit several models to select the most promising ones. From there, we use ensemble methods and evaluate the models to improve accuracy.

The dataset contains housing and demographic information about California districts. The features are a mixture of numeric values and one categorical column, ocean_proximity.

The columns are:

  • longitude
  • latitude
  • housing_median_age
  • total_rooms
  • total_bedrooms
  • population
  • households
  • median_income
  • median_house_value
  • ocean_proximity
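The mix of numeric and categorical columns above is typically handled with a scikit-learn preprocessing pipeline. Below is a minimal sketch, assuming a pandas DataFrame with exactly these column names. The ratio features (rooms_per_household, bedrooms_per_room, population_per_household), the median imputation, and the `make_fake_districts` helper that generates synthetic rows are illustrative assumptions, not necessarily what this project does; with the real data, the DataFrame would come from the dataset instead.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

NUM_COLS = ["longitude", "latitude", "housing_median_age", "total_rooms",
            "total_bedrooms", "population", "households", "median_income"]
RATIO_COLS = ["rooms_per_household", "bedrooms_per_room", "population_per_household"]
CAT_COLS = ["ocean_proximity"]


def add_ratio_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical feature-engineering step: append per-household/per-room ratios."""
    df = df.copy()
    df["rooms_per_household"] = df["total_rooms"] / df["households"]
    # NaN wherever total_bedrooms is missing; the imputer below fills it.
    df["bedrooms_per_room"] = df["total_bedrooms"] / df["total_rooms"]
    df["population_per_household"] = df["population"] / df["households"]
    return df


# Numeric columns: fill missing values with the column median, then standardize.
num_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

preprocess = Pipeline([
    ("ratios", FunctionTransformer(add_ratio_features)),
    ("columns", ColumnTransformer(
        [
            ("num", num_pipeline, NUM_COLS + RATIO_COLS),
            # One-hot encode the categorical column; ignore categories unseen in fit.
            ("cat", OneHotEncoder(handle_unknown="ignore"), CAT_COLS),
        ],
        sparse_threshold=0,  # always return a dense NumPy array
    )),
])


def make_fake_districts(n: int = 20, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: synthetic rows with the same columns as the real dataset."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "longitude": rng.uniform(-124.3, -114.3, n),
        "latitude": rng.uniform(32.5, 42.0, n),
        "housing_median_age": rng.integers(1, 52, n).astype(float),
        "total_rooms": rng.integers(500, 5000, n).astype(float),
        "total_bedrooms": rng.integers(100, 1000, n).astype(float),
        "population": rng.integers(300, 3000, n).astype(float),
        "households": rng.integers(100, 1000, n).astype(float),
        "median_income": rng.uniform(0.5, 15.0, n),
        "ocean_proximity": rng.choice(["<1H OCEAN", "INLAND", "NEAR OCEAN", "NEAR BAY"], n),
    })
    df.loc[::5, "total_bedrooms"] = np.nan  # simulate missing values
    return df


districts = make_fake_districts()
X = preprocess.fit_transform(districts)
print(X.shape)  # 11 scaled numeric columns plus one column per category seen
```

Keeping the feature engineering inside the pipeline means the same transformations are applied identically to the training set, the test set, and any new districts.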

We experiment with Linear Regression, Random Forest, Support Vector Regression (kernel='linear'), and a linear model trained with Stochastic Gradient Descent (SGD). We fine-tune the models using grid search and randomized search to find the best combination of hyperparameters, and we reach 95% accuracy using ensemble methods.
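The model comparison and tuning steps can be sketched with scikit-learn's `cross_val_score`, `GridSearchCV`, and `RandomizedSearchCV`. This is a minimal sketch, not the project's actual code: it assumes synthetic data from `make_regression` in place of the preprocessed housing features, scores models by cross-validated RMSE, and uses small, arbitrary hyperparameter grids chosen only for illustration.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the preprocessed feature matrix and target (assumption).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)

# The four model families named above; SVR and SGD are sensitive to feature scale.
candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "svr_linear": make_pipeline(StandardScaler(), SVR(kernel="linear")),
    "sgd": make_pipeline(StandardScaler(), SGDRegressor(random_state=42)),
}


def cv_rmse(model, X, y, cv=5):
    """Mean RMSE over k folds (scikit-learn returns negated errors, hence the minus)."""
    scores = cross_val_score(model, X, y, scoring="neg_root_mean_squared_error", cv=cv)
    return -scores.mean()


results = {name: cv_rmse(model, X, y) for name, model in candidates.items()}
for name, rmse in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:18s} RMSE = {rmse:.2f}")

# Grid search: tries every combination in a small, hypothetical grid.
grid = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid={"n_estimators": [30, 60], "max_features": [2, 4, 8]},
    scoring="neg_root_mean_squared_error",
    cv=3,
)
grid.fit(X, y)
print("grid best:", grid.best_params_)

# Randomized search: samples a fixed number of settings from distributions.
rand = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions={"n_estimators": randint(20, 80), "max_features": randint(1, 9)},
    n_iter=5,
    scoring="neg_root_mean_squared_error",
    cv=3,
    random_state=42,
)
rand.fit(X, y)
print("random best:", rand.best_params_)
```

A random forest is itself an ensemble of decision trees. Grid search is exhaustive and suits small grids, while randomized search caps the number of fits when the search space is large. With the real data, X and y would be the preprocessed features and, presumably, median_house_value as the target.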