- Pandas
- NumPy
- Seaborn
- Matplotlib
- scikit-learn
This classic project walks through an end-to-end machine learning workflow, following the book Hands-On Machine Learning by Aurélien Géron. We perform data analysis and feature engineering, build data pipelines, and fit several models to select the best ones. From there, we use ensemble methods and evaluate the results to improve accuracy.
The dataset contains housing and demographic information on California districts. The features are a mixture of categorical and numeric values: ocean_proximity is categorical and the remaining columns are numeric.
The columns are:
- longitude
- latitude
- housing_median_age
- total_rooms
- total_bedrooms
- population
- households
- median_income
- median_house_value
- ocean_proximity
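Because the columns mix numeric and categorical values, a preprocessing pipeline usually has to treat them differently. The following is a minimal sketch of one way to do that with scikit-learn, not necessarily the exact pipeline used in this project. The tiny DataFrame is synthetic stand-in data with made-up values, and the choice of median imputation, standard scaling, and one-hot encoding is an assumption for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the real dataset (values are made up for illustration).
rng = np.random.default_rng(42)
n = 20
categories = np.array(["INLAND", "NEAR BAY", "NEAR OCEAN"])
housing = pd.DataFrame({
    "longitude": rng.uniform(-124, -114, n),
    "latitude": rng.uniform(32, 42, n),
    "housing_median_age": rng.uniform(1, 52, n),
    "total_rooms": rng.uniform(100, 5000, n),
    "total_bedrooms": rng.uniform(20, 1000, n),
    "population": rng.uniform(50, 3000, n),
    "households": rng.uniform(10, 900, n),
    "median_income": rng.uniform(0.5, 15, n),
    "median_house_value": rng.uniform(50_000, 500_000, n),
    "ocean_proximity": categories[np.arange(n) % 3],  # cycle through the categories
})
housing.loc[0, "total_bedrooms"] = np.nan  # simulate a missing value

# Separate the target from the features.
X = housing.drop(columns="median_house_value")
y = housing["median_house_value"]

num_cols = X.select_dtypes(include="number").columns.tolist()
cat_cols = ["ocean_proximity"]

# Numeric columns: fill missing values with the median, then standardize.
num_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical column: one-hot encode; sparse_threshold=0 forces a dense array.
preprocess = ColumnTransformer(
    [
        ("num", num_pipeline, num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ],
    sparse_threshold=0,
)

X_prepared = preprocess.fit_transform(X)
print(X_prepared.shape)
```

The resulting array has one column per numeric feature plus one column per ocean_proximity category, and can be passed directly to any scikit-learn estimator.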
We experiment with Linear Regression, Random Forest, Support Vector Regression (kernel='linear'), and linear Stochastic Gradient Descent. We fine-tune the models using grid search and randomized search to find the best combination of features, and achieve 95% accuracy using ensemble methods.
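As a rough illustration of the model comparison and tuning step, the sketch below cross-validates the four model families and then runs a small grid search on the random forest. It uses synthetic data from make_regression rather than the housing data, and the scoring metric (RMSE), the parameter grid, and the fold count are assumptions chosen to keep the example fast; they are not the project's actual settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data standing in for the prepared housing features.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=42)

models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=30, random_state=42),
    "svr_linear": make_pipeline(StandardScaler(), SVR(kernel="linear")),
    "sgd_linear": make_pipeline(StandardScaler(), SGDRegressor(random_state=42)),
}

# Compare models by cross-validated RMSE (lower is better).
rmse = {}
for name, model in models.items():
    scores = cross_val_score(
        model, X, y, scoring="neg_root_mean_squared_error", cv=3
    )
    rmse[name] = -scores.mean()  # scikit-learn returns negated RMSE
    print(f"{name}: RMSE = {rmse[name]:.2f}")

# Tune the random forest over a small, illustrative parameter grid.
param_grid = {"n_estimators": [10, 30], "max_features": [2, 4]}
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```

RandomizedSearchCV follows the same fit and best_params_ pattern but samples a fixed number of combinations from parameter distributions, which tends to scale better when the search space is large.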