Merge pull request #717 from tanuj437/main
Used Car Price Prediction
Showing 22 changed files with 54,627 additions and 0 deletions.
@@ -0,0 +1,57 @@
# Car Price Prediction Dataset

## 📝 Description
Explore the intricate details of car prices with our comprehensive dataset. This dataset captures various attributes of cars, each labeled with its respective price. By analyzing this dataset, you will gain valuable insights into the factors that influence car prices, aiding in accurate and efficient price prediction.

## Key Features
- **Diverse Car Attributes:** Understand the impact of various attributes on car prices, including brand, model, model year, mileage, fuel type, engine type, transmission type, exterior color, interior color, accident history, and title status.
- **High-Quality Data:** Each car is represented with detailed attributes, ensuring that all relevant factors are captured for thorough analysis.
- **Balanced Representation:** While some attributes may have more variations than others, the dataset provides a balanced overview of different car features, facilitating effective training and testing of regression models.

## Data Collection
The data has been meticulously collected and labeled based on actual car listings. This structured approach ensures that each car is accurately described, providing a reliable dataset for training machine learning models.

## Data Attributes
The dataset contains the following attributes for each car:

- **id:** Unique identifier for each car
- **brand:** The brand of the car
- **model:** The specific model of the car
- **model_year:** The year the car model was manufactured
- **milage:** The total mileage of the car in kilometers
- **fuel_type:** The type of fuel used by the car (e.g., Petrol, Diesel, Electric)
- **engine:** The engine type of the car (e.g., V6, V8, Electric)
- **transmission:** The type of transmission in the car (e.g., Automatic, Manual)
- **ext_col:** The exterior color of the car
- **int_col:** The interior color of the car
- **accident:** Indicates whether the car has been in an accident (Yes/No)
- **clean_title:** Indicates whether the car has a clean title (Yes/No)
- **price:** The price of the car (target variable)

## Sample Data
Here are a few sample entries from the dataset:

| id | brand  | model   | model_year | milage | fuel_type | engine   | transmission | ext_col | int_col | accident | clean_title | price |
|----|--------|---------|------------|--------|-----------|----------|--------------|---------|---------|----------|-------------|-------|
| 1  | Toyota | Camry   | 2015       | 60000  | Petrol    | V6       | Automatic    | Black   | Grey    | No       | Yes         | 15000 |
| 2  | Ford   | F-150   | 2018       | 40000  | Diesel    | V8       | Manual       | White   | Black   | Yes      | No          | 22000 |
| 3  | Tesla  | Model S | 2020       | 20000  | Electric  | Electric | Automatic    | Red     | White   | No       | Yes         | 75000 |

## How to Use the Dataset
To use this dataset for training machine learning models, follow these steps:

1. **Download the Dataset:**
   The dataset is available in the `Dataset` directory of this project and on [Kaggle](https://www.kaggle.com/datasets/zeeshanlatif/used-car-price-prediction-dataset/data?select=train.csv).

2. **Data Preprocessing:**
   - Convert categorical columns to category dtype.
   - Apply standard scaling for numerical features.
   - Apply one-hot encoding for categorical features.
   - Split the data into training and test sets for model evaluation.

3. **Model Training:**
   Train various machine learning models on the dataset to predict car prices based on the provided attributes.

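The preprocessing steps listed above can be sketched with pandas and scikit-learn roughly as follows. This is a minimal illustration, not the notebook's actual code: it uses only the three sample rows shown earlier and a subset of the columns, and the settings `handle_unknown="ignore"`, `random_state=42`, and a single held-out row are assumptions made for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Stand-in for train.csv: the three sample rows, with a subset of columns.
df = pd.DataFrame({
    "brand": ["Toyota", "Ford", "Tesla"],
    "fuel_type": ["Petrol", "Diesel", "Electric"],
    "transmission": ["Automatic", "Manual", "Automatic"],
    "model_year": [2015, 2018, 2020],
    "milage": [60000, 40000, 20000],
    "price": [15000, 22000, 75000],
})

categorical_cols = ["brand", "fuel_type", "transmission"]
numerical_cols = ["model_year", "milage"]

# Convert categorical columns to the pandas category dtype.
df = df.astype({col: "category" for col in categorical_cols})

X = df[categorical_cols + numerical_cols]
y = df["price"]

# Split first, so the scaler and encoder are fitted on training rows only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1, random_state=42  # hold out one row (illustrative)
)

# Standard scaling for numerical features, one-hot encoding for categorical ones.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), numerical_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
X_train_t = preprocessor.fit_transform(X_train)
X_test_t = preprocessor.transform(X_test)

print(X_train_t.shape, X_test_t.shape)
```

With `handle_unknown="ignore"`, a category that appears only in the test row is encoded as all zeros instead of raising an error, which matters on a dataset with many distinct brands and models.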
## 📢 Conclusion
The car price prediction dataset provides a comprehensive and well-structured collection of car attributes and prices, facilitating the development of accurate and robust prediction models. By leveraging this dataset, you can gain valuable insights into the factors that influence car prices and improve your predictive modeling capabilities.

@@ -0,0 +1,55 @@
# Car Price Prediction - Model

## 📝 Description
This folder contains the pre-trained machine learning models and scripts used for predicting car prices based on various attributes. The aim is to accurately estimate the price of a car given its features.

## 📂 Contents
- **used-car-price-prediction.ipynb:** Jupyter Notebook containing the complete process of data preprocessing, model training, evaluation, and visualization.
- **README.md:** This document.
- **ridgemodel.pkl:** Pre-trained Ridge Regression model used for car price prediction.
- **preprocessor.pkl:** Pre-trained data preprocessor.
- **unique_values.pkl:** Precomputed unique values for categorical columns.

## 🎯 Goal
The goal of this car price prediction project is to accurately predict car prices using various machine learning models based on attributes such as brand, model, model year, mileage, fuel type, engine type, transmission type, exterior color, interior color, accident history, and title status.

## 🧮 What I Did
In this car price prediction project, several models were evaluated to find the most effective one for predicting car prices. The models evaluated are listed below.

## Models Used
- **Linear Regression:** A basic linear approach to modeling the relationship between the dependent variable and one or more independent variables.
- **Ridge Regression:** A linear regression model with L2 regularization to prevent overfitting.
- **Lasso Regression:** A linear regression model with L1 regularization to perform feature selection.
- **Decision Tree:** A model that splits the data into subsets based on feature values, creating a tree-like structure for regression.
- **Gradient Boosting:** An ensemble learning method that builds models sequentially to correct errors of previous models.
- **K-Nearest Neighbors Regressor (KNN):** An instance-based learning algorithm that predicts a sample's value based on the average value of its k-nearest neighbors.
- **XGBoost Regressor:** An optimized gradient boosting library designed for speed and performance.

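To make the difference between the L2 and L1 penalties in the list above concrete, the sketch below fits Ridge and Lasso on synthetic data in which only the first of five features influences the target. The data, the random seed, and the `alpha` values are assumptions chosen for illustration and are unrelated to the car dataset.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only feature 0 influences the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 4.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients
lasso = Lasso(alpha=0.5).fit(X, y)   # L1 penalty: can set them exactly to zero

print("ridge:", np.round(ridge.coef_, 3))
print("lasso:", np.round(lasso.coef_, 3))
```

Ridge keeps small non-zero weights on the irrelevant features, while Lasso drops them entirely, which is why Lasso is described as performing feature selection.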
## Data Preprocessing and Augmentation
- **Image Resizing and Normalization:** Not applicable for this dataset.
- **Feature Engineering:** Applied standard scaling for numerical features and one-hot encoding for categorical features.
- **Data Splitting:** Divided data into training and test sets for robust model evaluation.

## 🚀 Models Implemented
- **Linear Regression:** Basic linear approach.
- **Ridge Regression:** Regularized linear regression.
- **Lasso Regression:** Regularized linear regression with feature selection.
- **Decision Tree:** Non-linear tree structure.
- **Gradient Boosting:** Sequential ensemble learning.
- **K-Nearest Neighbors Regressor (KNN):** Instance-based learning.
- **XGBoost Regressor:** Optimized gradient boosting.

## 📈 Performance of the Models
The models were evaluated using mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and R-squared (R2) score. Detailed performance metrics for each model are included in the Jupyter Notebook.

<img width="146" alt="RMSE_cmp" src="https://github.com/user-attachments/assets/b80bd458-93d5-4108-a447-c9dfa7ef75ee">
<img width="150" alt="R2_cmp" src="https://github.com/user-attachments/assets/c9d3824f-6c49-437d-9a42-333206f30800">
<img width="139" alt="MSE_cmp" src="https://github.com/user-attachments/assets/e2dc0348-2baa-490b-be7a-09be09f15ffc">
<img width="154" alt="MAE_cmp" src="https://github.com/user-attachments/assets/aba3bab7-6000-4ec0-943f-7788d03d1efd">

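The four metrics named in the performance section can be written out in a few lines of plain Python, which makes their definitions explicit. This is an illustrative sketch: the helper name `regression_metrics` is hypothetical, and the predictions are made-up numbers, not results from the notebook. In practice, `sklearn.metrics` provides `mean_absolute_error`, `mean_squared_error`, and `r2_score` for the same purpose.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n           # mean absolute error
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    mse = ss_res / n                                # mean squared error
    rmse = math.sqrt(mse)                           # same units as the target
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot                      # share of variance explained
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Prices from the dataset's sample rows; the predictions are invented.
actual = [15000, 22000, 75000]
predicted = [16000, 20000, 74000]
metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Lower MAE, MSE, and RMSE indicate a better fit, while R2 is better the closer it is to 1.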
## 📢 Conclusion
The car price prediction project demonstrates that various machine learning models can accurately estimate car prices based on their features. Ridge Regression was chosen as the final model for deployment in the web app.

## ✒️ Connect with Me
Tanuj Saxena [LinkedIn](https://www.linkedin.com/in/tanuj-saxena-970271252/)
Binary file not shown.
Binary file not shown.
Binary file not shown.
1 change: 1 addition & 0 deletions
Used Car Price Prediction/Model/used-car-price-prediction.ipynb
@@ -0,0 +1,98 @@
# Used Car Price Prediction
Explore the world of used car price prediction using various machine learning models to accurately estimate car prices based on their attributes. This project focuses on predicting car prices by analyzing attributes such as brand, model, mileage, fuel type, and more.

<img width="920" alt="webapp1" src="https://github.com/user-attachments/assets/37d15871-c6fe-4042-a360-d10b6b816e15">
<img width="922" alt="webapp2" src="https://github.com/user-attachments/assets/93e44e0a-3a8d-4041-82c2-7bbf2ee1f85e">

## 📝 Abstract
The Used Car Price Prediction project aims to estimate the price of used cars based on multiple features. By applying various machine learning models, including regression techniques and ensemble methods, the project seeks to build a robust model for predicting car prices and provide insights into the factors influencing car values.

## 🔍 Methodology
1. **Importing Libraries**

   Essential libraries such as NumPy, Pandas, Scikit-Learn, and XGBoost are imported for data manipulation, preprocessing, model training, and evaluation.

2. **Loading the Dataset**

   The dataset contains information about used cars, including features such as brand, model, mileage, fuel type, engine type, transmission type, and more. This dataset is used to train and evaluate the prediction models.

3. **Data Preprocessing**

   The preprocessing steps include handling missing values, encoding categorical variables, scaling numerical features, and splitting the dataset into training and testing sets.

4. **Training the Models**

   Multiple models are implemented, including Linear Regression, Ridge Regression, Lasso Regression, Decision Tree Regressor, Random Forest Regressor, Gradient Boosting Regressor, Support Vector Regressor, Extra Trees Regressor, K-Nearest Neighbors Regressor, and XGBoost Regressor. Each model is trained and evaluated to identify the most effective approach for predicting car prices.

5. **Model Evaluation**

   The performance of each model is evaluated using metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R2). Visualization of results helps in comparing the effectiveness of different models.

6. **Web Application**

   A Streamlit web application is developed to allow users to input car attributes and get predictions on car prices in real-time. This application utilizes the pre-trained model to provide instant price estimates.

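The train-and-compare loop described in steps 4 and 5 might look roughly like the sketch below. It is an illustration under stated assumptions: the data is synthetic rather than the car dataset, only four of the listed models are included, and all hyperparameters and seeds are placeholder values, not those used in the notebook.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem standing in for the preprocessed car features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# A few of the candidate models; hyperparameters are illustrative only.
models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(alpha=1.0),
    "Lasso Regression": Lasso(alpha=0.01),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
}

# Fit each model and record its RMSE on the held-out test set.
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

best = min(rmse, key=rmse.get)
for name, value in sorted(rmse.items(), key=lambda item: item[1]):
    print(f"{name}: RMSE = {value:.3f}")
print("Lowest RMSE:", best)
```

Keeping the models in a dictionary makes it easy to add further regressors (for example Random Forest or XGBoost) without changing the evaluation loop.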
### 📂 Project Directory Structure

```bash
Used Car Price Prediction
|- Dataset
    |- train.csv
    |- README.md

|- Model
    |- used-car-price-prediction.ipynb
    |- README.md
    |- ridgemodel.pkl
    |- preprocessor.pkl
    |- unique_values.pkl

|- Web App
    |- app.py
    |- README.md

|- Images
    |- MAE_cmp.png
    |- MSE_cmp.png
    |- RMSE_cmp.png
    |- R2_cmp.png
    |- car_price_distribution.png
    |- correlation_matrix.png
    |- mileage_vs_price.png
    |- README.md

|- requirements.txt
|- README.md
```
## How to Use
1. **Install Requirements**

   Ensure you have the necessary libraries and dependencies installed. You can find the list of required packages in the `requirements.txt` file.
   ```bash
   pip install -r requirements.txt
   ```
2. **Download Data**

   Ensure you have the `train.csv` dataset in the Dataset folder. It is also available on [Kaggle](https://www.kaggle.com/datasets/zeeshanlatif/used-car-price-prediction-dataset/data?select=train.csv).

3. **Run the Jupyter Notebook**

   Open the provided Jupyter Notebook file (`used-car-price-prediction.ipynb`) and run each cell sequentially. Update any file paths or configurations as needed for your environment.

4. **Training and Evaluation**

   Train the models and evaluate their performance using the provided data. Analyze the results to determine the best-performing model.

5. **Run the Web Application**

   Navigate to the Web App directory and run the Streamlit application to start predicting car prices using the pre-trained model.
   ```bash
   streamlit run app.py
   ```
6. **Interpret Results**

   Use the provided visualizations and metrics to interpret the model’s performance and insights from the data.

Feel free to reach out if you encounter any issues or need further assistance with running the notebook or web application.

## Connect with Me
Tanuj Saxena [LinkedIn](https://www.linkedin.com/in/tanuj-saxena-970271252/)
@@ -0,0 +1,40 @@
# Car Price Prediction Web App

## Goal 🎯
This project focuses on predicting car prices based on various attributes such as brand, model, model year, mileage, fuel type, engine type, transmission type, exterior color, interior color, accident history, and title status. The goal is to provide an estimate of a car's price using machine learning models.

## Model(s) Used for the Web App 🧮
The web app uses a pre-trained Ridge Regression model, trained for car price prediction.

## Video Demonstration

https://github.com/user-attachments/assets/b672ae69-ee43-43c8-be83-5b6c45558a43

## How to Run the Web App

### Requirements
Ensure you have the necessary libraries and dependencies installed. You can find the list of required packages in the `requirements.txt` file.

### Installation
1. **Clone the repository:**
   ```bash
   gh repo clone tanuj437/Car-Price-Prediction
   cd Car-Price-Prediction
   ```
2. **Install the Dependencies**
   ```bash
   pip install -r requirements.txt
   ```
3. **Run the Streamlit app**
   ```bash
   streamlit run app.py
   ```
### Signature ✒️
Tanuj Saxena

[![LinkedIn](https://img.shields.io/badge/LinkedIn-%230077B5.svg?logo=linkedin&logoColor=white)](https://www.linkedin.com/in/tanuj-saxena-970271252/)
@@ -0,0 +1,41 @@
import streamlit as st
import pandas as pd
import joblib

# Load the preprocessor and Ridge model.
# These paths are relative to the working directory, so launch the app from
# the folder that contains Model/ or adjust the paths for your layout.
preprocessor = joblib.load('Model/preprocessor.pkl')
ridge_model = joblib.load('Model/ridgemodel.pkl')

# Load unique values for categorical columns (used to fill the select boxes)
unique_values = joblib.load('Model/unique_values.pkl')

# Define input features
categorical_cols = ['brand', 'model', 'fuel_type', 'engine', 'transmission', 'ext_col', 'int_col', 'accident', 'clean_title']
numerical_cols = ['model_year', 'milage']

# Define the web app
st.title('Car Price Prediction App')

st.write("""
## Predict the price of a car based on its attributes
""")

# Input fields: numeric inputs first, then one select box per categorical column
inputs = {}
for col in numerical_cols:
    inputs[col] = st.number_input(f'Enter {col}', min_value=0)
for col in categorical_cols:
    options = unique_values[col]
    inputs[col] = st.selectbox(f'Select {col}', options=options)

# When the user clicks the Predict button
if st.button('Predict'):
    # Build a single-row DataFrame whose column names match the training data
    input_df = pd.DataFrame([inputs])

    # Apply the transformations to the input data
    input_transformed = preprocessor.transform(input_df)

    # Make a prediction
    prediction = ridge_model.predict(input_transformed)

    st.write(f'The predicted price of the car is: ${prediction[0]:,.2f}')
@@ -0,0 +1,53 @@
# Images for Car Price Prediction Project

This folder contains visualizations and plots that illustrate different aspects of the car price prediction project. These images provide insights into model performance, data distribution, and feature relationships.

## Contents

### 1. MAE (Mean Absolute Error)
<img width="154" alt="MAE_cmp" src="https://github.com/user-attachments/assets/b8249ac7-2888-42f5-b310-e5698fd06c8b">

This plot shows the Mean Absolute Error for the models used in the car price prediction. Lower MAE values indicate better model performance.

### 2. MSE (Mean Squared Error)
<img width="139" alt="MSE_cmp" src="https://github.com/user-attachments/assets/0cc4f9fa-c0d0-48df-a1d3-db11089525af">

This plot shows the Mean Squared Error for the models, that is, the average squared difference between predicted and actual values. Because errors are squared, large mistakes are penalized more heavily than small ones.

### 3. RMSE (Root Mean Squared Error)
<img width="146" alt="RMSE_cmp" src="https://github.com/user-attachments/assets/4a5ee698-a234-48a8-b140-ee276c8b509f">

The Root Mean Squared Error plot shows the square root of the Mean Squared Error. RMSE is useful for understanding the typical magnitude of prediction errors, expressed in the same units as the price.

### 4. R2 Score
<img width="150" alt="R2_cmp" src="https://github.com/user-attachments/assets/777e3e75-246c-403f-8004-387918b294bd">

This plot displays the R2 Score for the models. The R2 Score indicates the proportion of variance in the dependent variable that is predictable from the independent variables; values closer to 1 indicate a better fit.

### 5. Car Price Distribution
<img width="590" alt="car_price_distribution" src="https://github.com/user-attachments/assets/d546a94e-f310-407a-b9e7-8c30c7920b39">

This visualization shows the distribution of car prices in the dataset, providing an understanding of the spread and central tendency of car prices.

### 6. Correlation Matrix
<img width="545" alt="correlation_matrix" src="https://github.com/user-attachments/assets/173bd9b7-de3f-4098-99d1-8fc3b8ba164f">

The Correlation Matrix plot highlights the correlation coefficients between different features in the dataset, showing how features are related to each other.

### 7. Mileage vs. Price
<img width="578" alt="mileage_vs_price" src="https://github.com/user-attachments/assets/5b338d24-0e2e-4c4a-96b9-81327e5097df">

This scatter plot visualizes the relationship between mileage and car price, helping to understand how mileage affects the price of cars.

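As a small worked example of what the correlation matrix and the mileage scatter plot summarize, the sketch below computes the Pearson correlation between `milage` and `price`. It uses only the three sample rows from the dataset README, so the resulting number is purely illustrative and says nothing about the full dataset; the helper name `pearson` is hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# milage and price from the three sample rows in the dataset README
milage = [60000, 40000, 20000]
price = [15000, 22000, 75000]

r = pearson(milage, price)
print(f"correlation(milage, price) = {r:.2f}")
```

A value near -1 means price tends to fall as mileage rises, a value near +1 means the two rise together, and a value near 0 means no linear relationship.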
## Usage

These images can be used for presentations, reports, or further analysis to better understand the car price prediction model's performance and the dataset's characteristics.

## Contributing

If you have suggestions for additional visualizations or improvements, please feel free to contribute by submitting a pull request.

## Contact

For any questions or further information, please reach out to [Tanuj Saxena](https://www.linkedin.com/in/tanuj-saxena-970271252/).

Binary file not shown.
@@ -0,0 +1,8 @@
pandas==1.5.3
numpy==1.24.3
scikit-learn==1.2.2
matplotlib==3.7.2
seaborn==0.12.2
xgboost==1.7.6
streamlit==1.24.1
joblib==1.3.2