Finally finished, yayayayayayyayaya :)
This is my first holistic workflow for training a linear regression model, covering everything from data exploration, data cleaning and feature engineering to the final training and performance evaluation. Still far from efficient. It's meant as an exercise, but it's definitely a good enough template for other trainings.
The training checklist:
1. Data Exploration:
• Inspecting the dataset.
• Understanding the features and target variable.
• Identifying missing values and outliers.
2. Data Cleaning:
• Handling missing values.
• Removing or imputing outliers.
• Correcting data types.
3. Feature Engineering:
• Creating new features.
• Transforming existing features (consider using make_pipeline() to chain transformers and simplify the code).
• Encoding categorical variables.
4. Data Preprocessing:
• Scaling numerical features.
• One-hot encoding categorical features.
• Custom transformations (e.g., calculating property age).
5. Train/Test Split:
• Splitting the data into training and test sets so the model’s ability to generalize can be checked on unseen data.
6. Model Training:
• Training a linear regression model.
• Evaluating the model using metrics like RMSE and R² score.
7. Model Evaluation:
• Comparing actual vs. predicted values.
• Analyzing model performance.
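The checklist maps fairly directly onto scikit-learn. Here is a minimal sketch of the whole flow, assuming scikit-learn, pandas and NumPy are installed. Everything dataset-specific is hypothetical: the synthetic housing columns (`sqft`, `year_built`, `neighborhood`, `price`), the IQR outlier rule, and the reference year 2024 used to compute property age are stand-ins, not a real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

REFERENCE_YEAR = 2024  # hypothetical "current" year for the property-age feature

# Hypothetical synthetic housing data standing in for a real dataset.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "sqft": rng.normal(1500, 400, n),
    "year_built": rng.integers(1950, 2020, n),
    "neighborhood": rng.choice(["north", "south", "east"], n),
})
df["price"] = (
    150 * df["sqft"]
    - 800 * (REFERENCE_YEAR - df["year_built"])
    + df["neighborhood"].map({"north": 30000, "south": 0, "east": 15000})
    + rng.normal(0, 20000, n)
)
df.loc[rng.choice(n, 25, replace=False), "sqft"] = np.nan  # inject missing values

# 1. Data exploration: first rows, dtypes, missing values, summary stats.
print(df.head())
print(df.dtypes)
print(df.isna().sum())
print(df.describe())

# 2. Data cleaning: drop sqft outliers with a 1.5 * IQR rule (hypothetical choice,
# computed on the full data for simplicity); keep NaNs for the imputer below.
q1, q3 = df["sqft"].quantile([0.25, 0.75])
iqr = q3 - q1
keep = df["sqft"].isna() | df["sqft"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[keep]

# 3./4. Feature engineering + preprocessing, one pipeline per column group.
def to_property_age(year_built):
    # Custom transformation: year built -> property age in years.
    return REFERENCE_YEAR - year_built

preprocess = ColumnTransformer([
    ("sqft", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), ["sqft"]),
    ("age", make_pipeline(FunctionTransformer(to_property_age), StandardScaler()), ["year_built"]),
    ("hood", OneHotEncoder(handle_unknown="ignore"), ["neighborhood"]),
])
model = make_pipeline(preprocess, LinearRegression())

# 5. Train/test split.
X = df.drop(columns="price")
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 6. Training and metrics (RMSE via sqrt of MSE, plus R²).
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"RMSE: {rmse:,.0f}  R²: {r2:.3f}")

# 7. Evaluation: actual vs. predicted side by side.
comparison = pd.DataFrame({"actual": y_test.to_numpy(), "predicted": y_pred})
print(comparison.head())
```

Wrapping the preprocessing and the regressor in a single make_pipeline() means the imputer, scalers and encoder are fit only on the training split, so the test set stays unseen until evaluation.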
Will keep improving the template so it fits other kinds of model training and includes more evaluation methods!
I can do it!!! :)