The goal of this project is to develop a predictive model that estimates the probability that a customer makes their first purchase in the Beauty vertical. The model is built with machine learning and data processing techniques, and is intended to help identify conversion opportunities and improve marketing and sales strategies.
In the development of this project, the following methods and techniques were employed:
- Feature Selection with XGBoost: The XGBoost algorithm was applied to select the most relevant features that influence the probability of purchase in the Beauty vertical. This helped to reduce the dimensionality of the data and improve model efficiency.
- Encoding: Categorical variables were converted into numerical representations that the machine learning models can consume, using One-Hot Encoding and Frequency Encoding.
- Imputing: Missing values were filled using mean imputation for numerical features and a separate imputation for categorical features, so that the models could be trained on complete data.
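
As a rough illustration of the imputation and encoding steps listed above, here is a minimal, dependency-free sketch. It is not the project's implementation: the function names, the toy columns `age` and `device`, and the choice of most-frequent-value imputation for categorical data are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_mode(values):
    """Replace None entries with the most frequent observed category.

    Assumption: the README does not say how categorical values are
    imputed; most-frequent is used here only as an example.
    """
    observed = [v for v in values if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


def frequency_encode(values):
    """Map each category to its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


def one_hot_encode(values):
    """Return (categories, rows) with one 0/1 indicator per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Toy data with hypothetical column names (not from the project dataset).
age = [30, None, 50, 40]
device = ["mobile", "desktop", None, "mobile"]

age_filled = impute_mean(age)
device_filled = impute_mode(device)
device_freq = frequency_encode(device_filled)
categories, device_onehot = one_hot_encode(device_filled)
```

Frequency encoding keeps a single numeric column per feature, while one-hot encoding adds one column per category, which is one reason to try both when a feature has many distinct values.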
Several machine learning models were evaluated and compared to predict the probability of purchase in the Beauty vertical. The models used include:
- LightGBM
- Logistic Regression
- Random Forest
- XGBoost
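
Since each model outputs a purchase probability, one common way to compare them is a ranking metric such as ROC AUC. The README does not state which metric was used, so the sketch below is only an assumption: the labels, the predicted probabilities, and the names `model_a` and `model_b` are made up for illustration.

```python
def roc_auc(y_true, y_score):
    """ROC AUC computed as the probability that a randomly chosen
    positive example is scored above a randomly chosen negative one
    (ties count as half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical hold-out labels and predicted probabilities.
y_true = [0, 0, 1, 1, 0, 1]
predictions = {
    "model_a": [0.1, 0.4, 0.35, 0.8, 0.2, 0.9],
    "model_b": [0.3, 0.6, 0.5, 0.55, 0.1, 0.7],
}

# Score every candidate and keep the one with the highest AUC.
scores = {name: roc_auc(y_true, probs) for name, probs in predictions.items()}
best = max(scores, key=scores.get)
print(scores, best)
```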
meli_challenge/
├── conf/
│   ├── data/
│   ├── feature_selectio/
│   ├── models/
│   ├── preprocess/
│   └── save_selected_columns/
│
├── data/
│
├── notebooks/
│
├── src/
│   ├── data/
│   ├── models/
│   ├── preprocess/
│   ├── utils/
│   └── pipeline/
│
├── utils/
│
└── README.md
1. Clone this repository to your local machine.
2. Install the necessary dependencies:
   pip install -r requirements.txt
3. Adjust the model configuration in the conf/ directory as needed.
4. Run a Hydra multirun to train every model with every encoding:
   python main.py -m +models=lgbm_clas,xgb_clas,rad_for,log_reg +preprocess/encoding=freq_encoder,one_hot_encoder
5. Before executing the notebook, make sure Step 4 has been run so that the models are trained with the best hyperparameters.
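
With Hydra's default sweeper, the `-m` (multirun) flag launches one job per combination of the comma-separated values, so the multirun command above should produce 4 models × 2 encoders = 8 runs. The sketch below only enumerates those combinations to make the sweep explicit. It does not call Hydra, and the helper name `expand_sweep` is hypothetical.

```python
from itertools import product

# Override groups copied from the multirun command.
sweep = {
    "+models": ["lgbm_clas", "xgb_clas", "rad_for", "log_reg"],
    "+preprocess/encoding": ["freq_encoder", "one_hot_encoder"],
}


def expand_sweep(sweep):
    """Return one list of 'key=value' overrides per job (Cartesian product)."""
    keys = list(sweep)
    return [
        [f"{k}={v}" for k, v in zip(keys, combo)]
        for combo in product(*(sweep[k] for k in keys))
    ]


jobs = expand_sweep(sweep)
for i, overrides in enumerate(jobs):
    # Each line is the single-run command equivalent to one sweep job.
    print(f"job {i}: python main.py {' '.join(overrides)}")
```

Running any one of the printed commands on its own is a quick way to debug a single model and encoder pair before launching the full sweep.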