consumer_complaints

NLP with several methodologies applied to the Kaggle US Consumer Finance Complaints dataset: https://www.kaggle.com/datasets/kaggle/us-consumer-finance-complaints

00: EDA with a pandas-profiling report, data munging, and feature engineering.

00a: Data preparation for validation dataset.

00b: Data preparation for test dataset.

01: TF-IDF text features with scikit-learn's TfidfVectorizer, product classification with several models (including the feature-engineered variables from the EDA), hyperparameter optimization with Optuna, and a final ensemble model.

02: Document embeddings from Gensim's Doc2Vec and product classification with several models.

03: Multi-class classification with Simple Transformers and GPU acceleration. Models evaluated: bert-base-uncased and bert-large-uncased.
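
As a rough illustration of the approach in notebook 01, the sketch below fits a TF-IDF plus logistic regression pipeline on a handful of made-up complaint narratives. The example texts, the label names, the choice of LogisticRegression, and the vectorizer settings are all assumptions for illustration; the notebooks themselves may use different models, features, and parameters.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy complaint narratives and product labels (invented for this sketch,
# not taken from the Kaggle dataset).
texts = [
    "My credit card was charged twice for the same purchase",
    "The bank raised my credit card interest rate without notice",
    "My mortgage servicer lost my escrow payment",
    "The lender refused to modify my mortgage loan",
    "A debt collector keeps calling me about a debt I do not owe",
    "The collection agency reported a paid debt to the credit bureau",
]
products = [
    "Credit card", "Credit card",
    "Mortgage", "Mortgage",
    "Debt collection", "Debt collection",
]

# Vectorize the text with TF-IDF, then classify the product category.
model = make_pipeline(
    TfidfVectorizer(stop_words="english", ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, products)

# Predict the product for a new, unseen complaint.
predicted = model.predict(["I was charged a late fee on my credit card"])
print(predicted[0])
```

Wrapping the vectorizer and classifier in a single pipeline keeps the TF-IDF vocabulary fitted on training data only, which matters once separate validation and test sets (as prepared in 00a and 00b) are scored.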