Text multilabel classification using BERT, word2vec, and XGBoost
We follow these steps:
- EDA
- Data preprocessing
- XGBoost + word2vec + TF-IDF modeling
- Pretrained BERT model
1. https://huggingface.co/datasets/onestop_english
OneStopEnglish is a corpus of texts written at three reading levels. Its usefulness has been demonstrated through two applications: automatic readability assessment and automatic text simplification.
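
As a rough illustration of the XGBoost + word2vec + TF-IDF modeling step, one common way to combine the two feature types is a TF-IDF-weighted average of word vectors. The sketch below is a minimal pure-Python version under that assumption. The toy sentences, the two-dimensional `embeddings` table, and the helper names `tokenize`, `tfidf`, and `weighted_doc_vector` are hypothetical; they stand in for the real corpus and a trained word2vec model, and the actual pipeline may combine the features differently.

```python
import math
from collections import Counter

# Toy documents (hypothetical, not taken from OneStopEnglish).
docs = [
    "the cat sat on the mat",
    "the committee deliberated on the proposal",
    "a cat chased a mouse",
]

# Toy 2-D word vectors (hypothetical stand-in for a trained word2vec model).
embeddings = {
    "cat": [1.0, 0.0],
    "mat": [0.0, 1.0],
    "committee": [0.5, 0.5],
}


def tokenize(text):
    """Lowercase and split on whitespace; real preprocessing would do more."""
    return text.lower().split()


def tfidf(documents):
    """Return per-document {term: tf-idf weight} dicts and the idf table."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Term frequency is the relative frequency within the document.
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return vectors, idf


def weighted_doc_vector(weights, word_vectors, dim):
    """TF-IDF-weighted average of word vectors; terms without a vector are skipped."""
    total = [0.0] * dim
    weight_sum = 0.0
    for term, w in weights.items():
        vec = word_vectors.get(term)
        if vec is None:
            continue
        for i in range(dim):
            total[i] += w * vec[i]
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known terms: fall back to the zero vector
    return [x / weight_sum for x in total]


vectors, idf = tfidf(docs)
doc_vec = weighted_doc_vector(vectors[0], embeddings, 2)
print(doc_vec)
```

Fixed-length document vectors built this way could then be passed, together with the reading-level labels, to a classifier such as XGBoost.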