This repository contains notebooks and code for three tasks:
- Text classification: given a file of article headlines, classify each headline as sarcastic or not. Techniques tried:
  - TF-IDF and Logistic Regression
  - word counts and Naive Bayes
  - deep neural network using an LSTM and pretrained GloVe embeddings
- Tabular data exploration, cleansing and classification:
  - analyze a corrupted dataset
  - perform the necessary cleaning
  - visualize the data
  - apply a classifier
- SQL query: given three tables holding students' data, their marks, and the classes taught at the university, write a simple query to find students who scored well in algebra classes.
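
To make the SQL task more concrete, here is a minimal sketch of what such a query might look like, run through Python's built-in `sqlite3` module. The table names (`students`, `classes`, `marks`), their columns, the sample rows, and the `MIN_MARK` threshold for "scored well" are all assumptions made for illustration; the actual schema in this repository may differ.

```python
import sqlite3

# Hypothetical schema and sample rows: names and values are assumptions,
# not the repository's actual tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE classes  (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE marks    (student_id INTEGER, class_id INTEGER, mark INTEGER);
INSERT INTO students VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Eve');
INSERT INTO classes  VALUES (1, 'Algebra'), (2, 'History');
INSERT INTO marks    VALUES (1, 1, 92), (2, 1, 61), (3, 1, 85), (2, 2, 95);
""")

MIN_MARK = 80  # assumed cut-off for "scored well"

# Join the three tables and keep only algebra marks at or above the cut-off.
query = """
SELECT s.name, m.mark
FROM students AS s
JOIN marks   AS m ON m.student_id = s.id
JOIN classes AS c ON c.id = m.class_id
WHERE c.title = 'Algebra' AND m.mark >= ?
ORDER BY m.mark DESC;
"""
top_students = conn.execute(query, (MIN_MARK,)).fetchall()
print(top_students)  # [('Ann', 92), ('Eve', 85)]
```

Passing the threshold as a bound parameter (`?`) rather than formatting it into the string keeps the query reusable for other cut-offs.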