Project for the Data Mining course @ University of Pisa
Authors: Alessandro Bucci, Alberto Marinelli, Giacomo Cignoni
This Data Mining project was carried out on two datasets extracted from the Twitter platform: one describing users and one containing their tweets. The underlying theme of the tasks is the presence of bots among the crawled users. The project consists of data analysis with data mining tools, divided into four tasks:
Task 1: Data Understanding and Preparation
Task 1.1: Data Understanding, explore the dataset with analytical tools and evaluate data quality, the distribution of variables and pairwise correlations
Task 1.2: Data Preparation, improve the quality of the data and prepare them by extracting new interesting features to describe the user and their behaviour from the information gathered from tweets
Task 2: Clustering Analysis, explore the dataset using various clustering techniques based on the users’ profiles
Task 3: Predictive Analysis, predict for each user the binary label that indicates whether the user is a bot or a genuine user
Task 4: Time Series Analysis, analyse the time series extracted from the tweets of 2019
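
As an illustration of the feature extraction described in Task 1.2, the following minimal sketch aggregates tweet-level records into per-user features. It is not the project's actual code: the field names (user_id, text, retweet_count), the sample records and the helper extract_user_features are hypothetical and chosen only for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tweet records; the field names are assumptions,
# not the schema of the project's datasets.
tweets = [
    {"user_id": 1, "text": "hello world", "retweet_count": 2},
    {"user_id": 1, "text": "data mining is fun", "retweet_count": 0},
    {"user_id": 2, "text": "buy now", "retweet_count": 10},
]


def extract_user_features(tweets):
    """Aggregate tweet-level records into one feature dict per user."""
    # Group the tweets by the user who wrote them.
    by_user = defaultdict(list)
    for tweet in tweets:
        by_user[tweet["user_id"]].append(tweet)

    # Summarise each user's behaviour with simple statistics.
    features = {}
    for user_id, user_tweets in by_user.items():
        features[user_id] = {
            "tweet_count": len(user_tweets),
            "avg_text_length": mean(len(t["text"]) for t in user_tweets),
            "avg_retweets": mean(t["retweet_count"] for t in user_tweets),
        }
    return features


user_features = extract_user_features(tweets)
```

Per-user features of this kind could then be joined with the users dataset and used as input to the clustering and classification tasks.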