Repository for the notebooks of the three projects delivered for the 02807 Computational Tools for Data Science course at the Technical University of Denmark.
- Pre-processing data from The Movie Database (TMDb) API
- Comparing the efficiency of data manipulation & iteration methods: iterrows, apply, vectorization with Pandas, vectorization with NumPy
- Predicting the genre of movies using a Random Forest Classifier
- Building a movie recommendation system by identifying highly correlated movies
- Pre-processing data of the listings & the reviews from the Airbnb website
- Displaying basic data analysis: distribution of prices, trends & reviews for the Airbnb listings
- Identifying the words that are most positive in Airbnb reviews by building a scoring function
- Pre-processing data for company X
- Computing and plotting trends in the behavior of the customers of company X
- Sampling k elements from a stream
- Merging n reservoir samples
- Sampling reservoirs from a stream with subgroups
- Finding the majority element in a stream
- Finding the majority element in n streams
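
For readers unfamiliar with the streaming topics listed above, here is a minimal sketch of two standard techniques: reservoir sampling of k elements and a single-pass majority candidate (Boyer-Moore voting). It is an illustration under stated assumptions, not the code from the notebooks, which may use different approaches. The function names `reservoir_sample` and `majority_candidate` are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return k items chosen uniformly at random from a stream of unknown length.

    Hypothetical helper (classic "Algorithm R"); uses O(k) memory.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


def majority_candidate(stream: Iterable[T]) -> Optional[T]:
    """Return the Boyer-Moore voting candidate after one pass over the stream.

    The result is the majority element only if some element occurs in more
    than half of the stream; otherwise it is an arbitrary element and a
    second counting pass is needed to verify it.
    """
    candidate: Optional[T] = None
    count = 0
    for item in stream:
        if count == 0:
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            count -= 1
    return candidate
```

Passing a seeded generator, for example `reservoir_sample(range(1000), 5, random.Random(0))`, makes the sample reproducible. If the stream has fewer than k items, the whole stream is returned.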
Authors: Zineb Fadili (s201501), Lorenzo Beccari (s201809) & Eriks Markevics (s202741)