zinebfadili/02807-data-analysis

02807 Computational Tools for Data Science

Repository for the notebooks of the three projects delivered for the course 02807 Computational Tools for Data Science at the Technical University of Denmark (DTU).

Project descriptions

Project 1: IMDB Data Analysis

  • Pre-processing data from The Movie Database (TMDb) API
  • Comparing the efficiency of data manipulation and iteration methods: iterrows, apply, vectorization with Pandas and vectorization with NumPy
  • Predicting the genre of movies using a Random Forest Classifier
  • Building a movie recommendation system by identifying highly correlated movies
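The correlation-based recommender can be sketched as follows. This is a minimal illustration, not the notebook's code: the toy ratings in RATINGS, the helper names and the min_common threshold are hypothetical, and Pearson correlation computed over the users who rated both movies is an assumed similarity measure.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {movie: rating}); not data from the project.
RATINGS = {
    "u1": {"Alien": 5, "Aliens": 4, "Amelie": 1},
    "u2": {"Alien": 4, "Aliens": 5, "Amelie": 2},
    "u3": {"Alien": 1, "Aliens": 2, "Amelie": 5},
    "u4": {"Alien": 2, "Aliens": 1, "Amelie": 4},
}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def most_correlated(target, ratings, min_common=2):
    """Hypothetical helper: rank other movies by correlation with `target`,
    using only the users who rated both movies."""
    movies = {m for r in ratings.values() for m in r}
    scores = []
    for other in sorted(movies - {target}):
        common = [r for r in ratings.values() if target in r and other in r]
        if len(common) < min_common:
            continue  # too few shared raters for a meaningful correlation
        xs = [r[target] for r in common]
        ys = [r[other] for r in common]
        scores.append((other, pearson(xs, ys)))
    # Highest correlation first: these are the recommendation candidates.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


print(most_correlated("Alien", RATINGS))
```

With a real ratings table, the same idea could be expressed with a user-by-movie pivot table and pandas' DataFrame.corrwith.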

Project 2: AIRBNB & Business Data Analysis

  • Pre-processing the listings and reviews data from the Airbnb website
  • Basic data analysis of the Airbnb listings: distribution of prices, trends and reviews
  • Identifying the most positive words in Airbnb reviews by building a scoring function
  • Pre-processing data for company X
  • Computing and plotting trends in the behavior of company X's customers
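One possible word-scoring function, assumed here rather than taken from the project, rates a word by how far the average score of the reviews containing it sits above the overall average score. The sketch assumes each review text is paired with a numeric rating (for example the listing's review score); the tokenize and word_scores names, the min_count cut-off and the sample reviews are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case a review and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def word_scores(reviews, min_count=2):
    """Hypothetical scoring function: mean rating of the reviews containing a
    word minus the overall mean rating.

    `reviews` is a list of (text, rating) pairs. Each word is counted at most
    once per review; words seen in fewer than `min_count` reviews are dropped.
    """
    overall = sum(rating for _, rating in reviews) / len(reviews)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, rating in reviews:
        for word in set(tokenize(text)):
            totals[word] += rating
            counts[word] += 1
    return {
        word: totals[word] / counts[word] - overall
        for word in totals
        if counts[word] >= min_count
    }


# Made-up reviews for illustration only.
reviews = [
    ("Spotless flat, great host", 5),
    ("Great location and spotless", 5),
    ("Noisy street, dirty bathroom", 2),
    ("Dirty kitchen but great location", 3),
]
scores = word_scores(reviews)
top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(top[:3])
```

The min_count cut-off keeps rare words, which appear in only one or two reviews, from dominating the ranking by chance.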

Project 3: Data streaming

  • Sampling k elements from a stream
  • Merging n reservoir samples
  • Reservoir sampling from a stream with subgroups
  • Finding the majority element in a stream
  • Finding the majority element in n streams
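The streaming tasks can be sketched with standard techniques: Algorithm R for reservoir sampling, a size-weighted merge of reservoirs drawn from disjoint streams, and the Boyer-Moore majority vote, whose (candidate, counter) summaries can be combined across streams. These are assumed textbook methods rather than the project's code; all function names and the sample streams are hypothetical, and the subgroup variant is not covered.

```python
import random


def reservoir_sample(stream, k, rng=random):
    """Algorithm R: uniform random sample of k items from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randrange(i + 1)  # uniform in [0, i]
            if j < k:
                reservoir[j] = item  # item i is kept with probability k / (i + 1)
    return reservoir


def merge_reservoirs(samples, k, rng=random):
    """Merge reservoirs from disjoint streams into one sample of size k.

    `samples` is a list of (reservoir, stream_length) pairs. Each pick chooses a
    source with probability proportional to its number of still-unpicked stream
    items, then takes a random unused element from that source's reservoir.
    """
    pools = [list(res) for res, _ in samples]
    remaining = [n for _, n in samples]
    merged = []
    while len(merged) < k and sum(remaining) > 0:
        i = rng.choices(range(len(pools)), weights=remaining)[0]
        merged.append(pools[i].pop(rng.randrange(len(pools[i]))))
        remaining[i] -= 1
    return merged


def majority_candidate(stream):
    """Boyer-Moore vote: returns (candidate, counter). The candidate is the
    majority element whenever one exists; it still has to be verified."""
    candidate, count = None, 0
    for item in stream:
        if count == 0:
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            count -= 1
    return candidate, count


def merge_candidates(summaries):
    """Combine per-stream (candidate, counter) summaries into one for the union.
    Different candidates cancel each other out, as in the single-stream vote."""
    candidate, count = None, 0
    for c, n in summaries:
        if c == candidate:
            count += n
        elif n > count:
            candidate, count = c, n - count
        else:
            count -= n
    return candidate, count


# Usage on made-up streams: find a majority candidate, then verify it by counting.
streams = [[1, 2, 1, 1], [3, 1, 1], [2, 1, 3, 1]]
summary = merge_candidates(majority_candidate(s) for s in streams)
candidate = summary[0]
total = sum(len(s) for s in streams)
occurrences = sum(s.count(candidate) for s in streams)
is_majority = occurrences * 2 > total
print(candidate, is_majority)
```

Because the merged candidate is only guaranteed to be the majority when one exists, the last step re-counts it; in a strict one-pass setting that check requires a second pass over the streams.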

Contributors

Zineb Fadili (s201501), Lorenzo Beccari (s201809) & Eriks Markevics (s202741)
