Repo for coursework in "Data Science and Big Data" with Raja Sooriamurthi.
This course included lectures, homework, and projects covering topics such as:
- pandas, NumPy, seaborn, Matplotlib
- SciPy, Scrapy, Beautiful Soup
- Statistical analysis and data visualization
- Machine learning tools and methods
  - scikit-learn
  - cross-validation
  - sentiment analysis
  - recommenders
  - classification, regression, clustering
- Genetic algorithms
- Apache Spark
- MapReduce
- Apache Pig
Projects were self-directed and were intended to exercise our newfound skills and provide practice in turning data into useful insights. The two main projects were:
- Project 1: an analysis of how cost-effective a CMU education is (video here).
- Project 2: a supervised classification task to identify failing water pumps in rural Tanzania (video here).