Data-Science-Assignments

Repository for the Data Science learning track to host assignments.

Find powerpoints and helpful resources in the course_material folder of this repo! You'll need to clone the repo to see most of them.


Find a great quick python reference here: https://www.w3schools.com/python/

Homework tips

Think ‘process’ not ‘product’. The goal is to learn. The goal is not to hand in a perfect assignment.

Skim your homework assignment BEFORE you do the readings. It will help focus your attention!

SQ3R: Scan, Question, Read, Recall, Review!

Week 1 - Introductions and Python

In Class Assignment due Friday, September 17, 2021 @ 8pm

  1. Finish any installs not completed in class.
  2. Skim the Survival Guide presentation. We will discuss this in more detail throughout the first 8 weeks.
  3. Submit the in-class activity to canvas. You can submit a link to your repo or the ipynb file itself.

Homework due Wednesday, September 22, 2021 @ 5:30pm

Readings

  1. Hello World

  2. Data Structures

  3. Intro to Git - Please complete 1-2

  4. Intro to Python - Please complete 1-2

  5. Go through the Provided python_click_through.ipynb.

    • Open another notebook and copy each cell and play with it in the new notebook.

    • Ask yourself a question and experiment.

      • What if I change this variable?

        • What’s the outcome?
      • What if I intentionally write code I think will fail?

        • Does it fail?
      • What if I combine the concept in the cell above with this cell?

Notebook

  • Complete the week 1 homework notebook found here. You can also find it in the week 1 folder inside the course materials folder here on the github page.
  • Submit a link to your week 1 homework on Canvas. Week 1 you are allowed to submit the file itself, but in the future you will have to submit a link.

Optional Readings:

String Manipulation

Optional Videos:

https://www.youtube.com/watch?v=YYXdXT2l-Gg&list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7

  • Suggested: only videos 2-7 and 9

Week 2 - Python: Math, Strings, If-Else, Expressions

In Class Assignment due Friday, September 24, 2021 @ 8pm

  1. Read the following: http://swcarpentry.github.io/shell-novice/01-intro/index.html http://swcarpentry.github.io/shell-novice/02-filedir/index.html
  2. Create your own week 02 repository. (If you have not done so in class)
  3. Skim the 'In a nutshell' links for 'Learning how to Learn' and 'Deep Work' in the Survival Guide.
    • Find something in those readings that interests you and explore further.
    • These topics can have profound effects outside the classroom as well.
  4. Submit a link to your group activity on Canvas.

Homework due Wednesday, September 29, 2021 @ 5:30pm

Readings/Videos/DataCamp

  1. Loops

    • In DataCamp, complete Intermediate Python, Chapter 4: Loops (Click here to start)
  2. Functions

  3. Classes

    • Working with classes can be challenging. Focus your attention on:

      1. Creating classes.
      2. Adding attributes.
      3. Creating class methods. (methods that operate on the entire class)
      4. Creating instance methods. (methods that act only on the instance)
      5. Creating objects from classes. (foo = MyClass(attr1, attr2))
    • Focus less on (but be aware of):

      1. Inheritance
    • Read this introduction to classes. (Don't worry about the exercises or any notes about Python 2.7.)

    • Read this and complete the exercise at the end. You do not need to submit these, but they will prepare you for the homework.

    • Read this Python's Methods Demystified

  4. Intro to Git - Please complete 3
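
The class concepts listed in the readings above (class creation, attributes, class methods, instance methods, and object creation) can be sketched in a few lines. This is a minimal illustration, not part of the homework: the `Student` class and its attribute names are hypothetical and were chosen only for this example.

```python
class Student:
    """A hypothetical class used only for illustration."""

    # Class attribute: shared by every instance of Student.
    enrolled = 0

    def __init__(self, name, track):
        # Instance attributes: each object gets its own copy.
        self.name = name
        self.track = track
        Student.enrolled += 1

    def describe(self):
        # Instance method: acts on one object through `self`.
        return f"{self.name} is on the {self.track} track"

    @classmethod
    def total_enrolled(cls):
        # Class method: acts on the class itself through `cls`.
        return cls.enrolled


# Creating objects from the class.
foo = Student("Ada", "Data Science")
bar = Student("Grace", "Data Science")

print(foo.describe())
print(Student.total_enrolled())
```

Try changing `enrolled` on the class versus on a single instance to see how class and instance attributes differ.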

Notebook

  1. Complete the week_02_homework.ipynb found here. You can also find it in the week 2 folder in the course materials folder at the top of the github page. Submit a link to your repo or submit the .ipynb file.

Optional Videos:

https://www.youtube.com/watch?v=YYXdXT2l-Gg&list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7

  • Only videos 7 & 8

https://www.youtube.com/watch?v=tJxcKyFMTGo&list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7&index=11

https://www.youtube.com/watch?v=ZDa-Z5JzLYM&list=PL-osiE80TeTsqhIuOqKhwlXsIBIdSeYtc

  • Only videos 1,2 & 3

Optional Reading: Only do this if you have completed your homework, deleted it, and done it again.

Introduction to functions. Read more on functions here.

Week 3 - Python: Loops, Functions, Classes

In Class Assignment due Friday, October 1, 2021 @ 8pm

  1. Submit your group activity on Canvas. Make sure it works, first!
  2. OPTIONAL: DataCamp PIP Tutorial

Homework due Wednesday, October 6, 2021 @ 5:30pm

  1. DataCamp: NumPy

    • Intro to Python - Please complete 4
    • Complete the whole chapter: “NumPy” through “Blend it all together”
  2. Cheat Sheets (just for your reference)

  3. Readings (The Unix Shell)

  4. Intro to Git Please complete 4-5

Notebook

  1. Complete the week_03_homework.ipynb here. You can also find the notebook and the csv file you'll need in the week 4 folder in the course materials folder at the top of the page.
  2. Embed screenshots at the end of your jupyter notebook to show you completed the Intro to Git and Intro to Python DataCamp courses.
    • Do your own research on how to do this. There are a couple "correct" ways to embed images in ipynb files
    • Make sure they render when you push your notebook to github
  3. List three things you learned about the unix shell in Markdown below the screenshots. Title the section "The Unix Shell"

Optional Videos:

DataCamp-Numpy

Numpy-Part1

Numpy-Part2

Week 4 - Python: Pandas

In Class Assignment due Friday, October 8, 2021 @ 8pm

  1. Submit your group activity to Canvas
  2. Read, Click Through and Digest: pandas_part_1.ipynb
  3. Read, Click Through and Digest: pandas_part_2.ipynb

Homework due Wednesday, October 13, 2021 @ 5:30pm

Readings/Videos/DataCamp

  1. Pandas DataFrames - please read and review as needed
  2. Time Series tutorial with Pandas - please read and review as needed
  3. In DataCamp, Data Manipulation with Pandas - please complete
  4. In DataCamp, Intro to DataViz - Matplotlib - Please complete 1-2

Notebook

  1. Complete the week_04_starter.ipynb. You can find it in the week 4 folder in the course materials folder at the top of the github page. Submit a link to your repo.
    • See README.md in the week_04/homework folder for full homework instructions. NOTE: Best viewed in github.
    • Output_examples.ipynb is provided as a reference.
    • View in Jupyter or Github. (Github sometimes mis-formats documents.)
    • NOTE: Your numerical results should be very close to the examples.
      • Your formatting may be very different than provided examples. Focus on getting the data and less on the formatting.
  2. Create a simple graph (any type) using Matplotlib and any of the data in the dataframe. Briefly explain what the graph shows.
  3. Embed an image indicating that you completed Data Manipulation with Pandas from DataCamp

Optional Videos:

Optional Reading:

Week 5 - Python: Plotting (matplotlib and Seaborn)

In Class Assignment due Friday, October 15, 2021 @ 8pm

  1. Submit your group activity to Canvas using the git url!
  2. Read REST API Tutorial if you did not in class or need a refresher

Homework due Wednesday, October 20, 2021 @ 5:30pm

Readings/Videos/DataCamp

  1. In DataCamp, complete the rest of Intro to DataViz - Matplotlib

  2. In DataCamp, complete all of Intro to DataViz - Seaborn

Notebook

  1. Complete the WeatherAPI_homework_starter.ipynb. You can find it in the week 5 folder in the course materials folder at the top of the github page. Submit a link to your repo.
    • This homework is likely your first opportunity to build your portfolio.
      • Start early, make it neat.
      • This is a real project you can showcase!
      • API calls can be really slow (it is a free service), so limit the number of calls you are making while testing
  2. Embed screenshots at the end of your jupyter notebook to show you completed the Intro to Data Visualization with Matplotlib and Intro to Data Visualization with Seaborn from DataCamp

Optional Videos:

Optional Reading:

Week 6 - SQL- Part1

In Class Assignment due Friday, October 22, 2021 @ 8pm

  1. Submit your group activity to Canvas using the git url!
  2. Install postgres and pg admin
  3. What is a Database?

Homework due Wednesday, October 27, 2021 @ 5:30pm

Readings/Videos/DataCamp

  1. Overview (Only First page)
  2. RDBMS Concepts (Only First Page)
  3. Intro to SQL - Please complete 1-4
  4. Joining Data - Please complete 1 and 2

Text File (Instead of a Notebook)

  1. Create a file called week_6_hw.sql
  2. Answer all the questions in week_6_sql_hw.docx
  3. For every problem, do the following:
    • Copy and paste the problem into your week_6_hw.sql file.
    • Use PostgreSQL in PGAdmin on your computer to solve the problem.
    • Paste your query into week_6_hw.sql.
    • Write an explanation of what is happening in each query (as a sql comment or in the readme). Be sure to reference the data model in your explanations as needed.
  4. Commit the week_6_hw.sql file to your own repo.
  5. In the readme for the repo explain what an RDBMS is and what SQL is briefly (under 250 words)
  6. Also in the readme, embed a screenshot indicating you have completed the Introduction to SQL in DataCamp
  7. Submit a link to your repo.

Optional Videos:

Week 7 - SQL- Part2

In Class Assignment due Friday, October 29, 2021 @ 8pm

  1. Submit your group activity to Canvas using the git url!
  2. Subquery vs join

Homework due Wednesday, November 3, 2021 @ 5:30pm

Readings/Videos/DataCamp

  1. SQL Autoincrementing
  2. Joining Data - Please complete 3 and 4
  3. Intermediate SQL - Please complete 1-4

Text File (Instead of a Notebook)

  1. Create a file called week_7_hw.sql
  2. Answer all the questions in week_7_sql_hw.docx
  3. For every problem, do the following:
    • Copy and paste the problem into your week_7_hw.sql file.
    • Use PostgreSQL in PGAdmin on your computer to solve the problem.
    • Paste your query into week_7_hw.sql.
    • Write an explanation of what is happening in each query (as a sql comment). Be sure to reference the data model in your explanations as needed.
  4. Commit the week_7_hw.sql file to your own repo.
  5. In the readme, explain what autoincrementing is. Also explain the difference between creating a join and a subquery. This section should be less than 300 words.
  6. Also in the readme, embed a screenshot indicating you have completed the Joining Data in Postgresql DataCamp
  7. Submit a link to your repo.
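
The readme task above asks about autoincrementing and about the difference between a join and a subquery. As a rough sketch, the snippet below answers one question both ways. It assumes Python's built-in `sqlite3` module instead of the PostgreSQL setup used in class, and the `customers` and `orders` tables are hypothetical. SQLite spells autoincrementing as `INTEGER PRIMARY KEY AUTOINCREMENT`; PostgreSQL uses a different syntax, so treat the `CREATE TABLE` statements as illustrative only.

```python
import sqlite3

# In-memory database so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- id is filled in automatically
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    INSERT INTO customers (name) VALUES ('Ada'), ('Grace'), ('Linus');
    INSERT INTO orders (customer_id, total) VALUES (1, 20.0), (1, 35.5), (2, 12.0);
""")

# Join: combine rows from both tables, then keep each customer once.
join_rows = conn.execute("""
    SELECT DISTINCT c.name
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.name
""").fetchall()

# Subquery: filter customers using the result of an inner query.
subquery_rows = conn.execute("""
    SELECT name
    FROM customers
    WHERE id IN (SELECT customer_id FROM orders)
    ORDER BY name
""").fetchall()

print(join_rows)
print(subquery_rows)
conn.close()
```

Both queries return the customers who have at least one order. The join needs `DISTINCT` because a customer with two orders would otherwise appear twice, while the subquery only tests membership.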

Optional Videos:

Week 8 - Basic Stats in Python

In Class Assignment due Friday, November 5th, 2021 @ 8pm

  1. Submit your group activity to Canvas using the git url!

Homework due Wednesday, November 10, 2021 @ 5:30pm

Readings/Videos/DataCamp

  1. Intro to Stats in Python - Please complete 1 - 4
  2. Reference scipy documentation

In lieu of a coding assignment, you need to work on your class project and turn it in under week 8 in Canvas.

Please make sure you have a github repo where your team is storing and working on your project together.

We will make sure everyone on the team is contributing evenly to the same repository.

Use what you learned this week in your project!

Week 9 - Basic Probability in Python

In Class Assignment due Friday, November 12th, 2021 @ 8pm

  1. Submit your group activity to Canvas using the git url!

Homework due Wednesday, November 17, 2021 @ 5:30pm

Readings/Videos/DataCamp

  1. Make sure you actually completed the Intro to Stats in DataCamp from last week
  2. Foundations of Probability DataCamp - Please complete 1-2. 3-4 are optional.
  3. Khan Academy: basic theoretical probability section

Notebook

  1. Create a markdown heading and explanation for each question in the probability_hw.docx file under week 9.
  2. Put the code answer for each question under the markdown heading.
  3. Embed a screenshot into your jupyter notebook showing you completed DataCamp's Intro to Stats
  4. Upload your completed notebook to github and submit the link to Canvas.

Optional Reading

  • The Khan Academy: Statistics & Probability course is a great resource to get another video on any concepts that are challenging. Seek out what you need more help on, and your TAs are here to help.

Week 10 - Intro to Linear Algebra in Python

In Class Assignment due Friday, November 19th, 2021 @ 8pm

  1. One hot encoding - This is a technique you will use a lot going forward, and it requires knowledge of linear algebra to use in many scenarios
  2. Submit your group activity to Canvas using the git url!
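
For a feel of what one hot encoding produces, here is a minimal sketch in plain Python. The `one_hot_encode` helper and the `colors` data are hypothetical and exist only for this example; in practice you would more likely use a library routine such as `pandas.get_dummies`.

```python
def one_hot_encode(values):
    """Return the sorted categories and one 0/1 row per input value."""
    categories = sorted(set(values))
    position = {category: i for i, category in enumerate(categories)}

    rows = []
    for value in values:
        row = [0] * len(categories)   # start with all zeros
        row[position[value]] = 1      # flip the matching column to 1
        rows.append(row)
    return categories, rows


colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)
for color, row in zip(colors, encoded):
    print(color, row)
```

Each row is a vector with a single 1, which is where the linear algebra connection comes from: the encoded data is a matrix with one column per category.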

Homework due Wednesday, December 1, 2021 @ 5:30pm

Readings/Videos/DataCamp

  1. Linear Algebra in python - Go through all the subsections from Basics of Linear Algebra to Summary. Do the TRY IT! sections, as they will help you with your homework.
  2. Dot product vs cross product
  3. PCA in python - Remember, PCA uses linear algebra, which is why it's relevant here
    • be sure to copy the example code into your own jupyter notebook and run it as you go through the reading.
  4. LaTeX formatting - this can be done in a jupyter notebook markdown section!
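
To make the dot product vs cross product reading concrete, here is a small sketch in plain Python. The `dot` and `cross` helpers are written just for this example (NumPy provides `numpy.dot` and `numpy.cross` for real work), and the cross product is defined here only for 3-element vectors.

```python
def dot(a, b):
    """Dot product: multiply matching components and add them up (a scalar)."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product of two 3-D vectors: a new vector perpendicular to both."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by,
            az * bx - ax * bz,
            ax * by - ay * bx)


u = (1, 0, 0)
v = (0, 1, 0)

print(dot(u, v))     # scalar result
print(cross(u, v))   # vector result
```

The key difference is the type of the result: the dot product returns a single number, while the cross product returns a vector whose dot product with either input is zero.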

Notebook

  1. Create a markdown heading and explanation for each question in the linear_algebra_hw.docx file under week 9.
  2. Put the code answer for each question under the markdown heading.
  3. Upload your completed notebook to github and submit the link to Canvas.

Optional Reading

Week 11 - Time Series Analysis

In Class Assignment due Friday, December 3rd, 2021 @ 8pm

  1. Fourier Analysis
  2. Submit your group activity to Canvas using the git url!

Homework due Wednesday, December 8, 2021 @ 5:30pm

Readings/Videos/DataCamp

  1. Time Series Analysis - Complete 1-4
  2. Fourier Analysis - Reread this to make sure you understand Fourier transform fundamentally
  3. SARIMA in Python

Notebook

  1. Create a markdown heading and explanation for each question in the time_series_hw.docx file under week 10.
  2. Put the code answer for each question under the markdown heading.
  3. Embed a screenshot into your jupyter notebook showing you completed DataCamp's Timeseries Analysis in Python
  4. Upload your completed notebook to github and submit the link to Canvas.

Optional Reading

Week 12 - Intro to Machine Learning

In Class Assignment due Friday, December 10th, 2021 @ 8pm

  1. Lambda, Apply, Assign
  2. Map, Reduce, Lambda
  3. Submit your group activity to Canvas using the git url!
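
As a quick illustration of the map, reduce, and lambda ideas from the in-class readings, the sketch below uses only the standard library. The temperature list is made-up sample data, and the pandas `apply` and `assign` methods from the first reading are not shown here.

```python
from functools import reduce

# Made-up sample data: temperatures in Celsius.
temps_c = [10, 20, 30]

# map + lambda: apply the same small function to every element.
temps_f = list(map(lambda c: c * 9 / 5 + 32, temps_c))

# reduce + lambda: fold the list down to a single value (here, a sum).
total_c = reduce(lambda running, c: running + c, temps_c, 0)

print(temps_f)
print(total_c)
```

A lambda is just an unnamed one-expression function, so anything written with `lambda` here could also be written with a regular `def`.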

Homework due Wednesday, December 15, 2021 @ 5:30pm

Readings/Videos/DataCamp

  1. Machine learning for business Complete 1-4
  2. Python Data Science Toolbox Part 1 Complete 1-3
  3. Preprocessing for Machine Learning Complete 1
  4. https://datatofish.com/correlation-matrix-pandas/ Correlation matrix

Notebook

  1. Create a markdown heading and explanation for each question in the intro_to_ml.docx file under week 12.
  2. Put the code answer for each question under the markdown heading.
  3. Embed screenshots into your jupyter notebook showing you completed DataCamp's Machine Learning for Business AND Python Data Science Toolbox Part 1
  4. Upload your completed notebook to github and submit the link to Canvas.

Optional Reading

Read about how lambda works: https://realpython.com/python-lambda/

More review of linear regression: https://www.w3schools.com/python/python_ml_linear_regression.asp (there is also a useful section on polynomial regression)

Week 13 - Machine Learning Supervised Learning

In Class Assignment due Friday, December 17th, 2021 @ 8pm

  1. Intro to Supervised Learning in Python
  2. Submit your group activity to Canvas using the git url!

Homework due Wednesday, December 22, 2021 @ 5:30pm

Readings/Videos/DataCamp

  1. Preprocessing for Machine Learning Complete 2-4
  2. Supervised Learning in sklearn Complete 1-4

Notebook

  1. Create a markdown heading and explanation for each question in the supervised_learning.docx file under week 13.
  2. Put the code answer for each question under the markdown heading.
  3. Embed screenshots into your jupyter notebook showing you completed DataCamp's Preprocessing for Machine Learning in Python AND Supervised Learning with Scikit Learn
  4. Upload your completed notebook to github and submit the link to Canvas.

Optional Reading

Google ML Crash Course - up to Regularization

Week 14 - Multiple Linear Regression and Logistic Regression

In Class Assignment due Sunday, December 26th, 2021 @ 8pm

  1. Logistic Regression Read 4.2 on Logistic Regression
  2. Submit your group activity to Canvas using the git url!

Homework due Wednesday, January 5, 2022 @ 5:30pm

Readings/Videos/DataCamp

  1. Intermediate Regression with statsmodels Complete 1-4
  2. Python Data Science Toolbox Part 2 - Please complete 1
  3. Multiple Linear Regression - this is another way to do multiple regression: different from the datacamp course but what we discussed in class. You can use either approach.
  4. Preprocessing Reading - Please read this article to solidify your understanding of preprocessing

Notebook

  1. Create a markdown heading and explanation for each question in the regression_hw.docx file under week 14.
  2. Put the code answer for each question under the markdown heading.
  3. Embed screenshots into your jupyter notebook showing you completed DataCamp's Intermediate Regression with statsmodels
  4. Upload your completed notebook to github and submit the link to Canvas.

Optional Reading

Intro to Regression with statsmodels

Some additional statistical concepts https://data-flair.training/blogs/python-statistics

Read section 4.1 on Linear Regression https://christophm.github.io/interpretable-ml-book/limo.html

Read https://www.investopedia.com/terms/m/mlr.asp on multiple linear regression

Week 15 - Support Vector Machines, Oversampling, and Undersampling

In Class Assignment due Friday, January 7th, 2022 @ 8pm

  1. Oversampling and Undersampling
  2. SVM Sklearn Documentation
  3. Submit your group activity to Canvas using the git url!

Homework due Wednesday, January 12, 2022 @ 5:30pm

Readings/Videos/DataCamp

  1. Linear Classifiers in sklearn Please complete 1-3
  2. Python Data Science Toolbox Part 2 - Please complete 2-3

Notebook

  1. You do not have a notebook to complete this week. Your ETL projects are due next week and should be submitted under week 15 on canvas.

Optional Reading

No optional readings this week. Make sure you understand very well what an SVM is and the kinds of problems it can be used to solve!

Week 16 - Decision Trees and ETL Code Reviews

In Class Assignment due Friday, January 14th, 2022 @ 8pm

  1. Decision Trees with python
  2. Submit your group activity to Canvas using the git url!

Homework due Wednesday, January 19, 2022 @ 5:30pm

Readings/Videos/DataCamp

  1. Linear Classifiers in sklearn Please complete 4
  2. Machine Learning with Tree-Based Models Please complete 1-3

Notebook

  1. Create a markdown heading and explanation for each question in the svm_over_under_sampling_hw.docx file under week 16.
  2. Put the code answer for each question under the markdown heading.
  3. Embed screenshots into your jupyter notebook showing you completed DataCamp's Linear Classifiers in Python AND Python Data Science Toolbox Part 2 (from last week)
  4. Upload your completed notebook to github and submit the link to Canvas.

Optional Reading

Gentle Introduction to Information Theory

Information Theory

Decision Tree Classification in Python

Decision Trees for Decision Making

Week 17 - Ensemble Learning and Random Forest

In Class Assignment due Friday, January 21st, 2022 @ 8pm

  1. ROC Curve
  2. Submit your group activity to Canvas using the git url!

Homework due Wednesday, January 26, 2022 @ 5:30pm

Readings/Videos/DataCamp

  1. Machine Learning with Tree-Based Models Please complete 4-5
  2. Parallel Random Forest Paper - You do not need to understand 100% of this, but it's important you know what the industry is doing
  3. Intro to XGBoost
  4. Intro to Deep Learning please complete 1-2

Notebook

  1. Create a markdown heading and explanation for each question in the tree_based_models_hw.docx file under week 17.
  2. Put the code answer for each question under the markdown heading.
  3. Embed screenshots into your jupyter notebook showing you completed DataCamp's Machine Learning with Tree-Based Models in Python
  4. Upload your completed notebook to github and submit the link to Canvas.

You should start working on your final project now. At this point, you know enough to start.

Week 18 - Intro to Neural Networks

In Class Assignment due Friday, January 28th, 2022 @ 8pm

  1. PyTorch Loss Functions
  2. Submit your group activity to Canvas using the git url!

Homework due Wednesday, February 2, 2022 @ 5:30pm

Readings/Videos/DataCamp

  1. Intro to Deep Learning please complete 3-4
  2. Unsupervised Learning in python Please complete 1-2

Notebook

  1. Create a markdown heading and explanation for each question in the neural_networks_hw.docx file under week 18.
  2. Put the code answer for each question under the markdown heading.
  3. Embed screenshots into your jupyter notebook showing you completed DataCamp's Intro to Deep Learning.
  4. Upload your completed notebook to github and submit the link to Canvas.

Make sure you have picked your group for your final project. No issues if you want to mix them up. Notify the instructor of your project topic and group.

Optional Reading

PyTorch vs Keras vs TensorFlow

Neural Networks

ReLU

Backpropagation

What is TensorFlow

Intro to TensorFlow DataCamp

Week 19 - Unsupervised Learning

In Class Assignment due Friday, February 4th, 2022 @ 8pm

  1. Neural Network Scaling
  2. Submit your group activity to Canvas using the git url!

Homework due Wednesday, February 9, 2022 @ 5:30pm

Readings/Videos/DataCamp

  1. Unsupervised Learning in python Please complete 3-4

Notebook

  1. Create a markdown heading and explanation for each question in the unsupervised_learning_hw.docx file under week 19.
  2. Put the code answer for each question under the markdown heading.
  3. Embed screenshots into your jupyter notebook showing you completed DataCamp's Unsupervised Learning in Python
  4. Upload your completed notebook to github and submit the link to Canvas.

Optional Reading

Unsupervised Learning Cheat Sheet

Hierarchical clustering

K-Means Ideal Number of Clusters

Graph Clustering

Dimensionality Reduction Algorithms

Week 20 - Natural Language Processing

In Class Assignment due Friday, February 11th, 2022 @ 8pm

  1. Tokenization
  2. Submit your group activity to Canvas using the git url!

Homework due Wednesday, February 16, 2022 @ 5:30pm

Readings/Videos/DataCamp

  1. Intro to NLP Complete 1-4
  2. Webscraping in python Complete 1
  3. Create your free Azure account for next class

Notebook/Program

  1. Answer each question in the natural_language_processing_hw.docx file under week 20. Document as needed.
  2. Embed screenshots into your jupyter notebook showing you completed DataCamp's Intro to NLP
  3. Upload your completed repo to github and submit the link to Canvas.

Optional Reading

Top 5 Tokenizers

Stanford NLP Group

NLP for Big Data

MapReduce in Python

Advances in NLP

Week 21 - Intro to Creating APIs in the Cloud

In Class Assignment due Friday, February 18th, 2022 @ 8pm

  1. CRUD vs REST APIs
  2. Submit your group activity to Canvas using the git url!

Homework due Wednesday, February 23, 2022 @ 5:30pm

Readings/Videos/DataCamp

  1. Cloud Computing for Everyone Please complete 1-3. This should take less than 2 hours.
  2. Review Flask documentation as needed to complete HW

Notebook

  1. Create a python application to answer the questions in the APIs_hw.docx file under week 21.
  2. Embed screenshots into your jupyter notebook showing you completed DataCamp's Cloud Computing for Everyone.
  3. Write any conceptual questions, clarifications, and notes in comments. Be sure to number appropriately.

Optional Reading

Flask APIs GETting and POSTing

Deploying Azure App Service

Creating APIs

Flask APIs

Virtual Environments

Week 22 - Webscraping in Python

In Class Assignment due Friday, February 25th, 2022 @ 8pm

  1. Selenium Web Scraping
  2. Submit your group activity to Canvas using the git url!

Homework due Wednesday, March 2nd, 2022 @ 5:30pm

Readings/Videos/DataCamp

  1. Web Scraping in python Please complete 2-4
  2. MongoDB in Python Please complete 1-2
  3. Install MongoDB for next class: https://www.mongodb.com/try/download/community (Instructions). To confirm it is working, follow the instructions to run the command line tool and type "show dbs" once it is running. If you see some results (usually admin, config, and local), MongoDB should be properly installed.
  4. How we learnt to stop worrying and love web scraping

Notebook

  1. Create a python application to answer the questions in the webscraping_hw.docx file under week 22.
  2. Embed screenshots into your readme showing you completed DataCamp's Web Scraping in Python and 1-2 in MongoDB in Python.
  3. Write any conceptual questions, clarifications, and notes in a readme.md file. Be sure to number appropriately.

Optional Reading

Beautiful Soup Documentation

Splinter Documentation

Beautiful Soup

More Beautiful Soup

Even More BS

Week 23 - MongoDB

In Class Assignment due Friday, March 4th, 2022 @ 8pm

  1. NoSQL Explained
  2. Submit your group activity to Canvas using the git url!

Readings/Videos/DataCamp

  1. Delete any running services in your Azure account so you don't incur unwanted charges
  2. MongoDB in Python Please complete 3-4

Notebook

There is no notebook assignment this week. Please complete your final projects with your group and be ready to present next week!

Optional Reading

NoSQL databases

[NoSQL vs SQL](https://www.mongodb.com/nosql-explained/nosql-vs-sql)

MongoDB

Week 24 - Final Project Presentations