This repo is a collection of Jupyter Notebooks to accompany the Udacity Connect Intensive Machine Learning Nanodegree. The code is written for Python 2.7, but should be (mostly) compatible with Python 3.x.
If you haven't already done so, you'll need to download and install Python 2.7. If using Mac OS X, you may want to use Homebrew as a package manager, following these instructions to install Python 2.7 or Python 3. You can also use Anaconda as a package manager. Then, you can follow these instructions to install the Jupyter Notebook. These instructions explain how to install both Python 2 and Python 3 kernels.
You can follow these instructions to create a fork of the ConnectIntensive repo, and clone it to your local machine. Once you've done so, you can navigate to your local clone of the ConnectIntensive repo and follow these instructions to run the Jupyter Notebook App.
The required packages and libraries vary in each of these Jupyter Notebooks; the most commonly used ones include `pandas`, `matplotlib`, and `sklearn`. Each lesson notebook lists its own specific prerequisites along with the objectives.
Most lesson notebooks have a corresponding solutions notebook with the outputs of each cell shown. For example, the notebook `solutions-01.ipynb` displays the output and shows the solutions to the exercises from `lesson-01.ipynb`.
- `lesson-00.ipynb`: Hello Jupyter Notebook!
  - A "hello world" notebook to introduce the Jupyter IDE
  - Introduces import statements for commonly-used modules and packages
- `lesson-01.ipynb`: An intro to Statistical Analysis using `pandas`
  - Introduces the `Series` and `DataFrame` objects in `pandas`
  - Defines categorical variables
  - Covers basic descriptive statistics: mean, median, min/max
  - Label-based `.loc` and position-based `.iloc` indexing in `pandas`
  - Boolean indexing, how to slice a `DataFrame` in `pandas`
  - Exercises in exploratory data analysis, emphasizing `groupby` and `plot`
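The `pandas` idioms from this lesson can be sketched in a few lines. The tiny DataFrame below is purely illustrative, not the lesson's actual dataset:

```python
import pandas as pd

# A tiny illustrative DataFrame; each column is a pandas Series.
df = pd.DataFrame(
    {"city": ["NYC", "NYC", "LA", "LA"],
     "temp": [60, 75, 70, 85]},
    index=["a", "b", "c", "d"])

# Label-based selection with .loc vs. position-based selection with .iloc.
by_label = df.loc["b", "temp"]       # row labeled "b"
by_position = df.iloc[1]["temp"]     # second row, same value

# Boolean indexing: keep only the rows where a condition holds.
warm = df[df["temp"] > 65]

# groupby + an aggregate, the workhorse of exploratory analysis.
mean_by_city = df.groupby("city")["temp"].mean()
print(mean_by_city)
```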
- `lesson-02.ipynb`: Working with the Enron Data Set
  - Covers the `pickle` module for saving objects
  - Magic commands in Jupyter notebooks
  - Use of the `stack` and `unstack` functions in `pandas`
  - Exercises in exploratory data analysis on the Enron data set
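As a rough sketch of the `pickle` and `stack`/`unstack` techniques, using toy data rather than the Enron set:

```python
import pickle
import pandas as pd

# pickle round-trips arbitrary Python objects through bytes (or a file).
payload = {"dataset": "toy", "rows": 2}
restored = pickle.loads(pickle.dumps(payload))

# stack()/unstack() pivot a DataFrame between wide and long layouts.
wide = pd.DataFrame({"salary": [100, 200], "bonus": [10, 20]},
                    index=["alice", "bob"])
long_form = wide.stack()      # Series keyed by (person, field)
back = long_form.unstack()    # recovers the wide layout
print(long_form)
```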
- `lesson-03-part-01.ipynb`: Building and Evaluating Models with `sklearn` (part 1)
  - Perform exploratory data analysis on a dataset
  - Tidy a data set so that it will be compatible with the `sklearn` library
  - Use the `pandas.get_dummies()` function to convert categorical variables to dummy or indicator variables
  - Impute missing values to ensure variables are numeric
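A minimal sketch of these tidying steps, on made-up data rather than the lesson's dataset:

```python
import pandas as pd

# Toy data with a categorical column and a missing numeric value.
df = pd.DataFrame({"color": ["red", "blue", "red"],
                   "size": [1.0, None, 3.0]})

# Convert the categorical column to indicator ("dummy") variables.
dummies = pd.get_dummies(df["color"], prefix="color")

# Impute the missing value with the column mean so the feature is numeric.
df["size"] = df["size"].fillna(df["size"].mean())

# Combine into a frame that sklearn estimators can consume.
tidy = pd.concat([df[["size"]], dummies], axis=1)
print(tidy)
```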
- `lesson-03-part-02.ipynb`: Building and Evaluating Models with `sklearn` (part 2)
  - Make decision tree classifiers on the tidied dataset from part 01
  - Compute the accuracy score of a model on both the training and validation (testing) data
  - Adjust hyperparameters to see the effects on model accuracy
  - Use `export_graphviz` to visualize decision trees
  - Introduce the Gini impurity
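The train/validate loop above might look like the following sketch; the iris data here is just a stand-in for the tidied dataset from part 01:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# DecisionTreeClassifier splits on Gini impurity by default
# (criterion="gini"); max_depth is a hyperparameter to adjust.
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)

train_acc = accuracy_score(y_train, clf.predict(X_train))
test_acc = accuracy_score(y_test, clf.predict(X_test))
print(train_acc, test_acc)
```

A fitted tree can then be exported for visualization with `sklearn.tree.export_graphviz(clf, out_file="tree.dot")` and rendered with Graphviz.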
- `lesson-04-part-01.ipynb`: Bayes NLP Mini-Project
  - Understand how Bayes' Rule derives from conditional probability
  - Write methods applying Bayesian learning to simple word-prediction tasks
  - Practice with Python string methods, e.g. `str.split()`, and Python dictionaries
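A minimal sketch of maximum-likelihood next-word prediction using plain dictionaries and `str.split()`; the corpus and function names here are made up for illustration, not the mini-project's actual code:

```python
# Bayes' Rule, P(A|B) = P(B|A) * P(A) / P(B), follows from writing the
# joint probability both ways: P(A and B) = P(A|B)P(B) = P(B|A)P(A).
# Next-word prediction then reduces to counting: P(next | prev) is
# proportional to count(prev, next) in the corpus.

corpus = "so if you could just go ahead and pack up your stuff"

def next_word_counts(text):
    """Map each word to a dict counting the words that follow it."""
    words = text.split()                      # str.split() on whitespace
    counts = {}
    for prev, nxt in zip(words, words[1:]):
        counts.setdefault(prev, {})
        counts[prev][nxt] = counts[prev].get(nxt, 0) + 1
    return counts

def most_likely_next(text, word):
    """Return the most frequent word observed after `word`, or None."""
    following = next_word_counts(text).get(word, {})
    return max(following, key=following.get) if following else None

print(most_likely_next(corpus, "you"))
```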
- `lesson-05.ipynb`: Classification with Support Vector Machines
  - Introduces additional plotting functionality in `matplotlib.pyplot`
    - Boxplots for depicting interquartile range (IQR), median, max, min, and outliers
    - Scatterplots for 2-D representation of two features
  - Introduction to Support Vector Machines in `sklearn`
    - An introduction to kernels
    - Hard-margin versus soft-margin SVMs
    - Overview of `SVC` hyperparameters: `C`, `gamma`, `degree`, etc.
    - Visualize decision boundaries resulting from the different kernels
    - Practice with the `GridSearchCV` class
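A short sketch of tuning the `SVC` hyperparameters with `GridSearchCV`; the iris data stands in for the lesson's dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# C trades off margin width against misclassification (soft vs. hard
# margin); gamma sets the width of the RBF kernel.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1.0]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```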
- `lesson-06-part-01.ipynb`: Clustering Mini-Project
  - Perform k-means clustering on the Enron Data Set
  - Visualize different clusters that form before and after feature scaling
  - Plot decision boundaries that arise from k-means clustering using two features
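A sketch of why feature scaling matters for k-means, using synthetic blobs instead of the Enron data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

rng = np.random.RandomState(0)
# Two blobs separated along x, plus a second feature on a huge scale.
blob_a = rng.normal([0.0, 0.0], [1.0, 1000.0], size=(50, 2))
blob_b = rng.normal([10.0, 0.0], [1.0, 1000.0], size=(50, 2))
X = np.vstack([blob_a, blob_b])

# Unscaled, the large-variance feature dominates Euclidean distance and
# hides the blobs; MinMaxScaler puts both features on [0, 1] first.
X_scaled = MinMaxScaler().fit_transform(X)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
print(np.bincount(km.labels_))
```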
- `lesson-06-part-02.ipynb`: PCA Mini-Project
  - Perform Principal Component Analysis (PCA) on a large set of features
  - Recognize differences between `train_test_split()` and `StratifiedShuffleSplit()`
  - Introduce the `class_weight` parameter for `SVC()`
  - Visualize the eigenfaces (orthonormal basis of components) that result from PCA
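The PCA step might be sketched as follows; the digits images here stand in for the mini-project's faces data:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 1797 8x8 digit images, flattened to 64 pixel features each.
X, _ = load_digits(return_X_y=True)

pca = PCA(n_components=10).fit(X)

# components_ holds the orthonormal basis images (the "eigenface"
# analogues); explained_variance_ratio_ shows how much each captures.
print(pca.components_.shape)
print(round(pca.explained_variance_ratio_.sum(), 3))
```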
I find that learning Python from Jupyter Notebooks is addictive. Here are some other great resources:

- Thomas Corcoran's Connect Repo: More notebooks prepared by another talented MLND Session Lead
- Brandon Rhodes' PyCon 2015 Pandas Tutorial: One of my favorite introductions to `pandas`, with an accompanying video lecture
- Jake VanderPlas' Scikit-learn Tutorial: An introduction to `sklearn`, also with an accompanying video lecture
- Kevin Markham's Machine Learning with Text in Scikit-learn Tutorial: If you want to get started with NLP using `sklearn`, Kevin's tutorial is a great introduction (video lecture here)