You can launch this tutorial on Binder or Colab. The slide-show works on Binder, but not on Colab. Press `Alt+r` to toggle the slide-show.
To run the notebook locally, you need Python with the numpy, matplotlib and pandas libraries installed. The Anaconda Python distribution (https://www.anaconda.com/) is easy to install and bundles common libraries for scientific computing.
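You can check these requirements by running a short cell in Jupyter or at a Python prompt. This is a minimal sketch: `check_libraries` is a hypothetical helper written for illustration and is not part of the tutorial notebook. It relies only on the standard-library `importlib` module and the `__version__` attribute that these three libraries provide.

```python
import importlib


def check_libraries(names=("numpy", "matplotlib", "pandas")):
    """Map each library name to its installed version, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            versions[name] = importlib.import_module(name).__version__
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError
            versions[name] = None
    return versions


for name, version in check_libraries().items():
    print(f"{name}: {version if version else 'NOT installed'}")
```

If any line reports `NOT installed`, install that library (for example with the Anaconda distribution) before opening the notebook.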
- On the GitHub repository page, click the `Clone or download` button.
- Select `Download ZIP`.
- Unzip the downloaded ZIP file into a folder.
- Open a terminal (command prompt or Anaconda prompt on Windows) and go to this folder by entering the command `cd folder-path`, where `folder-path` is the path of that folder.
- Start Jupyter Notebook by entering `jupyter notebook`. This opens a browser window showing the contents of the folder.
- Click the `Wisconsin_breast_cancer_data.ipynb` file to open the notebook in a new window or tab of the browser.
- To run the notebook as a slide-show, first install the RISE module (https://rise.readthedocs.io). Then press `Alt+r` to toggle the slide-show.
This tutorial explores the basic usage of numpy and matplotlib with the publicly available Wisconsin breast cancer dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin.
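The dataset is a plain comma-separated text file. The sketch below shows one way to load rows in that layout into a numpy structured array with named fields, using `StringIO`, an explicit `dtype` and `missing_values`. It rests on several assumptions. The three rows are made up, and the field names are shorthand chosen for this sketch. The column layout (sample id, nine attributes scored 1 to 10, class 2 = benign / 4 = malignant, `?` for missing values) follows the UCI description of the file. None of this is taken from the notebook itself. To use the real file, you could pass `requests.get(url).text` to `StringIO`, with `url` pointing at the data file in the directory above. That network step is left out so the sketch runs offline.

```python
from io import StringIO

import numpy as np

# Made-up rows (NOT from the real dataset) in the assumed UCI layout:
# sample id, nine cell attributes scored 1-10, then the class
# (2 = benign, 4 = malignant). '?' marks a missing value.
csv_text = """1000001,5,1,1,1,2,1,3,1,1,2
1000002,8,7,6,4,4,?,5,7,1,4
1000003,3,1,1,1,2,2,3,1,1,2
"""

# Hypothetical shorthand field names, one per column.
names = ["sample_id", "clump_thickness", "cell_size_uniformity",
         "cell_shape_uniformity", "marginal_adhesion", "epithelial_cell_size",
         "bare_nuclei", "bland_chromatin", "normal_nucleoli", "mitoses", "class"]

# Integer fields, except bare_nuclei: a float field can hold NaN for a missing value.
dtype = [(name, float if name == "bare_nuclei" else int) for name in names]

# missing_values="?" tells genfromtxt which string means "missing";
# for a float field the default filling value is NaN.
data = np.genfromtxt(StringIO(csv_text), delimiter=",", dtype=dtype,
                     missing_values="?")

print(data["clump_thickness"])   # access one field by name
print(data[data["class"] == 4])  # select rows with a boolean mask

# Check each float field for missing (NaN) entries.
for name in data.dtype.names:
    if data[name].dtype.kind == "f":
        print(name, "missing:", np.isnan(data[name]).sum())
```

Making only the column that can contain `?` a float keeps the other fields as integers. A plain 2-D float array would also work, but the field names would be lost.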
- Introduction to Jupyter, including running it on the cloud.
- Import Python modules.
- `numpy`
  - basics: array creation, arithmetic, indexing, slicing, reshaping.
  - heterogeneous arrays (structured/record arrays with named fields).
  - load data from a CSV (text) file.
    - (advanced) use the `requests` library to retrieve data from the Internet.
    - (advanced) use `StringIO` to read from a string containing the CSV data.
    - (basic) specify field names and data types using a numpy `dtype`.
    - (basic) specify how missing values are handled when loading data.
  - (basic) check columns for missing values.
  - (advanced) use
- Pandas
  - (basic) load data using Pandas.
  - (basic) access rows and columns in a Pandas dataframe.
  - (basic) draw simple boxplots of dataframe columns.
- matplotlib
  - basic plotting
  - line plot
  - scatter plot
  - histogram
  - box plot
  - legend
  - subplots
- list of other useful modules
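The numpy basics in the outline (array creation, arithmetic, indexing, slicing, reshaping) can be previewed with a short sketch like the following. The values are arbitrary and are not taken from the tutorial notebook.

```python
import numpy as np

# Array creation
a = np.arange(12)          # [0, 1, 2, ..., 11]
z = np.zeros((2, 3))       # 2 x 3 array filled with 0.0

# Arithmetic works element-wise on whole arrays
b = a * 2 + 1              # [1, 3, 5, ..., 23]

# Indexing and slicing
first, last = a[0], a[-1]  # 0 and 11
evens = a[::2]             # [0, 2, 4, 6, 8, 10]

# Reshaping keeps the same elements, arranged in a new shape
m = a.reshape(3, 4)        # 3 rows x 4 columns
row = m[1]                 # second row: [4, 5, 6, 7]
col = m[:, 2]              # third column: [2, 6, 10]

print(m)
print(m.sum(axis=0))       # column sums: [12, 15, 18, 21]
```

`reshape` requires the new shape to contain the same number of elements (here 3 × 4 = 12).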
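The Pandas items in the outline (loading data, accessing rows and columns, boxplots) can be sketched in the same offline way. The rows below are made up. The column names are hypothetical shorthand for the attributes in the UCI description of the file, and `?` is assumed to mark missing values. This is an illustration, not the notebook's own code.

```python
from io import StringIO

import pandas as pd

# Made-up rows (NOT from the real dataset) in the assumed UCI layout;
# '?' marks a missing value.
csv_text = """1000001,5,1,1,1,2,1,3,1,1,2
1000002,8,7,6,4,4,?,5,7,1,4
1000003,3,1,1,1,2,2,3,1,1,2
"""
# Hypothetical shorthand column names.
columns = ["sample_id", "clump_thickness", "cell_size_uniformity",
           "cell_shape_uniformity", "marginal_adhesion", "epithelial_cell_size",
           "bare_nuclei", "bland_chromatin", "normal_nucleoli", "mitoses", "class"]

# The text has no header row, so supply the names; na_values="?" turns '?' into NaN.
df = pd.read_csv(StringIO(csv_text), header=None, names=columns, na_values="?")

print(df.isna().sum())                   # number of missing values per column
print(df["clump_thickness"])             # one column (a Series)
print(df[["clump_thickness", "class"]])  # several columns (a DataFrame)
print(df.iloc[0])                        # first row, by position
print(df[df["class"] == 4])              # rows matching a condition

# Box plots of selected columns, drawn with matplotlib
ax = df.boxplot(column=["clump_thickness", "cell_size_uniformity"])
```

Unlike `np.genfromtxt`, `read_csv` picks each column's type itself. The column containing `?` therefore becomes float, and the others stay integer.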
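The matplotlib items in the outline (line plot, scatter plot, histogram, box plot, legend, subplots) can be combined in one figure, as in this minimal sketch. It uses synthetic data from a seeded random generator rather than the breast cancer data, and the layout is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded so the random data are reproducible
x = np.linspace(0, 10, 50)
values = rng.normal(loc=5, scale=2, size=200)

fig, axes = plt.subplots(2, 2, figsize=(8, 6))  # 2 x 2 grid of subplots

axes[0, 0].plot(x, np.sin(x), label="sin(x)")   # line plot
axes[0, 0].plot(x, np.cos(x), label="cos(x)")
axes[0, 0].legend()                             # legend built from the labels
axes[0, 0].set_title("Line plot")

axes[0, 1].scatter(x, np.sin(x) + rng.normal(scale=0.2, size=x.size))
axes[0, 1].set_title("Scatter plot")

axes[1, 0].hist(values, bins=20)
axes[1, 0].set_title("Histogram")

axes[1, 1].boxplot([values, values * 0.5])      # one box per array
axes[1, 1].set_title("Box plot")

fig.tight_layout()  # reduce overlap between subplot titles and labels
# In Jupyter the figure is displayed automatically; in a script, call plt.show().
```

Plotting through the `axes` objects returned by `plt.subplots` keeps each panel's calls explicit. This is easier to follow than switching the "current" axes with `plt.subplot`.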
This tutorial was originally the first part of a two-part workshop on the basics of `numpy`, `matplotlib` and `pandas` for data science. The first version, as used in the live workshop, can be downloaded as an archive here: https://github.com/subhacom/np_tut_breastcancer/releases/tag/v0.1. This notebook may be updated occasionally.
The second part of the workshop focuses on `pandas` for data science and is available here: https://github.com/bballew/pandas_tutorial.