Hands-on in-person workshop for Data Analysis with Python

aymanibrahim/dapy

Why

For people who struggle to start in data analysis with Python

Description

This hands-on in-person workshop is based on the Data Analysis with Python course by IBM Cognitive Class.

Learn how to prepare data for analysis, perform simple statistical analyses, create meaningful data visualizations, and predict future trends from data, all in a Jupyter-based environment.

The workshop will cover core topics:

(Images: Problem, Attributes, Types)
  • Understanding the Domain
  • Understanding the Dataset
  • Python packages for data science
  • Importing and Exporting Data in Python
  • Basic Insights from Datasets
(Images: Distribution, Bins, Histogram)
  • Identify and Handle Missing Values
  • Data Formatting
  • Data Normalization Sets
  • Binning
  • Indicator variables
(Images: Heatmap, Scatterplot, Boxplot)
  • Descriptive Statistics
  • Basics of Grouping
  • ANOVA
  • Correlation
(Images: 3rd Polynomial, Actual/Fitted, 11th Polynomial)
  • Simple and Multiple Linear Regression
  • Model Evaluation Using Visualization
  • Polynomial Regression and Pipelines
  • R-squared and MSE for In-Sample Evaluation
  • Prediction and Decision Making
(Images: 5th Polynomial, R^2, 4 Features)
  • Model Evaluation
  • Over-fitting, Under-fitting and Model Selection
  • Ridge Regression
  • Grid Search
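
To give a flavour of the data-preparation topics listed above (normalization, binning, indicator variables), here is a minimal sketch in plain Python. It is not taken from the workshop notebooks, which may well rely on library helpers instead; the price values and the function names are made up for illustration, and the input values are assumed not to be all equal.

```python
# Illustrative values only (not taken from the workshop dataset).
prices = [5000.0, 12000.0, 20000.0, 45000.0]


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def equal_width_bin(values, labels):
    """Assign each value to one of len(labels) equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / len(labels)
    binned = []
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        index = min(int((v - low) / width), len(labels) - 1)
        binned.append(labels[index])
    return binned


def indicator_variables(categories):
    """One-hot encode a list of categories as a dict of 0/1 columns."""
    return {
        name: [1 if c == name else 0 for c in categories]
        for name in sorted(set(categories))
    }


normalized = min_max_normalize(prices)
bins = equal_width_bin(prices, ["low", "medium", "high"])
dummies = indicator_variables(bins)
print(normalized)
print(bins)
print(dummies)
```

Each helper corresponds to one bullet: rescaling to a common range, grouping continuous values into labelled ranges, and turning those labels into 0/1 columns that a regression model can consume.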

Prerequisite

Pre-workshop

You will need a laptop that can access the internet

1: Installation

Install Miniconda, or install the (larger) Anaconda distribution:

Install Python using Miniconda

OR install Python using Anaconda

2: Setup

2.1: Download workshop code & materials

Clone the repository

git clone git@github.com:aymanibrahim/dapy.git

OR Download the repository as a .zip file

2.2: Change directory to dapy

Change the current directory to the dapy directory:

cd dapy

2.3: Install Python with required packages

Install Python with the required packages into an environment named dapy, as specified in the environment.yml file.

conda env create -f environment.yml

When conda asks if you want to proceed, type "y" and press Enter.

3: Activate environment

Switch from the default environment (base) to the dapy environment.

conda activate dapy
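
To confirm from inside Python that the activation took effect, one option is to read the CONDA_DEFAULT_ENV environment variable, which conda sets when an environment is activated. The snippet below is a small sketch and not part of the workshop materials; the helper name active_conda_env is hypothetical.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is set.

    conda sets the CONDA_DEFAULT_ENV variable when an environment is activated.
    """
    if environ is None:
        environ = os.environ
    return environ.get("CONDA_DEFAULT_ENV")


env_name = active_conda_env()
if env_name == "dapy":
    print(f"dapy is active; Python runs from {sys.prefix}")
else:
    print(f"Active environment is {env_name!r}; run 'conda activate dapy' first")
```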

4: Install & Enable ipywidgets extensions

Enable the ipywidgets Jupyter Notebook extension

jupyter contrib nbextension install --user
jupyter nbextension enable --py widgetsnbextension
jupyter nbextension enable python-markdown/main

# Notebooks w/ extensions that auto-run code must be "trusted" to work the first time
jupyter trust ./notebooks/05_Model_Evaluation.ipynb

Install ipywidgets JupyterLab extension

jupyter labextension install @jupyter-widgets/jupyterlab-manager

Enable widgetsnbextension

jupyter nbextension enable --py widgetsnbextension --sys-prefix

5: Check installation

Use the check_environment.py script to make sure everything was installed correctly. Open a terminal and change directory (cd) so that your working directory is the dapy workshop directory you cloned or downloaded. Then enter the following:

python check_environment.py

If everything is OK, you will get the following message:

Your workshop environment is set up
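
The contents of check_environment.py are not reproduced here. As a rough idea of what a check of this kind might involve, the sketch below tests whether a list of packages can be found by the current interpreter. The package list and the helper name missing_packages are assumptions made for illustration; the authoritative list is whatever environment.yml specifies.

```python
import importlib.util

# Hypothetical package list for illustration only; the real requirements
# are defined by environment.yml in the repository.
ASSUMED_PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]


def missing_packages(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(ASSUMED_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All assumed packages were found")
```

If any package is reported missing, re-creating the environment from environment.yml and activating it again is a reasonable first thing to try.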

6: Start JupyterLab

Start JupyterLab using:

jupyter lab

JupyterLab will open automatically in your browser.

You can also access JupyterLab by entering the notebook server's URL (printed in the terminal) into your browser.

7: Stop JupyterLab

Press CTRL + C in the terminal to stop JupyterLab.

8: Deactivate environment

Switch from the current environment (dapy) back to the previous environment.

conda deactivate

Workshop Instructor

Ayman Ibrahim

References

Contributing

Thanks for your interest in contributing! There are many ways to contribute to this project. Get started here.

License

Workshop Code

License: MIT

Workshop Materials

Data Analysis with Python Workshop by Ayman Ibrahim is licensed under a Creative Commons Attribution 4.0 International License. Based on a work at IBM Cognitive Class Data Analysis with Python by Joseph Santarcangelo, PhD. and Mahdi Noorian, PhD.
