Welcome to Python Programming for Data Science!
This is a first-year course in the MSc in Data Science at the University of Padova. It is one of the three modules that make up the course "Fundamentals of Information Systems".
This repository contains lecture materials (Jupyter notebooks and PDF slides) as well as exercises from the 2018-19 examination sessions (with solutions).
The goal of this module is to teach the basics of the Python programming language, with a special focus on Data Science. In particular, students will become familiar with Python packages that are widely used by data scientists and machine learning practitioners, such as numpy, scipy, pandas, seaborn, and scikit-learn, to name just a few.
By the end of this module, students are expected to be able to implement every stage of a typical machine learning pipeline, from collecting data to building predictive models for either a regression or a classification problem.
A full, detailed description of the course is available here.
Python Programming for Data Science provides students with the foundational coding skills they need as data scientists.
We start with a thorough tutorial on how to set up the environment used throughout the class. This consists of:
- Installing Python 3.x (we will be using Python 3.6 installed via Anaconda in this class)
- Installing and setting up Jupyter Notebook
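Once the setup is done, a quick sanity check can confirm that the interpreter and the packages are in place. The following is only a minimal sketch, not part of the course materials: the function name `check_environment` and the package list are assumptions for illustration, and it relies solely on the standard library.

```python
import importlib.util
import sys

# Package names follow the course description (assumed list).
# Note that scikit-learn is imported under the name "sklearn".
COURSE_PACKAGES = ["numpy", "scipy", "pandas", "matplotlib", "seaborn", "sklearn"]


def check_environment(packages=COURSE_PACKAGES, min_version=(3, 6)):
    """Return (python_ok, missing) for the current interpreter.

    python_ok is True when the running Python is at least min_version;
    missing lists the packages that cannot be found for import.
    """
    python_ok = sys.version_info >= min_version
    missing = [name for name in packages if importlib.util.find_spec(name) is None]
    return python_ok, missing


if __name__ == "__main__":
    python_ok, missing = check_environment()
    status = "OK" if python_ok else "too old"
    print("Python version:", sys.version.split()[0], "(" + status + ")")
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Running this inside the Anaconda environment should report no missing packages; anything listed can usually be installed with the environment's package manager.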
Then, we move to discussing the basics of the Python programming language:
- Python object model
- built-in data types
- functions
- I/O
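To give a flavor of these topics, here is a small illustrative sketch (not taken from the lecture notebooks). The helper names `append_item` and `word_lengths` are hypothetical, and `io.StringIO` stands in for a real file so that the example is self-contained.

```python
import io


def append_item(items, value):
    """Append to the list that `items` refers to; no copy is made."""
    items.append(value)
    return items


def word_lengths(stream):
    """Read lines from a file-like object and map each word to its length."""
    lengths = {}
    for line in stream:
        for word in line.split():
            lengths[word] = len(word)
    return lengths


# Object model: two names can refer to the same mutable object.
original = [1, 2]
alias = original
append_item(alias, 3)
same_object = alias is original  # both names point at one list

# Built-in data types: tuples are immutable, sets drop duplicates.
point = (2.0, 3.5)
unique_tags = {"python", "data", "python"}

# I/O: StringIO behaves like a text file opened with open(...).
fake_file = io.StringIO("data science\nwith python\n")
lengths = word_lengths(fake_file)
```

Because `alias` and `original` name the same list, the change made inside `append_item` is visible through both names, which is the kind of behavior the object model topic explains.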
Finally, we will dig into a set of the most up-to-date data science Python packages, such as:
- numpy/scipy (for numerical/scientific computing)
- pandas (for data manipulation)
- matplotlib/seaborn (for data visualization)
- scikit-learn (for machine learning tasks like regression and classification)
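As a rough illustration of how these packages fit together in a classification task, consider the sketch below. It is an assumption-laden toy example, not the course's own pipeline: the data is synthetic, and the function name `run_toy_pipeline` and the column names `x1`, `x2`, and `label` are invented for illustration. It requires numpy, pandas, and scikit-learn to be installed.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def run_toy_pipeline(n_samples=200, seed=0):
    """Build synthetic data, fit a classifier, and return its test accuracy."""
    rng = np.random.default_rng(seed)

    # 1. "Collect" data: two random features and a label that depends on them.
    features = rng.normal(size=(n_samples, 2))
    labels = (features[:, 0] + features[:, 1] > 0).astype(int)
    frame = pd.DataFrame(features, columns=["x1", "x2"])
    frame["label"] = labels

    # 2. Hold out a quarter of the rows for evaluation.
    X_train, X_test, y_train, y_test = train_test_split(
        frame[["x1", "x2"]], frame["label"], test_size=0.25, random_state=seed
    )

    # 3. Fit a simple predictive model.
    model = LogisticRegression()
    model.fit(X_train, y_train)

    # 4. Evaluate on the held-out rows.
    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    print(f"Test accuracy: {run_toy_pipeline():.2f}")
```

A regression problem would follow the same steps, with a regressor and a regression metric in place of the classifier and accuracy.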
Lecture # | Topics | Class Material |
---|---|---|
Lecture 0 | Preliminary computer science concepts | Notebook, Slides |
Lecture 1 | Introduction and environment setup | Notebook, Slides |
Lecture 2 | Python basics | Notebook, Slides |
Lecture 3 | Python's built-in data types (Part I) | Notebook, Slides |
Lecture 4 | Python's built-in data types (Part II) | Notebook, Slides |
Lecture 5 | Functions & I/O | Notebook, Slides |
Lecture 6 | numpy package | Notebook, Slides |
Lecture 6b | Review of linear algebra basics | Notebook, Slides |
Lecture 7 | Introduction to pandas package | Notebook, Slides |
Lecture 8 | I/O with pandas | Notebook, Slides |
Lecture 9 | Data preparation with pandas | Notebook, Slides |
Lecture 10 | Data visualization with matplotlib | Notebook, Slides |
Lecture 11 | A Machine Learning Primer (seminar) | Notebook, Slides |
Lecture 12 | The Regression Problem: Example (Part I) | Notebook |
Lecture 13 | The Regression Problem: Example (Part II) | Notebook |
Lecture 14 | The Classification Problem: Example (Part I) | Notebook |
Lecture 15 | The Classification Problem: Example (Part II) | Notebook |
Lecture 16 | Logistic Regression Demystified (seminar) | Slides |