Data Analytics And Visualization

Course page for DS250 (Data Analytics and Visualization), taught at IIT Bhilai, India, in the Winter Semester of 2022.
Course Instructor: Dr. Gagan Raj Gupta

The purpose of this course is to introduce students to various algorithms and techniques for data analysis and visualization. We will emphasize visualization and analysis of complex data sets such as large network data.

Motivation

Data can yield useful insights that help people in healthcare, industry, government, and science
Getting data is becoming easier day by day, but the data is often complex and difficult for people to understand
Data has errors of various types (missing, incorrect, etc.), is incomplete, and is hard to clean (e.g. user reviews/ratings, distorted images)
Data usually has complex correlations, so i.i.d. assumptions don't always hold (e.g. graph data, time-series data)
Data visualization is critical for engaging a more diverse audience in the process of analytical thinking

In this course, we will learn how this is done and apply it to real-life problems that interest us.

Activity Based Learning

UNDERSTAND DATA ---> HYPOTHESIS ---> MODELS ---> INSIGHTS

The activities, done in class or in the lab, will reinforce this theme:

Weight distribution: study modality
Rolling dice: central limit theorem
Phone features and cost model
Composition: essay on "My country" -- study the vocabulary, language model, and probabilities. Can a machine compose this essay now?
Measure various quantities: time you spend sleeping, eating, on your phone, studying, playing, etc. Understand these distributions
Personality charts: Is there a person like me in the class whom I don't know?
Clusters: How many groups are there in the class? Are the groups equal? Are there any outliers?
Curve fitting: hidden points are revealed after you have learned the curve
Guess the number game: binary search. Does the distribution of the number make the game easy or hard?
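The dice activity can be previewed with a short simulation. This is a sketch under our own assumptions (fair six-sided dice, 10,000 trials, a fixed seed; the name `sample_means` is hypothetical, not course code): it averages the faces of n dice many times and compares the spread of those averages with the CLT prediction σ/√n, where σ = √(35/12) ≈ 1.708 is the standard deviation of a single fair die.

```python
import random
import statistics

DIE_SD = (35 / 12) ** 0.5  # standard deviation of one fair six-sided die (about 1.708)

def sample_means(n_dice: int, n_trials: int, seed: int = 0) -> list[float]:
    """Roll `n_dice` fair dice `n_trials` times; return the average face value of each roll."""
    rng = random.Random(seed)  # a fixed seed keeps the experiment reproducible
    return [
        statistics.fmean(rng.randint(1, 6) for _ in range(n_dice))
        for _ in range(n_trials)
    ]

for n in (1, 2, 10, 30):
    means = sample_means(n, 10_000)
    # CLT prediction: the spread of the average shrinks like sigma / sqrt(n).
    predicted_sd = DIE_SD / n ** 0.5
    print(f"n={n:2d}  mean={statistics.fmean(means):.3f}  "
          f"sd={statistics.stdev(means):.3f}  predicted sd={predicted_sd:.3f}")
```

Plotting a histogram of `means` for each n (for example with Matplotlib in the lab) should show the flat single-die distribution turning into an increasingly bell-shaped curve centered on 3.5.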

Course Objectives

Motivate and demonstrate the benefits and uses of data science
Impart the skills needed by a data scientist: acquire, clean, model, visualize data
Teach fundamental algorithms for handling basic and complex datasets including recommendation, basket analysis, streaming algorithms
Study techniques for creating effective visualizations based on principles from graphic design, perceptual psychology, and cognitive science
Teach basic techniques of machine learning (unsupervised), which are an important way to model relationships in data
Provide hands-on experience to students in analyzing datasets in diverse fields (NLP, Image/Video, Graphs, Networks, Bio-informatics, Finance)
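Basket analysis is commonly built on two quantities: the support of an itemset (the fraction of baskets that contain it) and the confidence of a rule X -> Y (support of X ∪ Y divided by support of X, an estimate of P(Y | X)). Below is a minimal pure-Python sketch on five made-up baskets; the data and function names are illustrative assumptions, and a real analysis would typically use an algorithm such as Apriori or FP-Growth rather than enumerating itemsets by hand.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets (made-up data, not from the course).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

def support(itemset: frozenset, baskets: list[set]) -> float:
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(lhs: frozenset, rhs: frozenset, baskets: list[set]) -> float:
    """Confidence of lhs -> rhs: support(lhs | rhs) / support(lhs), an estimate of P(rhs | lhs)."""
    return support(lhs | rhs, baskets) / support(lhs, baskets)

# Count co-occurring item pairs, then report rules x -> y with support >= 0.4.
pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
for (x, y), count in pair_counts.items():
    s = count / len(baskets)
    if s >= 0.4:
        c = confidence(frozenset({x}), frozenset({y}), baskets)
        print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

The loop reports only one direction per pair (alphabetical), but confidence is directional: in this toy data butter -> milk has confidence 1.0 (every basket with butter also has milk), while milk -> butter has only 0.75.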

Pre-requisites

Basic knowledge of Python (most assignments will be based on Python)
Knowledge of basic computer science principles and skills
Math
- Linear Algebra (Matrix factorization, Eigenvalues, Column and row spaces, Norms)
- Probability theory (Conditional, Bayes Rule, Concentration Inequalities, Distributions, Gaussian, Multi-variate)
- Basic Data Structures, Algorithms and Asymptotic Analysis (graphs, heaps, lists, dynamic programming)
- Calculus (Multi-variate)
Web Technologies
- HTML
- JS
- CSS

If you don't meet one or more pre-requisites, be prepared to spend more time before or during the course in learning them.

Books

IVB: Interactive Data Visualization for the Web, Scott Murray
PRML: Pattern Recognition and Machine Learning, Christopher Bishop
TDS: Think Like a Data Scientist, Brian Godsey
DSC: Data Science from Scratch, Joel Grus
GA: Graph Algorithms: Practical Examples in Apache Spark and Neo4j, Mark Needham, Amy Hodler
MML: Mathematics for Machine Learning, Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong
CIML: A Course in Machine Learning, Hal Daumé III

Class Materials

Google Drive (for IIT Bhilai students): GDrive

Detailed Schedule

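Legend: rows prefixed "Lab:" are lab sessions, rows prefixed "Project:" or "Case Studies:" are project and case-study sessions, and all other rows are theory.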

Week	Date	Topics planned for this week	Textbook reference
1	Jan 3	Introduction to Data Science, Facets of Data, Probability Distributions	In D3
1	Jan 3	Lab: Web Development, Intro to D3: Data, Scales, Shapes	W3, Tutorials from UBC 2
2	Jan 10	Data Science Process, Data Models: Geometric view and Statistical view, Data Cleaning, EDA: Value of Visualization, Gaussian Distribution, Conjugate Prior	PRML Ch1, MML Ch 6
2	Jan 10	Lab: Intro to NumPy and Pandas; Intro to Matplotlib, Seaborn, Plotnine	https://realpython.com/ggplot-python/ https://altair-viz.github.io/
3	Jan 10	Visualizing Data, Kernel Density, Describing Data, Correlation Metrics
3	Jan 10	Lab: Collect Data (Webscraping, API), Analyze and Make a chart in D3, Density Plots	IVB
3	Jan 10	Case Studies: Predicting malicious URLs, Risk in Bank Loans, Sentiment Analysis, Healthcare Dataset
4	Jan 17	Distance Measures, Recommendation Rules, Association Rules, ML intro, Learning by Prototypes
4	Jan 17	Lab: D3 Data joins and basic interactivity
5	Jan 24	Learning to classify data: various algorithms, Generative and Discriminative Models	PRML
5	Jan 24	Lab: Connect Python to D3 using Flask
6	Jan 31	Mathematics of Neural Networks, Decision Trees	PRML
6	Jan 31	Project: Complete a Basic Data Classification Interactive Application
6	Feb 3	Tierce 1 Exam
7	Feb 7	Networks: Basic Algorithms: Path Finding	GA Ch 4
7	Feb 7	Lab: Visualization of Networks and Trees in D3	IVB Ch13
8	Feb 14	Networks Analysis: Random Walks, Counting Triangles	GA Ch 4
8	Feb 14	Lab: Intro to Neo4j	GA Ch3
8	Feb 14	Case Studies: Knowledge Graphs, Product Recommendation, Graphical Models
9	Feb 21	Centrality Algorithms, Community Detection Algorithms	GA Ch5
9	Feb 21	Lab: Data Analysis using Neo4j
10	Feb 28	Text and Graph Embedding Algorithms	GA Ch6,7
10	Feb 28	Lab: Word Cloud, Text Visualization, NLP libraries: NLTK
11	Mar 7	Graph Machine Learning	GA Ch6,7
11	Mar 7	Project: Complete a graph data analysis Application
12	Mar 12-20	Mid Sem Break
12	Mar 21	Tierce 2 Exam
13	Apr 4	Time Series Data Analysis
13	Apr 4	Lab: Interactive Dashboards	IVB Ch10
14	Apr 11	Streaming Data Algorithms
14	Apr 11	Lab: Time Series Prediction and Clustering
14	Apr 11	Case Studies: Weather, Financial Data, Imports/Exports
15	Apr 18	Algorithms for Clustering Data: K-Means, K-Means++, DTW, DBSCAN
15	Apr 18	Lab: Geographical Maps	IVB Ch14
15	Apr 25	Dimensionality Reduction Algorithms
16	Apr 25	Lab: Visualization of High-Dimensional Data using t-SNE
17	Apr 25	Major Project: Complete a major data analysis course project
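Several algorithms in the schedule are short enough to prototype directly. As one example, here is a minimal pure-Python sketch of K-Means using Lloyd's algorithm with random seeding and several restarts, keeping the run with the lowest inertia (a common practice). The function names and toy data are our own assumptions, not course code. K-Means++ differs only in the seeding step: each new seed is drawn with probability proportional to its squared distance from the nearest seed already chosen.

```python
import math
import random

def lloyd(points, k, rng, iters=100):
    """One run of Lloyd's algorithm starting from k randomly chosen data points."""
    centroids = rng.sample(points, k)  # random seeding; K-Means++ would replace this step
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return centroids, clusters

def inertia(centroids, clusters):
    """Sum of squared distances from each point to its cluster's centroid."""
    return sum(
        math.dist(p, c) ** 2
        for c, members in zip(centroids, clusters)
        for p in members
    )

def kmeans(points, k, n_init=10, seed=0):
    """Run Lloyd's algorithm n_init times and keep the lowest-inertia result."""
    rng = random.Random(seed)
    runs = [lloyd(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: inertia(*run))

# Two well-separated toy blobs (illustrative data, not a course dataset).
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, groups = kmeans(pts, k=2)
print(sorted(centers))
```

The restarts matter even on data this simple: a single run seeded at (0, 1) and (1, 0) can converge to a poor split that mixes both blobs, which is why the sketch keeps the lowest-inertia run.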

Data Scientist vs. Software Developer (Engineer)

Both data scientists and software developers are good at designing and building complex systems with many interconnected parts using different tools and frameworks. In general, software developers design systems consisting of many well-defined components, whereas data scientists work with systems in which at least one component isn’t well defined before it is built. That component is usually closely involved with data processing or analysis. Data scientists specialize in creating systems that rely on probabilistic statements about data and results. Well-known examples include Google’s search engine (“These are probably the most relevant pages”) and product recommendations on Amazon.com (“We think you’ll probably like these things”).

Skills for a data scientist

We will cover the skills a data scientist needs to clean, model, and visualize data. We will also learn the skills developers need to design interactive dashboards and applications.

D3 (website)

D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS. D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.
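D3's data-driven idea can be previewed without a browser. The Python sketch below is an illustrative analogue under our own assumptions, not D3's API: `scale_linear` is a hypothetical helper mimicking what `d3.scaleLinear().domain([d0, d1]).range([r0, r1])` computes (linear interpolation, unclamped by default), and the generator builds one SVG `<rect>` per data value, the job that D3's `selection.data(...).join("rect")` performs on the live DOM.

```python
def scale_linear(domain, range_):
    """Map values in `domain` linearly onto `range_` (no clamping), like a D3 linear scale."""
    (d0, d1), (r0, r1) = domain, range_

    def scale(x):
        t = (x - d0) / (d1 - d0)   # relative position of x in the domain (0 at d0, 1 at d1)
        return r0 + t * (r1 - r0)  # same relative position in the output range

    return scale

WIDTH, HEIGHT = 500, 300
data = [10, 40, 25]  # hypothetical values in 0..100

x = scale_linear((0, len(data)), (0, WIDTH))  # bar index -> horizontal pixel
y = scale_linear((0, 100), (HEIGHT, 0))        # inverted: SVG's y axis points down

# One <rect> per datum: the "data join" done by hand.
bars = "\n".join(
    f'  <rect x="{x(i):.1f}" y="{y(d):.1f}" width="{WIDTH / len(data) - 5:.1f}" '
    f'height="{HEIGHT - y(d):.1f}" />'
    for i, d in enumerate(data)
)
svg = f'<svg width="{WIDTH}" height="{HEIGHT}">\n{bars}\n</svg>'
print(svg)
```

Inverting the y range is the usual trick for bar charts in SVG: larger values map to smaller y coordinates, so taller bars start higher on the page.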

To make life easy, we will use the tutorials and notebook environment provided by Observable.

Check out these examples: Sankey, Choropleth, Hexbin, Fisheye

Labs and Projects

The project and lab component of this course will equip students with a modern software toolkit for developing their own data analysis and interactive visualization applications (web/Android), helping them better appreciate the data science process.

gagan-iitb/DataAnalyticsAndVisualization
