Skip to content

Latest commit

 

History

History
189 lines (150 loc) · 5.37 KB

README.md

File metadata and controls

189 lines (150 loc) · 5.37 KB

image

image


Module 1: Introduction to Pandas

  • Week 1: Understanding Data Science and Pandas

    • What is Data Science?
    • Introduction to Pandas: History and Evolution
    • The Role of Pandas in Data Analysis
  • Week 2: Setting Up the Environment

    • Installing Pandas via Anaconda and pip
    • Overview of Jupyter Notebooks
    • Importing Pandas and Common Libraries
      import pandas as pd

Module 2: Basics of Pandas Series

  • Week 3: Introduction to Pandas Series

    • What is a Pandas Series?
    • Creating a Series from a List
      s = pd.Series([1, 2, 3, 4])
    • Creating a Series from a Dictionary
      s = pd.Series({'a': 1, 'b': 2, 'c': 3})
  • Week 4: Accessing and Modifying Series

    • Accessing Elements by Index
      • Positional Indexing
      • Label Indexing
    • Modifying Series Data
      • Changing Values and Index
      s[0] = 10

Module 3: Series Attributes and Methods

  • Week 5: Understanding Series Attributes

    • Key Attributes: index, values, dtype, name
    • Using len() to Get Series Length
    • Data Types in Series
  • Week 6: Common Series Methods

    • Summary Statistics: .sum(), .mean(), .min(), .max()
    • Using .describe() for Quick Overview
    • String Methods: .str Accessor
      s = pd.Series(['apple', 'banana', 'cherry'])
      s.str.upper()

Module 4: Advanced Series Operations

  • Week 7: Handling Missing Data in Series

    • Identifying Missing Values: isnull(), notnull()
    • Filling Missing Values: fillna()
    • Dropping Missing Values: dropna()
  • Week 8: Vectorized Operations and Functions

    • Applying Functions: apply(), map(), applymap()
    • Common Mathematical Operations
    • Using Numpy Functions with Series

Module 5: Introduction to Pandas DataFrames

  • Week 9: What is a DataFrame?

    • Understanding the Structure of a DataFrame
    • Creating DataFrames from Various Data Structures (Lists, Dictionaries, Numpy Arrays)
      df = pd.DataFrame({
          'A': [1, 2, 3],
          'B': ['a', 'b', 'c']
      })
  • Week 10: DataFrame Attributes and Basic Operations

    • Key Attributes: shape, columns, index, dtypes
    • Basic Operations: .head(), .tail(), .info(), .describe()
    • Accessing and Modifying Columns
      df['C'] = [4, 5, 6]

Module 6: Indexing and Selecting Data in DataFrames

  • Week 11: Basic Indexing and Slicing

    • Selecting Columns and Rows
    • Label-Based Indexing with .loc[]
    • Position-Based Indexing with .iloc[]
  • Week 12: Advanced DataFrame Selection

    • Conditional Selection and Filtering
    • Using Boolean Indexing
    • Chaining Conditions

Module 7: Data Cleaning and Preparation

  • Week 13: Handling Missing Data in DataFrames

    • Identifying Missing Data
    • Imputing and Filling Missing Values
    • Dropping Rows and Columns with Missing Data
  • Week 14: Data Type Conversion and Transformation

    • Converting Data Types: astype()
    • DataFrame Transformation Methods
    • String Manipulations: Using .str Methods in DataFrames

Module 8: Grouping, Aggregating, and Merging DataFrames

  • Week 15: Grouping DataFrames

    • Introduction to groupby()
    • Aggregating Data: Mean, Sum, Count
    • Custom Aggregation Functions
  • Week 16: Merging and Joining DataFrames

    • Different Types of Joins: Inner, Outer, Left, Right
    • Using merge() and join() Methods
    • Concatenating DataFrames: concat()

Module 9: Time Series in DataFrames

  • Week 17: Time Series Basics

    • Creating DateTime Index
    • Resampling Time Series Data
    • Handling Time Zones
  • Week 18: Time Series Operations

    • Rolling and Expanding Windows
    • Time Series Visualization Techniques
    • Date Arithmetic

Module 10: Data Visualization with Pandas

  • Week 19: Introduction to Visualization

    • Basic Plotting with Pandas
    • Customizing Plots: Titles, Labels, Legends
    • Plot Types: Line, Bar, Histogram, Boxplot
  • Week 20: Advanced Visualization Techniques

    • Creating Subplots and Multiple Plots
    • Integrating with Matplotlib for Custom Visuals
    • Saving and Exporting Plots

Module 11: Best Practices and Optimization

  • Week 21: Performance Optimization Techniques

    • Profiling DataFrames: Memory Usage
    • Using Vectorization for Speed
    • Best Practices for Handling Large Datasets
  • Week 22: Writing Clean and Maintainable Code

    • Code Readability and Documentation
    • Error Handling Techniques
    • Developing Reusable Functions

Supplementary Resources

  • Recommended Books: Python for Data Analysis by Wes McKinney, Pandas Cookbook by Theodore Petrou
  • Online Courses: DataCamp, Coursera, edX
  • Practice Datasets: Kaggle, Open Data Sources

Final Assessment

  • Comprehensive Quiz on Series and DataFrame Concepts
  • Hands-on Project: Data Analysis using Series and DataFrames
  • Peer Review and Feedback Session

Special Thanks to the Contributors