PracticumAI/data

Course Outline

Module 1

Summary

Objectives

By the end of this module, students will be able to:

  1. Recall and define each term in the FAIR acronym: findable, accessible, interoperable, and reusable.
  2. Summarize the steps of FAIR data collection, management, and deposition.
  3. Identify the various ways data presents itself and the properties of each type:
     a. Structured data
        i. systems (relational databases, blockchain)
        ii. repetitive (sensor output, metered data)
     b. Unstructured data
        i. text (emails, transcriptions, literary texts, social media)
        ii. non-textual (images, video, audio)
  4. Correctly match foundational data terms to their definitions.
  5. Discuss the relationship between a research question, a dataset, and available AI methods.
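
As a small illustration of the structured/unstructured distinction in objective 3, the sketch below contrasts repetitive structured data (metered sensor output with a fixed schema) with unstructured text (an email). All data values are hypothetical, and word counting is just one simple way to derive structure from text, chosen here for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical repetitive structured data: sensor output with a fixed schema.
SENSOR_CSV = """timestamp,sensor_id,temperature_c
2024-01-01T00:00,A1,21.5
2024-01-01T01:00,A1,21.7
2024-01-01T02:00,A1,22.0
"""

# Hypothetical unstructured text: an email body with no predefined fields.
EMAIL_TEXT = "Hi team, the sensor in lab A1 looked warm overnight. Can someone check the sensor?"


def load_structured(text):
    """Every row shares the same columns, so fields can be addressed by name."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["temperature_c"] = float(row["temperature_c"])  # enforce the column's type
    return rows


def summarize_unstructured(text):
    """Free text has no schema; any structure (here, word counts) must be derived."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return Counter(w for w in words if w)


readings = load_structured(SENSOR_CSV)
mean_temp = sum(r["temperature_c"] for r in readings) / len(readings)
word_counts = summarize_unstructured(EMAIL_TEXT)

print(f"{len(readings)} readings, mean temperature {mean_temp:.2f} C")
print(word_counts.most_common(2))
```

The structured rows can be aggregated immediately because every field has a known name and type; the email first has to be turned into something countable before any analysis is possible.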

To Do List

Read
Watch
Complete

Optional Content / Additional Resources

Data-Centric AI - Andrew Ng
- Resources


Module 2

Summary

Objectives

By the end of this module, students will be able to:

  1. Formulate an aligned data collection strategy to achieve a representative data sample.
  2. Develop search strategies to find datasets linked to published papers or in data repositories.
  3. Generate metadata to accurately describe a dataset, thereby making it findable.
  4. Evaluate how others use metadata to describe their datasets.
  5. Summarize the ethical issues of open science, such as data ownership, profit and data sharing, and privacy.
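
To make objective 3 concrete, here is a minimal sketch of building a metadata record and checking it for gaps. The required fields are loosely modeled on the mandatory properties of the DataCite Metadata Schema (identifier, creator, title, publisher, publication year, resource type), but the field names are simplified assumptions, not the exact DataCite serialization, and every value describes a fictional dataset.

```python
import json

# Simplified field names loosely based on DataCite's mandatory properties.
# This is an illustrative subset, not a complete or validated DataCite record.
REQUIRED_FIELDS = ["identifier", "creators", "title", "publisher", "publicationYear", "resourceType"]


def make_metadata(**fields):
    """Build a metadata record and list any required field that is missing or empty."""
    record = dict(fields)
    missing = [name for name in REQUIRED_FIELDS if not record.get(name)]
    return record, missing


# Every value below is a hypothetical placeholder for a fictional dataset.
record, missing = make_metadata(
    identifier="10.0000/example-placeholder",
    creators=["Doe, Jane"],
    title="Hourly Lab Temperature Readings (example)",
    publisher="Example University Data Repository",
    publicationYear=2024,
    resourceType="Dataset",
    subjects=["temperature", "sensor data"],  # optional keywords aid search
    description="Hourly temperature readings from a single lab sensor.",
)

print(json.dumps(record, indent=2))
print("Missing required fields:", missing or "none")

# A record without creators or a title is much harder to find and cite.
_, missing_incomplete = make_metadata(identifier="10.0000/another-placeholder", publicationYear=2024)
print("Missing in incomplete record:", missing_incomplete)
```

A completeness check like this is a cheap first pass; repositories typically also validate formats (for example, that the identifier resolves), which this sketch does not attempt.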

To Do List

Read
Watch
Complete

Optional Content / Additional Resources


Module 3

Summary

Objectives

By the end of this module, students will be able to:

  1. Describe the purpose of training, validation, and test datasets and how to define them in TensorFlow or PyTorch.
  2. Use data augmentation techniques to increase the size of a given dataset and the variety of instances in it.
  3. Create data pipelines and understand the ways in which they can be modified.
  4. Explain the importance of standards and clear data organization.
  5. Organize data in a structured way, using nested folders and consistent naming standards.
  6. Discuss the value of data archiving.
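
The roles behind objective 1 can be sketched without a framework: the training set fits model parameters, the validation set guides choices such as hyperparameters or early stopping, and the test set is held back for a final, unbiased estimate. The 70/15/15 proportions and the fixed seed below are assumptions for illustration, not a rule. In PyTorch, torch.utils.data.random_split performs a comparable random split on a Dataset.

```python
import random


def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle a copy of `items` and cut it into three disjoint subsets."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be non-negative and sum to less than 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n = len(shuffled)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # everything left over is used for training
    return train, val, test


samples = list(range(100))  # stand-in for 100 labeled examples
train, val, test = train_val_test_split(samples)
print(len(train), len(val), len(test))
```

A purely random split assumes the examples are independent. For time series, or for data with several records per subject, splitting by time or by group avoids leaking information from the test set into training.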

To Do List

Read
Watch
Complete

Optional Content / Additional Resources


Module 4

Summary

Objectives

By the end of this module, students will be able to:

  1. Visualize data using matplotlib's scatter plot, histogram, and box plot functions (scatter, hist, and boxplot).
  2. Identify missing data, outliers, and other anomalies in a dataset.
  3. Modify a dataset by imputing, deleting, or updating values.
  4. Use feature engineering to optimize model training:
     a. Technique A
     b. Technique B
     ...
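
Objectives 2 and 3 can be sketched with the standard library on a small, hypothetical temperature column. Outliers are flagged with the 1.5 × IQR rule that box plots commonly use for their whiskers (quartile conventions differ slightly between libraries), and flagged or missing entries are filled with the median. Treating an outlier as missing is a judgment call made here for illustration; a real extreme value may deserve to be kept.

```python
import statistics

# Hypothetical temperature column: one missing value (None) and one suspicious reading.
values = [21.5, 21.7, None, 22.0, 21.9, 95.0, 21.6]


def find_missing(column):
    """Return the indices of missing entries."""
    return [i for i, v in enumerate(column) if v is None]


def iqr_outliers(column, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR], ignoring missing entries."""
    present = [v for v in column if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(column) if v is not None and not (low <= v <= high)]


def impute_median(column, drop_indices=()):
    """Treat flagged indices as missing, then fill every gap with the median of the rest."""
    drop = set(drop_indices)
    kept = [v for i, v in enumerate(column) if v is not None and i not in drop]
    fill = statistics.median(kept)
    return [fill if (v is None or i in drop) else v for i, v in enumerate(column)], fill


missing_idx = find_missing(values)
outlier_idx = iqr_outliers(values)
cleaned, fill = impute_median(values, drop_indices=outlier_idx)

print("Missing at:", missing_idx)
print("Outliers at:", outlier_idx)
print("Filled with median", fill, "->", cleaned)
```

The median is used instead of the mean because a single extreme value like 95.0 would pull the mean far from the typical reading.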

To Do List

Read
Watch
Complete

Optional Content / Additional Resources

Feature Engineering for Machine Learning by Alice Zheng and Amanda Casari

About

Practicum AI introductory data workshop.