Framework

"Data is the new oil"

Ways to acquire data (typical data sources)

  • Downloaded from an internal system
  • Obtained from a client or other third party
  • Extracted from a web-based API
  • Scraped from a website
  • Extracted from a PDF file
  • Gathered manually and recorded

Data Formats

  • Flat files (e.g. CSV)
  • Excel files
  • Databases (e.g. MySQL)
  • JSON
  • HDFS (Hadoop)
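
As a minimal sketch of how two of these formats can end up in the same in-memory shape, the Python snippet below parses a small CSV string and a small JSON string using only the standard library. The sample records and the helper names `load_csv` and `load_json` are invented for illustration and are not part of the original notes.

```python
import csv
import io
import json

# Hypothetical sample data, invented for illustration only.
CSV_TEXT = "name,age\nAda,36\nGrace,45\n"
JSON_TEXT = '[{"name": "Ada", "age": 36}, {"name": "Grace", "age": 45}]'


def load_csv(text):
    """Parse CSV text into a list of dicts; every CSV value arrives as a string."""
    rows = [dict(row) for row in csv.DictReader(io.StringIO(text))]
    # Convert the age column explicitly, since CSV carries no type information.
    for row in rows:
        row["age"] = int(row["age"])
    return rows


def load_json(text):
    """Parse JSON text into a list of dicts; JSON keeps numbers as numbers."""
    return json.loads(text)


csv_records = load_csv(CSV_TEXT)
json_records = load_json(JSON_TEXT)
```

The explicit `int` conversion is the point of the example: a flat file leaves typing decisions to the reader, while JSON preserves basic types, so the same data can need different handling depending on its format.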

In Search of Data

"Data is an abstraction of reality."

  • What assumptions have been made in this entire data collection process?
  • Are we aware of the assumptions in this process?
  • How can we ensure that the data is accurate and representative for the question we are trying to answer?
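
One possible way to start probing the questions above is a handful of simple counts over the collected records. The sketch below is only an assumption about what such a check could look like: the sample records, the `summarize_quality` helper, its parameters, and the 0 to 120 age range are all hypothetical, and passing these checks does not by itself show that the data is accurate or representative.

```python
from collections import Counter

# Hypothetical records, invented for illustration only.
RECORDS = [
    {"id": 1, "region": "north", "age": 34},
    {"id": 2, "region": "north", "age": None},
    {"id": 2, "region": "north", "age": 29},
    {"id": 3, "region": "south", "age": 212},
]


def summarize_quality(records, key="id", field="age", group="region",
                      valid_range=(0, 120)):
    """Return simple counts that hint at accuracy and coverage problems."""
    low, high = valid_range
    key_counts = Counter(r[key] for r in records)
    return {
        "rows": len(records),
        # Missing values in the field of interest.
        "missing": sum(1 for r in records if r[field] is None),
        # Keys that appear more than once may indicate duplicated rows.
        "duplicate_keys": sorted(k for k, n in key_counts.items() if n > 1),
        # Values outside the assumed plausible range.
        "out_of_range": sum(
            1 for r in records
            if r[field] is not None and not (low <= r[field] <= high)
        ),
        # Row counts per group, a rough hint at how balanced the sample is.
        "group_counts": dict(Counter(r[group] for r in records)),
    }


report = summarize_quality(RECORDS)
```

Missing values, duplicated keys, and implausible values speak to accuracy, while the per-group counts give a first look at whether some part of the population is under-sampled for the question being asked.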