Skip to content

Morgan243/DSDAG

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DSDAG - The Data Science DAG DAG

This is a work in progress

Build Status

Goals

  • Capture data processing knowledge in a way that's self-documenting and reusable
  • Reduce the time to implement complex analysis by reusing related prior work
  • Facilitate easier debugging and arbitrary partial execution of complex projects

Why not Luigi, Airflow, ... etc?

  • DSDAG targets interactive environments (jupyter)
    • Other tools focus on integration ETL & data engineering systems
  • DSDAG requires less boilerplate code (but is very similar to Luigi in structure)
  • DSDAG can cache (in memory or on disk) and provide access to Intermediate outputs

Approach

  • The DSDAG framework implements several core components:
    • Ops (or Processes): The smallest unit of compute, similar to a function
    • Parameters: User defined arguments to processes that are tracked
    • DAG: A collection of processes that are needed to compute an output
  • Users define Operations that process inputs or provide some output
  • Operations can specify other operations that they depend on, but these inputs can be overridden at runtime
  • Runtime decisions or parameters (e.g. time of day, policy number, etc.) are implemented as Parameters to the Process
  • Outputs are materialized by building a DAG and running it

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages