dagster_and_r

This project explores how Dagster, a modern data orchestrator, can be combined with R, a statistical programming language. It shows several ways to run business logic written in R from within the Dagster framework.

Key Features

  • Docker Integration: Execute R code in isolated environments using Docker container ops.
  • Dagster Pipes: Run R scripts within a subprocess, leveraging Dagster's experimental Pipes feature.
  • Reticulate Bridge: Utilize the {reticulate} R package to create a bridge between Python and R, enhancing interoperability.

Getting Started

To begin exploring the integration of Dagster and R:

  1. Clone the Repository
    git clone https://github.com/philiporlando/dagster-and-r.git
  2. Navigate to Directory
    cd dagster-and-r
  3. Install Python Dependencies

    You'll need a version of Python installed.

    Using uv (install uv first):

    uv venv
    source .venv/bin/activate
    uv sync
  4. Install R Dependencies
    # from R
    # if you haven't installed renv before:
    # install.packages("renv")
    renv::restore()
  5. Set the RETICULATE_PYTHON Environment Variable

    Create a new .Renviron file at the root of the project and set the RETICULATE_PYTHON variable to .venv/bin/python, like so:

    # .Renviron
    RETICULATE_PYTHON=.venv/bin/python
  6. Launch the Dagster UI

    Start the Dagster web server:

    dagster dev

    Access the UI at http://localhost:3000 in your browser.

[Screenshot: Dagster UI, assets never materialized]

  7. Materialize Assets

    Click the "Materialize all" button in the top right of the UI. Each asset in this project should materialize without error.

[Screenshot: Dagster UI, assets materialized]

  8. Inspect the Run

    Click the "Runs" tab and open the latest run of the pipeline to see detailed information, including custom logs, asset checks, and environment variables passed from an external R session.

[Screenshot: Dagster UI, run details]

  9. Create Assets

    Begin writing assets in dagster_and_r/assets.py. They are automatically loaded into the Dagster code location.

Then start the Dagster web server, loading the dagster_and_r module explicitly:

dagster dev -m dagster_and_r

Open http://localhost:3000 in your browser to see the project.
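
The .Renviron setup from the RETICULATE_PYTHON step can be sanity-checked from Python before starting R. The sketch below is a minimal illustration, not part of this repository: the helper names parse_renviron and reticulate_python_ok are hypothetical, and the parser only understands simple KEY=value lines.

```python
from pathlib import Path


def parse_renviron(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        env[key.strip()] = value.strip().strip("\"'")
    return env


def reticulate_python_ok(project_root: Path) -> bool:
    """Return True if .Renviron sets RETICULATE_PYTHON to an existing file."""
    renviron = project_root / ".Renviron"
    if not renviron.is_file():
        return False
    value = parse_renviron(renviron.read_text()).get("RETICULATE_PYTHON")
    if value is None:
        return False
    # A relative value such as .venv/bin/python is resolved against the
    # project root; an absolute value is used unchanged.
    return (project_root / value).is_file()
```

Calling reticulate_python_ok(Path.cwd()) from the project root returns False when the file, the variable, or the interpreter it points to is missing, which are the likely reasons for {reticulate} picking up a different Python.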

Current Integrations

Dagster Pipes

  • Pass logs between an external R session and Dagster
  • Pass environment variables and context between an external R session and Dagster
  • Asset checks defined in R
  • In-memory data passing
  • Pass markdown metadata between R and Dagster (e.g. head() of a data.frame)
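
The kind of exchange listed above can be pictured with a small, self-contained sketch. This is not Dagster's Pipes protocol or API. It only illustrates the general idea of a parent process handing context to an external process through an environment variable and reading structured messages back from a file. A Python child process stands in for the R session so the example runs without R, and the names DEMO_CONTEXT, DEMO_MESSAGES, and the message fields are made up for illustration.

```python
import json
import os
import subprocess
import sys
import tempfile

# Stand-in for the external R script: it reads its context from the
# environment and appends one JSON message per line to the messages file.
CHILD_SOURCE = r"""
import json, os
context = json.loads(os.environ["DEMO_CONTEXT"])
with open(os.environ["DEMO_MESSAGES"], "a") as out:
    out.write(json.dumps({"type": "log", "text": "hello from " + context["asset"]}) + "\n")
    out.write(json.dumps({"type": "metadata", "row_count": 3}) + "\n")
"""


def run_external(asset_name: str) -> list[dict]:
    """Launch the child process and return the messages it reported."""
    with tempfile.TemporaryDirectory() as tmp:
        messages_path = os.path.join(tmp, "messages.jsonl")
        env = dict(os.environ)
        env["DEMO_CONTEXT"] = json.dumps({"asset": asset_name})  # context passed in
        env["DEMO_MESSAGES"] = messages_path                     # where to report back
        subprocess.run([sys.executable, "-c", CHILD_SOURCE], env=env, check=True)
        with open(messages_path) as fh:
            return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    for message in run_external("demo_asset"):
        print(message)
```

In this project the exchange is handled by Dagster Pipes itself; the sketch only shows the shape of the idea, with logs and metadata travelling back to the orchestrating process as structured messages.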

Docker Container Op

  • Execute external R code from a Docker container op.
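
For orientation, running an R script in a container comes down to a docker run invocation. The sketch below builds such a command with the plain Docker CLI rather than Dagster's Docker container op, so it shows the underlying idea and not this project's configuration. The helper name docker_rscript_command, the rocker/r-ver image, the /work mount point, and the script path are all assumptions chosen for illustration.

```python
import shlex
from pathlib import Path


def docker_rscript_command(script: Path, image: str = "rocker/r-ver:latest") -> list[str]:
    """Build a docker run command that executes an R script in a container."""
    host_dir = script.resolve().parent
    return [
        "docker", "run", "--rm",    # remove the container when it exits
        "-v", f"{host_dir}:/work",  # mount the script's directory into the container
        "-w", "/work",              # run from the mounted directory
        image,                      # assumed image that provides Rscript
        "Rscript", script.name,
    ]


if __name__ == "__main__":
    # Hypothetical script path; print the command instead of running it.
    print(shlex.join(docker_rscript_command(Path("scripts/hello.R"))))
```

Returning the command as a list keeps it easy to inspect or to pass to subprocess.run without shell quoting issues.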

Development Guide

Adding Python Dependencies

To add new Python packages to the project:

uv add <pkg-name>

Unit Testing

Run the existing tests with pytest:

pytest dagster_and_r_tests

Note

Unit tests are a work in progress.

Schedules and Sensors

Schedules and sensors require the Dagster daemon. Running dagster dev starts the daemon alongside the web server:

dagster dev

With the daemon running, you can start using schedules and sensors for your jobs.

Contributions

Contributions to enhance or expand the project are welcome! Feel free to fork the repository, make changes, and submit a pull request.