model-report

Overview

The goal of this report is to create a flexible, extensible system for modeling.

Why a report? Sharing technical modeling results with other data scientists can be difficult while new models are still being developed. Documentation saved as wiki pages quickly gets out of date and is cumbersome to update, which makes it hard to leave a clear paper trail of the model development lifecycle and, in turn, hinders collaboration and knowledge sharing.

Objective: Create a code template that easily generates shareable model results via an interactive HTML report. The template is heavily parameterized so that different data, models, and inputs can be run through the same steps, which should save users from rewriting the same code.

The report will allow users to explore important steps in the modeling process, including:

Data/Features: Data origination and standard exploratory data analysis
Modeling setup: Feature engineering, model choice
Tuning and Metrics: Model results, including potential parameter tuning opportunities
Explain: Explainable AI results including variable importance, PDPs, ADP, etc.

How to set up your own report

See Example/ for what an example directory looks like with a random forest model using the iris data.

1. Create a modeling directory with the following folders: i) /src - customized R code, ii) /reports - where reports get saved, iii) /other_chunks - any additional analysis for the tuning section (optional), iv) /cache - where cached data gets saved (optional).
2. Update the /src R scripts with whatever you want your analysis to be:
   1. src/packages.R tells the report which additional packages/functions/constants to load
   2. src/load_data.R tells the report what data to load
   3. src/clean_data.R tells the report how you want to clean the data
   4. src/features_outcome.R is where we define possible predictor variables + outcome variable
   5. src/specs.R is where we define which modeling engines to use (e.g., RF, xgboost, lm)
   6. src/recipes.R is where we define any other feature engineering steps as well as the outcome and predictor variable(s) for multiple models
3. Optionally update /other_chunks if you want additional analysis in the tuning section of the report.
4. Create a set of parameters in a YAML file to tell the RMarkdown report key information about the model run. The most important value in that file is path, which should point to the modeling directory created in step 1. See the example in Example/example_params.yaml.
5. After the previous steps are complete, run the model by rendering the RMarkdown file with the parameter YAML file. See the example in Example/run_report.R.
6. If you experience any errors, add a browser() statement in the chunk where the error occurs to debug it. Be sure to remove debugging code before pushing any RMarkdown changes to master. For more information on debugging in RMarkdown, see: https://support.posit.co/hc/en-us/articles/205612627-Debugging-with-RStudio#debugging-in-r-markdown-documents
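
The directory layout from the first two steps can be scaffolded from the terminal. The snippet below is a minimal sketch, not part of the repo: the function name scaffold_model_dir is hypothetical, and it only creates empty placeholder scripts that you still need to fill in.

```shell
# Hypothetical helper (not part of this repo): create the folders and
# empty src/ scripts that the report expects inside a modeling directory.
scaffold_model_dir() {
  model_dir="$1"

  # /src and /reports are required; /other_chunks and /cache are optional.
  mkdir -p "$model_dir/src" "$model_dir/reports" \
           "$model_dir/other_chunks" "$model_dir/cache"

  # Create an empty placeholder for each src/ script, skipping any that exist.
  for script in packages load_data clean_data features_outcome specs recipes; do
    [ -e "$model_dir/src/$script.R" ] || : > "$model_dir/src/$script.R"
  done
}
```

For example, `scaffold_model_dir my_project` (where my_project is a placeholder name) creates my_project/src, my_project/reports, my_project/other_chunks, and my_project/cache, plus the six empty scripts under my_project/src. Existing scripts are left untouched, so the function can be re-run safely.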

How does the report come together?

The repo structure contains three key mechanisms that help us generate an extensible report.

General RMarkdown code: model-report.Rmd and all /children/ files generate the bones of the HTML report. These files stay constant across reports.
Project-specific R code: After the user creates a directory and adds a /src folder, they can customize how data is read, cleaned, feature engineered, and modeled. These files are unique to a given project/report.
Project-specific parameter YAML file: These are model-specific settings defining compilation-level information, including where to access project-specific R code, stratification variables, and the specific models to run for that report. These files are unique to a given project/report.

How to set up the R environment

We will try to standardize our R version and packages to make our results more reproducible across machines.

We do this by taking advantage of two tools in R: 1) R Projects and 2) the renv package for package management (similar to poetry in Python).

Note: while different R versions across machines may work in most cases, getting on the same R version (4.2 in this case) removes one potential culprit for inconsistency across machines.

Install and configure the compiler

One major issue we noticed in setting up a shared R environment was that package installation often failed due to compiler issues. To address these issues, follow the directions below:

Install the GCC compiler, which also provides a Fortran compiler. In your terminal, run:

brew install gcc

Create a .R directory in your home directory and a .R/Makevars file inside it. In your terminal, run:

cd ~
mkdir -p .R
touch .R/Makevars
open .R/Makevars

Add these lines to your Makevars and save the file. They are adapted from several StackOverflow threads.

VER=-13
CC=gcc$(VER)
CXX=g++$(VER)
CFLAGS=-mtune=native -g -O2 -Wall -pedantic -Wconversion
CXXFLAGS=-mtune=native -g -O2 -Wall -pedantic -Wconversion
FLIBS=-L`gfortran -print-file-name=libgfortran.dylib | xargs dirname`
FC=/opt/homebrew/bin/gfortran

Make a .Renviron file in your home directory. In your terminal, run:

touch ~/.Renviron
open ~/.Renviron

Configure your .Renviron file to point to the brew-installed compiler. The paths set here have to match the location of your brew-installed compiler. First, get that location by running in your terminal:

readlink -f $(brew --prefix gcc)

e.g.

/opt/homebrew/Cellar/gcc/13.2.0

We'll now use this path to set these values in your .Renviron. Add the block below to your .Renviron, substituting the path you retrieved in the previous step for the {path}/lib and {path}/bin values, then save the file. This is based on a StackOverflow response.

LD_LIBRARY_PATH=/opt/homebrew/Cellar/gcc/13.2.0/lib
PATH=/opt/homebrew/Cellar/gcc/13.2.0/bin

Restart RStudio.
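
To avoid typos when substituting the path by hand, the two .Renviron lines can be generated from the path reported by brew. This is a minimal sketch under the assumption that the two variables shown above are all you need; print_renviron_lines is a hypothetical helper name, not part of this repo.

```shell
# Hypothetical helper (not part of this repo): print the two .Renviron
# lines for a given GCC install path so they can be reviewed and pasted.
print_renviron_lines() {
  gcc_path="$1"
  printf 'LD_LIBRARY_PATH=%s/lib\n' "$gcc_path"
  printf 'PATH=%s/bin\n' "$gcc_path"
}
```

On a machine with Homebrew, `print_renviron_lines "$(readlink -f "$(brew --prefix gcc)")"` prints the lines for your installed GCC version. Review the output before copying it into ~/.Renviron.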

Work within the `model-report.Rproj` Project

To ensure that the correct packages can be installed with renv and that you're working in the right environment, make sure to load the model-report.Rproj project file. You can confirm that the project is loaded by looking at the top right of RStudio and seeing "Project: model-report."

Loading this project ensures that any time you are working in the repo, RStudio prioritizes the package versions recorded in the project's lockfile (renv.lock). In other words, if you have a different version of dplyr installed than the one in the lockfile, RStudio will use the lockfile version of dplyr when the project is loaded.

Install and run `renv`

Install the latest version of renv. In your R session, run:

install.packages("renv")

Rebuild our environment. In your R session, run:

renv::restore()
