Refactor expand_model_out_grid for readability #147

zkamvar · 2024-10-30T15:18:36Z

I've found myself digging into the expand_model_out_grid() function and getting stuck on more than one occasion. The places where I get stuck are nested mapping function calls combined with pipe operators and no comments. I don't have a problem with pipes or nesting per se, but together they fill up my working memory, and when I'm trying to debug something like #123, it takes me a long time to reach the correct conclusion.

This is an important function, and it's done a lot of heavy lifting. Before we move over to schemas version 4.0, I propose giving it a polish and adding comments so that it becomes easier to navigate.

For example, two multi-line statements are required for expand_output_type_grid(), and it took me a good amount of time to figure out what was happening in each. Instead, they could be stand-alone functions called get_task_id_list() and get_output_type_list() with comments describing their operations:

hubValidations/R/expand_model_out_grid.R

Lines 178 to 190 in 5c83952

    
    task_id_l <- purrr::map(
      round_config[["model_tasks"]],
      ~ .x[["task_ids"]] %>%
        derived_taskids_to_na(derived_task_ids) %>%
        null_taskids_to_na()
    ) %>%
      # Fix round_id value to current round_id in round_id variable column
      fix_round_id(
        round_id = round_id,
        round_config = round_config,
        round_ids = hubUtils::get_round_ids(config_tasks)
      ) %>%
      process_grid_inputs(required_vals_only = required_vals_only)

hubValidations/R/expand_model_out_grid.R

Lines 197 to 215 in 5c83952

    
    output_type_l <- purrr::map(
      round_config[["model_tasks"]],
      function(.x) {
        out <- .x[["output_type"]]
        if (is.null(output_types)) {
          out
        } else {
          mt_output_types <- output_types[output_types %in% names(out)]
          out[mt_output_types]
        }
      }
    ) %>%
      purrr::map(
        ~ extract_mt_output_type_ids(.x, config_tid)
      ) %>%
      process_grid_inputs(required_vals_only = required_vals_only) %>%
      purrr::map(function(.x) {
        purrr::compact(.x)
      })
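As an illustration of the kind of extraction suggested above, the subsetting step inside the first `purrr::map()` call could become a small named helper. This is only a sketch in base R: the name `subset_output_types()` and the toy model task are hypothetical and not part of hubValidations.

```r
# Hypothetical helper (not in hubValidations): keep only the requested
# output types from a single model task's `output_type` config.
subset_output_types <- function(model_task, output_types = NULL) {
  out <- model_task[["output_type"]]
  # No filter requested: return every configured output type
  if (is.null(output_types)) {
    return(out)
  }
  # Ignore requested output types that this model task does not define
  out[output_types[output_types %in% names(out)]]
}

# Toy model task, loosely shaped like a tasks config entry (values made up)
model_task <- list(
  output_type = list(
    mean = list(output_type_id = NULL),
    quantile = list(output_type_id = c(0.25, 0.75))
  )
)

# Without a filter, both output types are kept
all_types <- subset_output_types(model_task)
# "pmf" is not defined for this model task, so only "quantile" remains
quantile_only <- subset_output_types(model_task, c("quantile", "pmf"))
```

With a helper along these lines, the first step of the pipeline would reduce to mapping one documented function over `round_config[["model_tasks"]]`, which is easier to step through when debugging.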

The helper functions themselves could be refactored as well, with comments indicating what's happening. Continuing the thread, when I encountered expand_output_type_grid(), I was already pretty deep, and I could not really follow what it was doing because it had pipe operators nested within a mapping function whose result was piped on to another function.

hubValidations/R/expand_model_out_grid.R

Lines 278 to 288 in 5c83952

    
      purrr::imap(
        output_type_values,
        ~ c(task_id_values, list(
          output_type = .y,
          output_type_id = .x
        )) %>%
          purrr::compact() %>%
          expand.grid(stringsAsFactors = FALSE)
      ) %>%
        purrr::list_rbind()
    }

If I were to refactor this for readability, I would do something like:

  # generate expanded grids for each output type with the task IDs
  output_type_grid_list <- purrr::imap(
    output_type_values,
    output_type_grid,
    task_id_values = task_id_values
  )
  # combine them into a single grid
  purrr::list_rbind(output_type_grid_list)
}

#' Create a grid for a single output type
#'
#' Argument order matches `purrr::imap()`, which passes each element's value
#' first and its name second.
#'
#' @param output_type_id a character vector of the corresponding output_type_id or NULL
#' @param output_type a character indicating the output type
#' @param task_id_values a list of task IDs
#' @noRd
output_type_grid <- function(output_type_id, output_type, task_id_values) {
  # generate a list, remove the NULL elements, and then use `expand.grid()` to create
  # a table with all combinations of the output type IDs and task IDs
  c(task_id_values, list(output_type = output_type, output_type_id = output_type_id)) %>%
    purrr::compact() %>%
    expand.grid(stringsAsFactors = FALSE)
}
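To check how a helper like this behaves when handed straight to `purrr::imap()`, here is a small self-contained run on toy data. `purrr::imap()` calls its function with the element value first and the element name second, so the helper in this sketch takes `output_type_id` before `output_type`. The task ID and output type values are made up for illustration and do not come from a real hub config.

```r
library(purrr) # also re-exports the %>% pipe

# Value-first, name-second argument order, as supplied by purrr::imap()
output_type_grid <- function(output_type_id, output_type, task_id_values) {
  c(task_id_values, list(output_type = output_type, output_type_id = output_type_id)) %>%
    compact() %>% # drops output_type_id when it is NULL
    expand.grid(stringsAsFactors = FALSE)
}

# Toy inputs (hypothetical values)
task_id_values <- list(
  origin_date = c("2024-10-26", "2024-11-02"),
  horizon = 1:2
)
output_type_values <- list(mean = NULL, quantile = c(0.25, 0.75))

# One grid per output type, then a single combined grid
grid <- imap(output_type_values, output_type_grid, task_id_values = task_id_values) %>%
  list_rbind()

# 4 "mean" rows (output_type_id filled with NA) plus 8 "quantile" rows
print(grid)
```

Because the "mean" grid has no `output_type_id` column, `list_rbind()` fills that column with `NA` for those rows when the grids are combined.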

Again, I think this is an important function and it has had a lot of hard work put into it by @annakrystalli to cover all the weirdness that arises from the schemas and backwards compatibility.



github-project-automation bot moved this to Todo in hubverse Development overview Oct 30, 2024

github-project-automation bot added this to hubverse Development overview Oct 30, 2024

zkamvar added the upkeep maintenance, infrastructure, and similar label Oct 30, 2024

annakrystalli moved this from Todo to In Progress in hubverse Development overview Nov 12, 2024

annakrystalli self-assigned this Nov 12, 2024

annakrystalli added a commit that referenced this issue Nov 12, 2024

Add support for v4 NULL point estimate output_type_ids. Resolves #156.…

8b5da6b

… Related to #147

annakrystalli mentioned this issue Nov 13, 2024

Support v4 point estimate required null values #160

Merged
