fix: fixed an issue of a solitary NA as var_labels passed to analyze #965

Merged: 7 commits, Nov 27, 2024



@kpagacz kpagacz commented Nov 22, 2024

Here's an MRE of the issue:

library(rtables)
# devtools::load_all()
lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze("AGE", var_labels = c(AGE = NA))

which results in:

Error in validObject(.Object) :
  invalid class “AnalyzeVarSplit” object: invalid object for slot "split_label" in class "AnalyzeVarSplit": got class "logical", should be or extend class "character"

A bare NA in R is a logical vector of length one, so it fails validation on the Split object, whose split_label slot must be a character vector. The issue does not occur when the var_labels vector also contains character values, because the untyped NAs are then coerced to character automatically.
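The slot check described above can be reproduced without rtables. The following is a minimal sketch in base R; the class name DemoSplit is a hypothetical stand-in, not the real AnalyzeVarSplit definition, and it only assumes that the real slot is declared as character, as the error message suggests.

```r
library(methods)

# Hypothetical stand-in for a Split class: one slot that must be character.
setClass("DemoSplit", representation(split_label = "character"))

# A bare NA is logical, so the implicit slot type check rejects it.
# The error message is captured instead of stopping the script.
bare_na <- tryCatch(
  new("DemoSplit", split_label = NA),
  error = function(e) conditionMessage(e)
)

# NA_character_ is already a character vector, so the object is valid.
typed_na <- new("DemoSplit", split_label = NA_character_)

class(NA)                    # "logical"
is.character(bare_na)        # TRUE: construction failed, message was captured
is.na(typed_na@split_label)  # TRUE
```

The captured message in bare_na should mention that class "logical" was supplied where "character" was expected, mirroring the rtables error.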

This also does not happen in this scenario:

library(rtables)
# devtools::load_all()
lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze("AGE", var_labels = c(AGE = NA_character_))
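The difference between the failing and the working calls comes down to the type of the vector that c() builds. A short base R sketch (the variable names solo, mixed and typed are only illustrative):

```r
# A solitary untyped NA stays logical.
solo <- c(AGE = NA)

# Alongside a string, c() coerces the NA to character.
mixed <- c(AGE = NA, SEX = "Sex")

# An explicitly typed NA is character from the start.
typed <- c(AGE = NA_character_)

class(solo)   # "logical"
class(mixed)  # "character"
class(typed)  # "character"
```

Only the first form trips the character-slot validation; names are preserved in all three.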

What prompted this was an investigation into what happens when launching the teal application below:

require(teal.modules.general)
require(teal.modules.clinical)
data <- teal_data()
data <- within(data, {
  ADSL <- teal.modules.general::rADSL
  # Remove the label attribute to simulate a dataset without labels
  attributes(ADSL$SEX)$label <- NULL
  ADSL$EOSDY[1] <- NA_integer_
})
join_keys(data) <- default_cdisc_join_keys["ADSL"]

ADSL <- data[["ADSL"]]

app <- init(
  data = data,
  modules = modules(
    teal.modules.clinical::tm_t_summary(
      label = "Demographic Table",
      dataname = "ADSL",
      arm_var = choices_selected(c("ARM", "ARMCD"), "ARM"),
      add_total = TRUE,
      summarize_vars = choices_selected(
        c("SEX", "RACE", "BMRKR2", "EOSDY", "DCSREAS", "AGE"),
        c("SEX")
      ),
      useNA = "ifany"
    )
  )
)
if (interactive()) {
  shinyApp(app$ui, app$server)
}

This application fails with the rtables error above, because it tries to evaluate the following code:

library(nvimcom)
library(ggplot2)
library(ggmosaic)
library(shiny)
library(teal.code)
library(teal.data)
library(teal.slice)
library(teal)
library(teal.transform)
library(teal.modules.general)
library(formatters)
library(magrittr)
library(rtables)
library(tern)
library(teal.modules.clinical)

ADSL <- teal.modules.general::rADSL
attributes(ADSL$SEX)$label <- NULL
ADSL$EOSDY[1] <- NA

stopifnot(rlang::hash(ADSL) == "fdef4b3486c80ad054476ea2b4131889")

ANL_1 <- ADSL %>% dplyr::select(STUDYID, USUBJID, ARM, SEX)
ANL <- ANL_1
ANL <- ANL %>% teal.data::col_relabel(ARM = "Description of Planned Arm")
ANL_ADSL_1 <- ADSL %>% dplyr::select(STUDYID, USUBJID, ARM)
ANL_ADSL <- ANL_ADSL_1
ANL_ADSL <- ANL_ADSL %>% teal.data::col_relabel(ARM = "Description of Planned Arm")
anl <- ANL %>%
  df_explicit_na(
    omit_columns = setdiff(names(ANL), c(structure(c(SEX = "SEX"), dataname = "ADSL", always_selected = character(0)))),
    na_level = "<Missing>"
  )
anl <- anl %>% dplyr::mutate(ARM = droplevels(ARM))
arm_levels <- levels(anl[["ARM"]])
ANL_ADSL <- ANL_ADSL %>% dplyr::filter(ARM %in% arm_levels)
ANL_ADSL <- ANL_ADSL %>% dplyr::mutate(ARM = droplevels(ARM))
ANL_ADSL <- df_explicit_na(ANL_ADSL, na_level = "<Missing>")
lyt <- rtables::basic_table(
  main_footer = "n represents the number of unique subject IDs such that the variable has non-NA values."
) %>%
  rtables::split_cols_by("ARM", split_fun = drop_split_levels) %>%
  rtables::add_overall_col("All Patients") %>%
  rtables::add_colcounts() %>%
  analyze_vars(
    vars = structure(c(SEX = "SEX"), dataname = "ADSL", always_selected = character(0)),
    var_labels = c(SEX = NA),
    show_labels = "visible",
    na.rm = FALSE,
    na_str = "<Missing>",
    denom = "N_col",
    .stats = c("n", "mean_sd", "mean_ci", "geom_mean", "median", "median_ci", "quantiles", "range", "count_fraction")
  )
result <- rtables::build_table(lyt = lyt, df = anl, alt_counts_df = ANL_ADSL)
result

This fails because of the solitary NA in the analyze_vars call, which is passed directly to rtables::analyze.

Unfortunately, trying to fix the issue on the teal.modules.clinical side is unsuccessful because of how substitute treats such solitary NAs (basically casting them from the explicit NA_character_ to a plain NA). The fix could be made at the level of tern, but it seems silly not to go to the bottom, which is rtables::analyze, and deeper: AnalyzeMultiVars, etc.
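To illustrate what a fix at the source could look like, here is a minimal sketch of a label-normalizing step. The helper normalize_labels is hypothetical and is not the change made in this PR; it only shows one way to turn an all-NA logical label vector into a character one while keeping its names (as.character() alone would drop them).

```r
# Hypothetical helper, not part of rtables: coerce an all-NA logical
# label vector to character and restore the names that as.character() drops.
normalize_labels <- function(var_labels) {
  if (is.logical(var_labels) && all(is.na(var_labels))) {
    var_labels <- stats::setNames(as.character(var_labels), names(var_labels))
  }
  var_labels
}

fixed <- normalize_labels(c(SEX = NA))
class(fixed)  # "character"
names(fixed)  # "SEX"

# Labels that are already character pass through unchanged.
normalize_labels(c(SEX = "Sex"))
```

Restricting the coercion to all-NA logical input leaves other (possibly mistaken) inputs to fail validation as before.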

Let me know if this makes sense for you or if you would like me to take a different route.


github-actions bot commented Nov 22, 2024

✅ All contributors have signed the CLA
Posted by the CLA Assistant Lite bot.


kpagacz commented Nov 22, 2024

I have read the CLA Document and I hereby sign the CLA


@Melkiades Melkiades left a comment


Thanks @kpagacz for this! I think it makes sense to do it with one line at source instead of spreading it around, also considering that NA_character_ is fine. Could you just add a comment and a regression test for this? Also add NEWS, thanks!! ;)

@kpagacz
Copy link
Contributor Author

kpagacz commented Nov 22, 2024

Added the comment, test and the NEWS entry. Ready for re-review.


@Melkiades Melkiades left a comment


Thank you @kpagacz! Great work

@shajoezhu

Hi @kpagacz, thanks a lot for the PR! Great change, no concerns from my side.

I have a question for @pawelru: does r-deps allow us to trigger downstream package checks before we merge?
Previously, when we were using staged.dependencies, we could trigger downstream package checks (e.g. scda.test) with the following steps:

  1. Create a branch and modify the package source in staged.dependencies.yml, e.g. point rtables to @kpagacz's fork.
  2. Raise a PR, and it will run the tests.

Should we do the same by modifying the GitHub Action check.yml to trigger the downstream CI/CD tests?


pawelru commented Nov 25, 2024

> I have a question for @pawelru: does r-deps allow us to trigger downstream package checks before we merge?

Please have a look at the revdepcheck GitHub Action, and its .revdeprefs.yaml config file in particular. This should allow you to run tests on scda.test (and more).

The goal was to allow devs to trigger the runs on request. Currently it's done like this: Actions -> Scheduled -> Run Workflow -> pick branch & pick "revdepcheck".
(Please note that we will be splitting "Scheduled", so in the near future "revdepcheck" will have its own category.)

I just noticed that for some reason I cannot pick the branch from a fork, i.e. I cannot trigger the run from the fork. Hence I would suggest running it interactively or from the command line (Rscript <path to GHA script> --<path to rtables repo> --2 --1200). There is a simple, single script that needs to be run, but please don't forget about the parameters (feel free to re-use the default values).


shajoezhu commented Nov 26, 2024

Thanks @pawelru, I am experimenting with this at https://github.com/insightsengineering/rtables/tree/kpagacz_main

via https://github.com/insightsengineering/rtables/actions/runs/12022176957/job/33513946218 and https://github.com/insightsengineering/rtables/actions/runs/12022262134

@pawelru, the job has failed in https://github.com/insightsengineering/rtables/actions/runs/12022262134/job/33514173793 because scda.test requires a newer version of tern. I'm going to test another idea.

See https://github.com/insightsengineering/rtables/actions/runs/12022633473: updated scda.test, Remotes set to tern@main.

Hi @kpagacz, sorry to keep this PR open. Please let me know if this is a blocker for your work. Thanks!


pawelru commented Nov 26, 2024

I looked into the logs and everything looked good until tern.rbmi. It has ie/tern@*release in Remotes, and that messed everything up. Please have a look at the order:

Can we remove this from tern.rbmi? Alternatively, please give it a try without this package.


kpagacz commented Nov 26, 2024

This is not blocking for anyone on our team because we just work around it by assigning the labels to all the variables we pass to teal. So no rush!

@shajoezhu

Hi @kpagacz, thanks a lot for the PR. I have one minor update to your branch at kpagacz#1; once we merge that in, we can also proceed with this one. Thanks a lot!


kpagacz commented Nov 27, 2024

I merged your PR. Thanks a lot for taking this up!

@shajoezhu shajoezhu enabled auto-merge (squash) November 27, 2024 08:41

@shajoezhu shajoezhu left a comment


lgtm! thanks a lot @kpagacz


@shajoezhu shajoezhu merged commit 51c3e2c into insightsengineering:main Nov 27, 2024
27 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Nov 27, 2024