diff --git a/_pkgdown.yml b/_pkgdown.yml index 5ed84d73c..b87bd7d83 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -21,9 +21,11 @@ navbar: href: articles/dev-guide/dg_split_machinery.html - text: Tabulation href: articles/dev-guide/dg_tabulation.html + - text: Table Hierarchy + href: articles/dev-guide/dg_table_hierarchy.html - text: Debugging in {rtables} and Beyond href: articles/dev-guide/dg_debug_rtables.html - - text: Sparse notes on {rtables} internals + - text: Sparse Notes on {rtables} Internals href: articles/dev-guide/dg_notes.html reports: text: Reports @@ -87,11 +89,16 @@ articles: contents: - manual_table_construction - tabulation_dplyr + + - title: Developer Guide + desc: Articles intended for developer use only. + contents: # *REF1* Dev Guide items - - 'dev-guide/dg_split_machinery' - - 'dev-guide/dg_tabulation' - - 'dev-guide/dg_debug_rtables' - - 'dev-guide/dg_notes' + - dev-guide/dg_split_machinery + - dev-guide/dg_tabulation + - dev-guide/dg_table_hierarchy + - dev-guide/dg_debug_rtables + - dev-guide/dg_notes reference: - title: Argument Conventions diff --git a/vignettes/advanced_usage.Rmd b/vignettes/advanced_usage.Rmd index a535b81ce..f7f4344ac 100644 --- a/vignettes/advanced_usage.Rmd +++ b/vignettes/advanced_usage.Rmd @@ -1,10 +1,10 @@ --- -title: "rtables Advanced Usage" +title: "{rtables} Advanced Usage" author: "Gabriel Becker" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > - %\VignetteIndexEntry{rtables Advanced Usage} + %\VignetteIndexEntry{{rtables} Advanced Usage} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: diff --git a/vignettes/dev-guide/dg_debug_rtables.Rmd b/vignettes/dev-guide/dg_debug_rtables.Rmd index 773b4c672..7e1ec5374 100644 --- a/vignettes/dev-guide/dg_debug_rtables.Rmd +++ b/vignettes/dev-guide/dg_debug_rtables.Rmd @@ -1,10 +1,8 @@ --- -title: "Debugging in `rtables` and Beyond" +title: "Debugging in {rtables} and Beyond" author: "Davide Garolini" date: '`r 
Sys.Date()`' -output: - html_document: - theme: spacelab +output: html_document editor_options: chunk_output_type: console --- diff --git a/vignettes/dev-guide/dg_notes.Rmd b/vignettes/dev-guide/dg_notes.Rmd index f162bb7a0..e35cd703f 100644 --- a/vignettes/dev-guide/dg_notes.Rmd +++ b/vignettes/dev-guide/dg_notes.Rmd @@ -1,5 +1,5 @@ --- -title: "Sparse notes on {rtables} internals" +title: "Sparse Notes on {rtables} Internals" author: "Davide Garolini" date: '`r Sys.Date()`' output: diff --git a/vignettes/dev-guide/dg_split_machinery.Rmd b/vignettes/dev-guide/dg_split_machinery.Rmd index 5f323c4c7..789443462 100644 --- a/vignettes/dev-guide/dg_split_machinery.Rmd +++ b/vignettes/dev-guide/dg_split_machinery.Rmd @@ -2,12 +2,7 @@ title: "Split Machinery" author: "Davide Garolini" date: '`r Sys.Date()`' -output: - html_document: - theme: spacelab - toc: true - toc_float: - collapsed: false +output: html_document editor_options: chunk_output_type: console --- @@ -34,7 +29,7 @@ The following article will describe how the split machinery works in the row dom ## Process and Methods -Beforehand, we encourage the reader to familiarize themselves with the Debugging in `rtables`(xxx link here) article from the `rtables` Developers Guide. This document is generally valid for R programming, but has been tailored to study and understand complex packages that rely heavily on S3 and S4 object programming like `rtables`. +Beforehand, we encourage the reader to familiarize themselves with the [Debugging in {rtables} article](https://insightsengineering.github.io/rtables/main/articles/dev-guide/dg_debug_rtables.html) from the `rtables` Developers Guide. This document is generally valid for R programming, but has been tailored to study and understand complex packages that rely heavily on S3 and S4 object programming like `rtables`. Here, we explore and study the split machinery with a growing amount of complexity, following relevant functions and methods throughout their execution. 
By going from basic to complex and by discussing important and special cases, we hope to be able to give you a good understanding of how the split machinery works. @@ -133,7 +128,7 @@ We will see where and how input parameters are used. The most important paramete We will start by looking at the first function called from `do_split`. This will give us a good overview of how the split itself is defined. This function is, of course, the check function (`check_validsplit`) that is used to verify if the split is valid for the data. In the following we will describe the split-class hierarchy step-by-step, but we invite the reader to explore this further on their own as well. -Let's first search the package for `check_validsplit`. You will find that it is defined as a generic in `R/split_funs.R`, where it is applied to the following "split" classes: `VarLevelSplit`, `MultiVarSplit`, `VAnalyzeSplit`, `CompoundSplit`, and `Split`. Another way to find this information, which is more useful for more spread out and complicated objects, is by using `showMethods(check_validsplit)`. The virtual class `VAnalyzeSplit` (by convention virtual classes start with "V") defines the main parent of the analysis split which we discuss in detail in the related vignette `vignette()` (xxx). From this, we can see that the `analyze()` calls actually mimic split objects as they create different results under a specific final split (or node). Now, notice that `check_validsplit` is also called in another location, the main `R/tt_dotabulation.R` source file. This is again something related to making "analyze" rows as it mainly checks for `VAnalyzeSplit` (link to tabulation dev guide xxx). We will discuss the other classes as they appear in our examples (link to class hierarchy xxx). +Let's first search the package for `check_validsplit`. 
You will find that it is defined as a generic in `R/split_funs.R`, where it is applied to the following "split" classes: `VarLevelSplit`, `MultiVarSplit`, `VAnalyzeSplit`, `CompoundSplit`, and `Split`. Another way to find this information, which is more useful for more spread out and complicated objects, is by using `showMethods(check_validsplit)`. The virtual class `VAnalyzeSplit` (by convention virtual classes start with "V") defines the main parent of the analysis split which we discuss in detail in the related vignette `vignette()` (xxx). From this, we can see that the `analyze()` calls actually mimic split objects as they create different results under a specific final split (or node). Now, notice that `check_validsplit` is also called in another location, the main `R/tt_dotabulation.R` source file. This is again something related to making "analyze" rows as it mainly checks for `VAnalyzeSplit`. See the [Tabulation article](https://insightsengineering.github.io/rtables/main/articles/dev-guide/dg_tabulation.html) for more details. We will discuss the other classes as they appear in our examples. See more about class hierarchy in the [Table Hierarchy article](https://insightsengineering.github.io/rtables/main/articles/dev-guide/dg_table_hierarchy.html). For the moment, we see with `class(spl)` (from the main `do_split` function) that we are dealing with an `AllSplit` object. By calling `showMethods(check_validsplit)` we produce the following: @@ -197,7 +192,7 @@ AllSplit <- function(split_label = "", } ``` -We can also print this information by calling `getClass("AllSplit")` for the general slot definition, or by calling `getClass(spl)`. Note that the first call will give also a lot of information about the class hierarchy. For more information regarding class hierarchy, please refer to the relevant article (xxx). We will discuss the majority of the slots by the end of this document. 
Now, let's see if we can find some of the values described in the constructor within our object. To do so, we will show the more compact representation given by `str`. When there are multiple and hierarchical slots that contain objects themselves, calling `str` will be much less or not at all informative if the maximum level of nesting is not set (e.g. `max.level = 2`). +We can also print this information by calling `getClass("AllSplit")` for the general slot definition, or by calling `getClass(spl)`. Note that the first call will give also a lot of information about the class hierarchy. For more information regarding class hierarchy, please refer to the relevant article [here](https://insightsengineering.github.io/rtables/main/articles/dev-guide/dg_table_hierarchy.html). We will discuss the majority of the slots by the end of this document. Now, let's see if we can find some of the values described in the constructor within our object. To do so, we will show the more compact representation given by `str`. When there are multiple and hierarchical slots that contain objects themselves, calling `str` will be much less or not at all informative if the maximum level of nesting is not set (e.g. `max.level = 2`). ```{r, eval=FALSE} # rtables 0.6.2 diff --git a/vignettes/dev-guide/dg_table_hierarchy.Rmd b/vignettes/dev-guide/dg_table_hierarchy.Rmd new file mode 100644 index 000000000..5fb65eaaa --- /dev/null +++ b/vignettes/dev-guide/dg_table_hierarchy.Rmd @@ -0,0 +1,95 @@ +--- +title: "Table Hierarchy" +author: "Abinaya Yogasekaram" +date: "`r Sys.Date()`" +output: html_document +editor_options: + chunk_output_type: console +--- + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = TRUE) +``` + +## Disclaimer + +This article is intended for use by developers only and will contain low-level explanations of the topics covered. 
For user-friendly vignettes, please see the [Articles](https://insightsengineering.github.io/rtables/main/articles/index.html) page on the `rtables` website. + +Any code or prose which appears in the version of this article on the `main` branch of the repository may reflect a specific state of things that can be more or less recent. This guide describes very important aspects of table hierarchy that are unlikely to change. Regardless, we invite the reader to keep in mind that the current repository code may have drifted from the following material in this document, and it is always the best practice to read the code directly on `main`. + +Please keep in mind that `rtables` is still under active development, and it has seen the efforts of multiple contributors across different years. Therefore, there may be legacy mechanisms and ongoing transformations that could look different in the future. + +## Introduction + +The scope of this vignette is to understand the structure of `rtables` objects and their class hierarchy, with an exploration of tree structures as S4 objects. Exploring table structure enables a better understanding of `rtables` concepts such as split machinery, tabulation, pagination, and export. More details from the user's perspective of table structure can be found in the relevant vignettes. + +isS4 +getClass - for class structure + + +## Process and Methods + +We invite developers to use the provided examples to interactively explore the `rtables` hierarchy. The most helpful command is `getClass`, which lists the slots associated with a class, in addition to related classes and their relative distances. + +## Representation of Information before generation + + +## Table Representation +The `PreDataAxisLayout` class is used to define the data subset instructions for tabulation. 
It has two subclasses (one for each axis): `PreDataColLayout` and `PreDataRowLayout`. + +## Slots, Parent-Child Relationships + +## Content (summary row groups) + +Splits are core functionality for rtables as tabulation and calculations are often required on subsets of the data. + +## Split Machinery +```{r, message=FALSE} +library(rtables) +getClass("TreePos") +``` + +The "TreePos" class contains split information as a list of the splits, split label values, and the subsets of the data that are generated by the split. + +AllSplit +RootSplit +MultiVarSplit +VarStaticCutSplit +CumulativeCutSplit +VarDynCutSplit +CompoundSplit +VarLevWBaselineSplit + + +The highest level of the table hierarchy belongs to "TableTree". The code below identifies the slots associated with this class. +```{r} +getClass("TableTree") +``` + +As an S4 object, the slots can be accessed using "@" (similar to the use of "$" for list objects). +You'll notice there are classes that fall under "Extends". The classes contained here have a relationship to the TableTree object and are "virtual" classes. To avoid the repetition of slots and carrying the same data (set of slots for example) that multiple classes may need, rtables extensively uses virtual classes. A virtual class cannot be instantiated; its purpose is for other classes to inherit information from it. + + +```{r} + +lyt <- basic_table(title = "big title") %>% + split_rows_by("SEX", page_by = TRUE) %>% + analyze("AGE") + +tt <- build_table(lyt, DM) + +# Though we don't recommend using str for studying rtable objects, +# we do find it useful in this instance to visualize the parent/child relationships. +str(tt, max.level = 2) +``` + +## Tree Paths + +Root to Leaves, are vectors of vectors +Tables are trees, and nodes in the tree can have summaries associated with them. Tables are trees because of the nested structure. There is also the benefit of keeping and repeating necessary information when trying to paginate a table. 
+ +Children of ElementaryTables are row objects. TableTree can have children that are either row objects or other table objects. + + +#### TODO: +Create Tree Diagram showing class hierarchy. diff --git a/vignettes/dev-guide/dg_tabulation.Rmd b/vignettes/dev-guide/dg_tabulation.Rmd index 6784b2ff6..9148134cd 100644 --- a/vignettes/dev-guide/dg_tabulation.Rmd +++ b/vignettes/dev-guide/dg_tabulation.Rmd @@ -2,12 +2,7 @@ title: "Tabulation" author: "Davide Garolini" date: '`r Sys.Date()`' -output: - html_document: - theme: spacelab - toc: true - toc_float: - collapsed: false +output: html_document editor_options: chunk_output_type: console --- @@ -28,7 +23,7 @@ Being that this a working document that may be subjected to both deprecation and ## Introduction -Tabulation in `rtables` is a process that takes a pre-defined layout and applies it to data. The layout object, with all of its splits (see xxx link split machinery article) and `analyze`s, can be applied to different data to produce valid tables. This process happens principally within the `tt_dotabulation.R` file and the user-facing function `build_table` that resides in it. We will occasionally use functions and methods that are present in other files, like `colby_construction.R` or `make_subset_expr.R`. We assume the reader is already familiar with the documentation for `build_table`. We suggest reading the split machinery vignette (xxx link) prior to this one, as it is instrumental in understanding how the layout object, which is essentially built out of splits, is tabulated when data is supplied. +Tabulation in `rtables` is a process that takes a pre-defined layout and applies it to data. The layout object, with all of its splits and `analyze`s, can be applied to different data to produce valid tables. This process happens principally within the `tt_dotabulation.R` file and the user-facing function `build_table` that resides in it. 
We will occasionally use functions and methods that are present in other files, like `colby_construction.R` or `make_subset_expr.R`. We assume the reader is already familiar with the documentation for `build_table`. We suggest reading the [Split Machinery article](https://insightsengineering.github.io/rtables/main/articles/dev-guide/dg_split_machinery.html) prior to this one, as it is instrumental in understanding how the layout object, which is essentially built out of splits, is tabulated when data is supplied. ## Tabulation @@ -70,7 +65,7 @@ lyt@.Data # might not preserve the names # it works only when it is another clas # We suggest doing extensive testing about these behaviors in order to do choose the appropriate one ``` -Along with the various checks and defensive programming, we find `PreDataAxisLayout` which is a virtual class that both row and column layouts inherit from. Virtual classes are handy for group classes that need to share things like labels or functions that need to be applicable to their relative classes. See more information about the `rtables` class hierarchy in the dedicated article here (xxx add). +Along with the various checks and defensive programming, we find `PreDataAxisLayout` which is a virtual class that both row and column layouts inherit from. Virtual classes are handy for group classes that need to share things like labels or functions that need to be applicable to their relative classes. See more information about the `rtables` class hierarchy in the dedicated article [here](https://insightsengineering.github.io/rtables/main/articles/dev-guide/dg_table_hierarchy.html). Now, we continue with `build_table`. After the checks, we notice `TreePos()` which is a constructor for an object that retains a representation of the tree position along with split values and labels. This is mainly used by `create_colinfo`, which we enter now with `debugonce(create_colinfo)`. 
This function creates the object that represents the column splits and everything else that may be related to the columns. In particular, the column counts are calculated in this function. The parameter inputs are as follows: