class hierarchy document #637

Merged: 14 commits, Dec 7, 2023
17 changes: 12 additions & 5 deletions _pkgdown.yml
@@ -21,9 +21,11 @@ navbar:
href: articles/dev-guide/dg_split_machinery.html
- text: Tabulation
href: articles/dev-guide/dg_tabulation.html
- text: Table Hierarchy
href: articles/dev-guide/dg_table_hierarchy.html
- text: Debugging in {rtables} and Beyond
href: articles/dev-guide/dg_debug_rtables.html
- text: Sparse notes on {rtables} internals
- text: Sparse Notes on {rtables} Internals
href: articles/dev-guide/dg_notes.html
reports:
text: Reports
@@ -87,11 +89,16 @@ articles:
contents:
- manual_table_construction
- tabulation_dplyr

- title: Developer Guide
desc: Articles intended for developer use only.
contents:
# *REF1* Dev Guide items
- 'dev-guide/dg_split_machinery'
- 'dev-guide/dg_tabulation'
- 'dev-guide/dg_debug_rtables'
- 'dev-guide/dg_notes'
- dev-guide/dg_split_machinery
- dev-guide/dg_tabulation
- dev-guide/dg_table_hierarchy
- dev-guide/dg_debug_rtables
- dev-guide/dg_notes

reference:
- title: Argument Conventions
4 changes: 2 additions & 2 deletions vignettes/advanced_usage.Rmd
@@ -1,10 +1,10 @@
---
title: "rtables Advanced Usage"
title: "{rtables} Advanced Usage"
author: "Gabriel Becker"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{rtables Advanced Usage}
%\VignetteIndexEntry{{rtables} Advanced Usage}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
6 changes: 2 additions & 4 deletions vignettes/dev-guide/dg_debug_rtables.Rmd
@@ -1,10 +1,8 @@
---
title: "Debugging in `rtables` and Beyond"
title: "Debugging in {rtables} and Beyond"
author: "Davide Garolini"
date: '`r Sys.Date()`'
output:
html_document:
theme: spacelab
output: html_document
editor_options:
chunk_output_type: console
---
2 changes: 1 addition & 1 deletion vignettes/dev-guide/dg_notes.Rmd
@@ -1,5 +1,5 @@
---
title: "Sparse notes on {rtables} internals"
title: "Sparse Notes on {rtables} Internals"
author: "Davide Garolini"
date: '`r Sys.Date()`'
output:
13 changes: 4 additions & 9 deletions vignettes/dev-guide/dg_split_machinery.Rmd
@@ -2,12 +2,7 @@
title: "Split Machinery"
author: "Davide Garolini"
date: '`r Sys.Date()`'
output:
html_document:
theme: spacelab
toc: true
toc_float:
collapsed: false
output: html_document
editor_options:
chunk_output_type: console
---
@@ -34,7 +29,7 @@ The following article will describe how the split machinery works in the row dom

## Process and Methods

Beforehand, we encourage the reader to familiarize themselves with the Debugging in `rtables`(xxx link here) article from the `rtables` Developers Guide. This document is generally valid for R programming, but has been tailored to study and understand complex packages that rely heavily on S3 and S4 object programming like `rtables`.
Beforehand, we encourage the reader to familiarize themselves with the [Debugging in {rtables} article](https://insightsengineering.github.io/rtables/main/articles/dev-guide/dg_debug_rtables.html) from the `rtables` Developers Guide. This document is generally valid for R programming, but has been tailored to study and understand complex packages that rely heavily on S3 and S4 object programming like `rtables`.

Here, we explore and study the split machinery with a growing amount of complexity, following relevant functions and methods throughout their execution. By going from basic to complex and by discussing important and special cases, we hope to be able to give you a good understanding of how the split machinery works.

@@ -133,7 +128,7 @@ We will see where and how input parameters are used. The most important paramete

We will start by looking at the first function called from `do_split`. This will give us a good overview of how the split itself is defined. This function is, of course, the check function (`check_validsplit`) that is used to verify if the split is valid for the data. In the following we will describe the split-class hierarchy step-by-step, but we invite the reader to explore this further on their own as well.

Let's first search the package for `check_validsplit`. You will find that it is defined as a generic in `R/split_funs.R`, where it is applied to the following "split" classes: `VarLevelSplit`, `MultiVarSplit`, `VAnalyzeSplit`, `CompoundSplit`, and `Split`. Another way to find this information, which is more useful for more spread out and complicated objects, is by using `showMethods(check_validsplit)`. The virtual class `VAnalyzeSplit` (by convention virtual classes start with "V") defines the main parent of the analysis split which we discuss in detail in the related vignette `vignette()` (xxx). From this, we can see that the `analyze()` calls actually mimic split objects as they create different results under a specific final split (or node). Now, notice that `check_validsplit` is also called in another location, the main `R/tt_dotabulation.R` source file. This is again something related to making "analyze" rows as it mainly checks for `VAnalyzeSplit` (link to tabulation dev guide xxx). We will discuss the other classes as they appear in our examples (link to class hierarchy xxx).
Let's first search the package for `check_validsplit`. You will find that it is defined as a generic in `R/split_funs.R`, where it is applied to the following "split" classes: `VarLevelSplit`, `MultiVarSplit`, `VAnalyzeSplit`, `CompoundSplit`, and `Split`. Another way to find this information, which is more useful for more spread out and complicated objects, is by using `showMethods(check_validsplit)`. The virtual class `VAnalyzeSplit` (by convention virtual classes start with "V") defines the main parent of the analysis split which we discuss in detail in the related vignette `vignette()` (xxx). From this, we can see that the `analyze()` calls actually mimic split objects as they create different results under a specific final split (or node). Now, notice that `check_validsplit` is also called in another location, the main `R/tt_dotabulation.R` source file. This is again something related to making "analyze" rows as it mainly checks for `VAnalyzeSplit`. See the [Tabulation article](https://insightsengineering.github.io/rtables/main/articles/dev-guide/dg_tabulation.html) for more details. We will discuss the other classes as they appear in our examples. See more about class hierarchy in the [Table Hierarchy article](https://insightsengineering.github.io/rtables/main/articles/dev-guide/dg_table_hierarchy.html).

For the moment, we see with `class(spl)` (from the main `do_split` function) that we are dealing with an `AllSplit` object. By calling `showMethods(check_validsplit)` we produce the following:

@@ -197,7 +192,7 @@ AllSplit <- function(split_label = "",
}
```

We can also print this information by calling `getClass("AllSplit")` for the general slot definition, or by calling `getClass(spl)`. Note that the first call will give also a lot of information about the class hierarchy. For more information regarding class hierarchy, please refer to the relevant article (xxx). We will discuss the majority of the slots by the end of this document. Now, let's see if we can find some of the values described in the constructor within our object. To do so, we will show the more compact representation given by `str`. When there are multiple and hierarchical slots that contain objects themselves, calling `str` will be much less or not at all informative if the maximum level of nesting is not set (e.g. `max.level = 2`).
We can also print this information by calling `getClass("AllSplit")` for the general slot definition, or by calling `getClass(spl)`. Note that the first call will give also a lot of information about the class hierarchy. For more information regarding class hierarchy, please refer to the relevant article [here](https://insightsengineering.github.io/rtables/main/articles/dev-guide/dg_table_hierarchy.html). We will discuss the majority of the slots by the end of this document. Now, let's see if we can find some of the values described in the constructor within our object. To do so, we will show the more compact representation given by `str`. When there are multiple and hierarchical slots that contain objects themselves, calling `str` will be much less or not at all informative if the maximum level of nesting is not set (e.g. `max.level = 2`).

```{r, eval=FALSE}
# rtables 0.6.2
95 changes: 95 additions & 0 deletions vignettes/dev-guide/dg_table_hierarchy.Rmd
@@ -0,0 +1,95 @@
---
title: "Table Hierarchy"
author: "Abinaya Yogasekaram"
date: "`r Sys.Date()`"
output: html_document
editor_options:
chunk_output_type: console
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Disclaimer

This article is intended for use by developers only and will contain low-level explanations of the topics covered. For user-friendly vignettes, please see the [Articles](https://insightsengineering.github.io/rtables/main/articles/index.html) page on the `rtables` website.

Any code or prose which appears in the version of this article on the `main` branch of the repository may reflect a specific state of things that can be more or less recent. This guide describes very important aspects of table hierarchy that are unlikely to change. Regardless, we invite the reader to keep in mind that the current repository code may have drifted from the following material in this document, and it is always the best practice to read the code directly on `main`.

Please keep in mind that `rtables` is still under active development, and it has seen the efforts of multiple contributors across different years. Therefore, there may be legacy mechanisms and ongoing transformations that could look different in the future.

## Introduction

This vignette describes the structure of `rtables` objects and their class hierarchy, exploring the underlying tree structures as S4 objects. Understanding table structure makes it easier to follow `rtables` concepts such as split machinery, tabulation, pagination, and export. More details about table structure from the user's perspective can be found in the relevant vignettes.

Two functions are useful throughout: `isS4()` to check whether an object is an S4 object, and `getClass()` to inspect class structure.


## Process and Methods

We invite developers to use the provided examples to interactively explore the `rtables` hierarchy. The most helpful command is `getClass()`, which lists the slots associated with a class, along with its related classes and their relative distances.
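
As a minimal sketch of this kind of exploration, the block below defines three small S4 classes and inspects them with tools from the `methods` package. The `Toy*` classes are hypothetical and are not part of `rtables`; they only serve to show what `getClass()` and related functions report.

```r
library(methods)

# Hypothetical toy classes (not part of rtables), used only to show the tools.
setClass("ToyVNode", representation = representation("VIRTUAL", label = "character"))
setClass("ToyLeaf", contains = "ToyVNode", representation = representation(value = "numeric"))
setClass("ToySpecialLeaf", contains = "ToyLeaf", representation = representation(note = "character"))

# All slots of a class, including the inherited ones.
slotNames("ToySpecialLeaf")

# Full class definition: slots, the "Extends" section, and inheritance distances.
getClass("ToySpecialLeaf")

# The same inheritance information can be read programmatically.
def <- getClass("ToySpecialLeaf")
names(def@contains)                      # names of the ancestor classes
def@contains[["ToyVNode"]]@distance      # number of inheritance steps to ToyVNode

# isVirtualClass() distinguishes virtual classes from instantiable ones.
isVirtualClass("ToyVNode")
isVirtualClass("ToyLeaf")
```

The same calls can be pointed at `rtables` class names, such as `getClass("TableTree")`, once the package is loaded.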

## Representation of Information before generation


## Table Representation
The `PreDataAxisLayout` class is used to define the data subset instructions for tabulation. It has two subclasses, one for each axis: `PreDataColLayout` and `PreDataRowLayout`.

## Slots, Parent-Child Relationships

## Content (summary row groups)

Splits are core functionality for rtables as tabulation and calculations are often required on subsets of the data.

## Split Machinery
```{r, message=FALSE}
library(rtables)
getClass("TreePos")
```

The `TreePos` class contains split information as a list of the splits, the split label values, and the subsets of the data that are generated by the split.

The split classes include:

- `AllSplit`
- `RootSplit`
- `MultiVarSplit`
- `VarStaticCutSplit`
- `CumulativeCutSplit`
- `VarDynCutSplit`
- `CompoundSplit`
- `VarLevWBaselineSplit`


The highest level of the table hierarchy is the `TableTree` class. The code below lists the slots associated with this class.
```{r}
getClass("TableTree")
```

Because a `TableTree` is an S4 object, its slots can be accessed using `@` (similar to the use of `$` for list objects).
You'll notice that some classes are listed under "Extends". These are classes that `TableTree` inherits from, and they are "virtual" classes. To avoid repeating slots and carrying the same data (a set of slots, for example) that multiple classes may need, `rtables` uses virtual classes extensively. A virtual class cannot be instantiated; its purpose is to let other classes inherit information from it.
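
The behaviour described above can be sketched with two small classes. The names `ToyVTitled` and `ToyTable` are hypothetical and only mimic the pattern of a virtual parent that holds a shared slot; they are not the actual `rtables` class definitions.

```r
library(methods)

# Hypothetical virtual parent holding a slot that several classes could share.
setClass("ToyVTitled", representation = representation("VIRTUAL", title = "character"))

# Concrete child: inherits `title` and adds its own `children` slot.
setClass("ToyTable", contains = "ToyVTitled", representation = representation(children = "list"))

# A virtual class cannot be instantiated, so new() signals an error here.
res <- tryCatch(
  new("ToyVTitled", title = "x"),
  error = function(e) "cannot instantiate"
)
res

# The concrete child can be created, and its slots are accessed with `@`.
tbl <- new("ToyTable", title = "big title", children = list(1, 2))
tbl@title
length(tbl@children)

# The child is still recognised as an instance of the virtual parent.
is(tbl, "ToyVTitled")
```

Keeping shared slots in one virtual parent means that a method written for the parent applies to every class that inherits from it.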


```{r}
lyt <- basic_table(title = "big title") %>%
  split_rows_by("SEX", page_by = TRUE) %>%
  analyze("AGE")

tt <- build_table(lyt, DM)

# Though we don't recommend using str for studying rtables objects,
# we do find it useful in this instance to visualize the parent/child relationships.
str(tt, max.level = 2)
```

## Tree Paths

Tree paths run from the root to the leaves and are stored as vectors of vectors.
Tables are trees because of their nested structure, and nodes in the tree can have summaries associated with them. The tree structure also has the benefit of keeping and repeating necessary information when paginating a table.

The children of an `ElementaryTable` are row objects. A `TableTree` can have children that are either row objects or other table objects.
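
As a minimal sketch of these ideas, the block below builds a small table and inspects it with the exported helpers `row_paths()` and `tree_children()`. The layout is only illustrative, and the exact names and classes printed may differ between `rtables` versions.

```r
library(rtables)

# A small table: one row split followed by one analysis.
lyt <- basic_table() %>%
  split_rows_by("SEX") %>%
  analyze("AGE")
tt <- build_table(lyt, DM)

# One path per row; each path is a character vector leading to that row.
paths <- row_paths(tt)
head(paths, 3)

# Direct children of the top-level table and the class of each child.
kids <- tree_children(tt)
names(kids)
vapply(kids, function(x) class(x)[1], character(1))
```

Applying `tree_children()` again to any child that is itself a table walks one level further down the tree.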


#### TODO:
Create Tree Diagram showing class hierarchy.

@ayogasekaram Is this the only missing thing? I think we can eventually reiterate updates in the future, what do you think? ^^


@ayogasekaram will you be adding this section in a later PR? If so, I think it's good to go for now.

11 changes: 3 additions & 8 deletions vignettes/dev-guide/dg_tabulation.Rmd
@@ -2,12 +2,7 @@
title: "Tabulation"
author: "Davide Garolini"
date: '`r Sys.Date()`'
output:
html_document:
theme: spacelab
toc: true
toc_float:
collapsed: false
output: html_document
editor_options:
chunk_output_type: console
---
@@ -28,7 +23,7 @@ Being that this a working document that may be subjected to both deprecation and

## Introduction

Tabulation in `rtables` is a process that takes a pre-defined layout and applies it to data. The layout object, with all of its splits (see xxx link split machinery article) and `analyze`s, can be applied to different data to produce valid tables. This process happens principally within the `tt_dotabulation.R` file and the user-facing function `build_table` that resides in it. We will occasionally use functions and methods that are present in other files, like `colby_construction.R` or `make_subset_expr.R`. We assume the reader is already familiar with the documentation for `build_table`. We suggest reading the split machinery vignette (xxx link) prior to this one, as it is instrumental in understanding how the layout object, which is essentially built out of splits, is tabulated when data is supplied.
Tabulation in `rtables` is a process that takes a pre-defined layout and applies it to data. The layout object, with all of its splits and `analyze`s, can be applied to different data to produce valid tables. This process happens principally within the `tt_dotabulation.R` file and the user-facing function `build_table` that resides in it. We will occasionally use functions and methods that are present in other files, like `colby_construction.R` or `make_subset_expr.R`. We assume the reader is already familiar with the documentation for `build_table`. We suggest reading the [Split Machinery article](https://insightsengineering.github.io/rtables/main/articles/dev-guide/dg_split_machinery.html) prior to this one, as it is instrumental in understanding how the layout object, which is essentially built out of splits, is tabulated when data is supplied.

## Tabulation

@@ -70,7 +65,7 @@ [email protected] # might not preserve the names # it works only when it is another clas
# We suggest doing extensive testing about these behaviors in order to do choose the appropriate one
```

Along with the various checks and defensive programming, we find `PreDataAxisLayout` which is a virtual class that both row and column layouts inherit from. Virtual classes are handy for group classes that need to share things like labels or functions that need to be applicable to their relative classes. See more information about the `rtables` class hierarchy in the dedicated article here (xxx add).
Along with the various checks and defensive programming, we find `PreDataAxisLayout` which is a virtual class that both row and column layouts inherit from. Virtual classes are handy for group classes that need to share things like labels or functions that need to be applicable to their relative classes. See more information about the `rtables` class hierarchy in the dedicated article [here](https://insightsengineering.github.io/rtables/main/articles/dev-guide/dg_table_hierarchy.html).

Now, we continue with `build_table`. After the checks, we notice `TreePos()` which is a constructor for an object that retains a representation of the tree position along with split values and labels. This is mainly used by `create_colinfo`, which we enter now with `debugonce(create_colinfo)`. This function creates the object that represents the column splits and everything else that may be related to the columns. In particular, the column counts are calculated in this function. The parameter inputs are as follows:
