update formatting.md

salbalkus · Oct 17, 2024 · c05c808 · c05c808
1 parent df477ba
commit c05c808

Showing 2 changed files with 8 additions and 4 deletions.
diff --git a/docs/src/man/formatting.md b/docs/src/man/formatting.md
@@ -45,9 +45,11 @@ nothing # hide
 
 ## Tables with Network-Dependent Units
 
-The previous example assumes that each unit (row in the Table, in this case `df`), is "causally independent" of every other unit -- that is, the treatment of one unit does not affect the response of any other unit. This is a component of the "stable unit treatment value assumption" (SUTVA) often used in causal inference. In some cases, however, we might work with data in which units may *not* be causally independent, but rather, in which one unit's variables could dependent on some summary function of its neighbors (in which case, SUTVA is violated). 
+The previous example assumes that each unit (row in the Table, in this case `tbl`), is "causally independent" of every other unit -- that is, the treatment of one unit does not affect the response of any other unit. This is a component of the "stable unit treatment value assumption" (SUTVA) often used in causal inference. In some cases, however, we might work with data in which units may *not* be causally independent, but rather, in which one unit's variables depend on some summary function of its neighbors.
 
-Each `CausalTable` has an "arrays" argument, a `NamedTuple` that can store adjacency matrices and other miscellaneous parameters that denote the causal relationships between variables. The code below provides an example of how such a `CausalTable` might be constructed using the Karate Club dataset. In this example, treatment is defined as the number of friends a club member has, denoted by the summary function parameter `summaries = (friends = Friends(:F),)`. Hence, this answers the causal question "how would changing a subject's number of friends (`friends`) affect which club they are likely to join (`labels_clubs`)?" 
+In this case, one must instead perform causal inference on the summary functions of each unit's neighbors ([Aronow and Samii, 2017](https://doi.org/10.1214/16-AOAS1005)). To do this, each `CausalTable` has two relevant arguments that can be used to correct SUTVA violations. The `arrays` argument is a `NamedTuple` that can store adjacency matrices and other miscellaneous parameters that denote the causal relationships between variables. The `summaries` argument is a tuple of `NetworkSummary` objects that can be used to summarize the network relationships between units by referencing variables in either the underlying data or the `arrays` argument of `CausalTable` (or both). 
+
+The code below provides an example of how such a `CausalTable` might be constructed to consider a summary function treatment in the case of causally-dependent units, using the Karate Club dataset. In this example, treatment is defined as the number of friends a club member has, denoted by the summary function parameter `summaries = (friends = Friends(:F),)`. Hence, this answers the causal question "how would changing a subject's number of friends (`friends`) affect which club they are likely to join (`labels_clubs`)?" 
 
 We store the network relationships between units as an adjacency matrix `F` by assigning it to the `arrays` parameter. This allows the `Friends(:F)` summary function to access it when calling `summarize(ctbl)`. More detail on the types of `NetworkSummary` that can be used in a dependent-data `CausalTable` can be found in [Network Summaries](network-summaries.md).
 
@@ -74,7 +76,9 @@ ctbl = CausalTable(tbl; treatment = :friends, response = :labels_clubs, arrays =
 nothing # hide
 ```
 
-Based on these summaries, it is possible to extract two matrices from the `CausalTable` object: the `adjacency_matrix` and the `dependency_matrix`. The `adjacency_matrix` denotes which units are *causally dependent* upon one another: an entry of 1 in cell (i,j) indicates that some variable in unit i exhibits a causal relationship to some variable in unit j. The `dependency_matrix` stores which units are *statistically dependent* upon one another: an entry of 1 in cell (i,j) indicates that the data of unit i is correlated with the data in unit j. Two units are correlated if they either are causally dependent (neighbors in the adjacency matrix) or share a common neighbor in the adjacency matrix.
+One can then call the function `summarize(ctbl)` to compute the values of the summary function on the causal table. 
+
+Based on these summaries, it is also possible to extract two matrices from the `CausalTable` object: the `adjacency_matrix` and the `dependency_matrix`. The `adjacency_matrix` denotes which units are *causally dependent* upon one another: an entry of 1 in cell (i,j) indicates that some variable in unit i exhibits a causal relationship to some variable in unit j. The `dependency_matrix` stores which units are *statistically dependent* upon one another: an entry of 1 in cell (i,j) indicates that the data of unit i is correlated with the data in unit j. Two units are correlated if they either are causally dependent (neighbors in the adjacency matrix) or share a common neighbor in the adjacency matrix.
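As a rough illustration of the rule described above (two units are statistically dependent if they are neighbors or share a common neighbor), the following sketch derives a dependency matrix from an adjacency matrix in plain Julia. The matrix `A`, the name `D`, and the computation itself are assumptions made for illustration; this is not the package's implementation of `dependency_matrix`, and its treatment of diagonal entries may differ.

```julia
# Hypothetical 4-unit path network 1 - 2 - 3 - 4 (illustrative data only,
# not taken from the Karate Club example)
A = [0 1 0 0;
     1 0 1 0;
     0 1 0 1;
     0 0 1 0]

# A * A counts, for each pair (i, j), the neighbors that i and j have in common.
# A pair is marked dependent if it is adjacent or shares at least one neighbor.
# Diagonal entries come out true for any unit with a neighbor; how the package
# treats the diagonal is not stated in the text, so this is an assumption.
D = (A .+ A * A) .> 0

println(D[1, 3])  # true: units 1 and 3 share neighbor 2
println(D[1, 4])  # false: units 1 and 4 are not adjacent and share no neighbor
```

In this toy network, units 1 and 3 are not adjacent but are still flagged as dependent through their common neighbor, which is the distinction between the two matrices that the paragraph draws.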
 
 ```@example karateclub
 CausalTables.adjacency_matrix(ctbl) # get adjacency matrix

diff --git a/paper.md b/paper.md
@@ -26,7 +26,7 @@ CausalTables.jl provides tools to evaluate and compare the statistical performan
 
 # Statement of need
 
-The field of causal inference helps scientists and decision-makers understand cause-and-effect relationships between variables in data [@hernan2020causal]. As interest in this field has grown across disciplines, so too has the development of software tools for estimating causal effects. In Julia, packages for causal inference have begun to emerge [@TMLE.jl; @CausalELM.jl], though they are generally still in their infancy. Because new methods for causal inference in various settings are being developed at a rapid pace, it is important to have tools that make it easy to evaluate and compare their performance. The goal of CausalTables.jl is to provide such a tool in Julia. 
+The field of causal inference helps scientists and decision-makers understand cause-and-effect relationships between variables in data [@hernan2020causal]. As interest in this field has grown across disciplines, so too has the development of software tools for estimating causal effects. In Julia, packages for causal inference have begun to emerge, such as TMLE.jl [@TMLE.jl] and CausalELM.jl [@CausalELM.jl], though such packages are generally still in their infancy. Because new methods for causal inference in various settings are being developed at a rapid pace, it is important to have tools that make it easy to evaluate and compare their performance. The goal of CausalTables.jl is to provide such a tool in Julia. 
 
 Currently, those attempting to benchmark causal inference methods in Julia face two major challenges. First, packages often have inconsistent interfaces. The canonical problem in causal inference typically takes the same form across applications: estimate the effect of some treatment variable $A$ on a response variable $Y$ in the presence of confounders $W$. However, software packages to do this often require data and their "causal labels" to be provided as input in different ways. For example, some methods might require the user to provide vectors for treatment and response, while others might require the entire dataset in a Tables.jl format with treatment and response labels as strings or symbols. By providing a common interface for storing causal structure information in a Tables-compatible format, CausalTables.jl makes it easy to package data and auxiliary causal information and extract the necessary components needed for benchmarking.