The goal of causalXtreme is to provide an interface to perform causal discovery in linear structural equation models (SEM) with heavy-tailed noise. For more details see Gnecco et al. (2019, https://arxiv.org/abs/1908.05097).
You can install the development version from GitHub with:
```r
# install.packages("devtools")
devtools::install_github("nicolagnecco/causalXtreme")
```
Let us generate 500 observations from a SEM of two Student-t variables, $X_1$ and $X_2$, with tail index 1.5. When `simulate_data` is called with the default values, it returns a list containing:

- the simulated `dataset`, represented as a matrix of size $n \times p$. Here, $n = 500$ is the number of observations and $p = 2$ is the number of variables,
- the underlying directed acyclic graph (DAG), represented as an adjacency matrix `dag` of size $p \times p$.
```r
library(causalXtreme)
# basic example code
set.seed(1)
sem <- simulate_data(n = 500, p = 2, prob_connect = 0.5,
                     distr = "student_t", tail_index = 1.5)
```
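To make the data-generating model concrete, here is a minimal base-R sketch of a two-variable linear SEM with Student-t noise. It is not how `simulate_data` works internally: the edge direction, the coefficient `beta`, and all object names below are assumptions chosen for illustration. It relies only on the fact that a Student-t distribution with 1.5 degrees of freedom has tail index 1.5.

```r
set.seed(1)
n <- 500
nu <- 1.5     # degrees of freedom of the Student-t noise (heavy tails)
beta <- 0.8   # hypothetical causal effect of the first variable on the second

# Independent heavy-tailed noise terms
eps1 <- rt(n, df = nu)
eps2 <- rt(n, df = nu)

# Linear SEM: x1 is a source node, x2 depends linearly on x1
x1 <- eps1
x2 <- beta * x1 + eps2

dataset_sketch <- cbind(x1, x2)                 # n x p matrix
dag_sketch <- matrix(c(0, 0, 1, 0), nrow = 2)   # entry (1, 2) = 1 encodes x1 -> x2
```

The two objects mirror the shape of the list elements described above: an $n \times p$ data matrix and a $p \times p$ adjacency matrix.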
At this point, we can look at the randomly simulated DAG.
```r
sem$dag
#>      [,1] [,2]
#> [1,]    0    1
#> [2,]    0    0
```
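We interpret the adjacency matrix as follows. Loosely speaking, we say that variable $X_i$ causes variable $X_j$ if entry $(i, j)$ of `sem$dag` is equal to 1. Here, entry $(1, 2)$ is equal to 1, so $X_1$ causes $X_2$.

We can plot the simulated dataset.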
```r
plot(sem$dataset, pch = 20,
     xlab = "X1", ylab = "X2")
```
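The edges encoded in an adjacency matrix can also be listed programmatically. The sketch below assumes the row-to-column convention (entry $(i, j)$ equal to 1 means $X_i$ causes $X_j$), which is consistent with the causal order reported in this example. The matrix is typed in by hand from the printed DAG, so the snippet runs without the package.

```r
# Adjacency matrix copied from the DAG printed earlier (filled column by column)
dag <- matrix(c(0, 0,
                1, 0), nrow = 2)

# Row/column positions of all entries equal to 1
edges <- which(dag == 1, arr.ind = TRUE)

# Print one line per directed edge
for (r in seq_len(nrow(edges))) {
  cat(sprintf("X%d -> X%d\n", edges[r, "row"], edges[r, "col"]))
}
#> X1 -> X2
```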
At this point, we can estimate the causal direction between $X_1$ and $X_2$ by computing the causal tail coefficient in both directions.
```r
X1 <- sem$dataset[, 1]
X2 <- sem$dataset[, 2]
# gamma_12
causal_tail_coeff(X1, X2)
#> [1] 0.9523333
# gamma_21
causal_tail_coeff(X2, X1)
#> [1] 0.4816667
```
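We see that the coefficient $\Gamma_{12} \approx 0.95$ is close to 1, whereas $\Gamma_{21} \approx 0.48$ is clearly smaller, which suggests that $X_1$ causes $X_2$.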
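As a rough illustration of what such a coefficient measures, the sketch below averages the scaled ranks of one variable over the `k` largest observations of the other. The function name `tail_coeff_sketch`, the upper-tail-only design, and the default `k = floor(n^0.4)` are assumptions made here for illustration. This is not the package's `causal_tail_coeff` implementation, and its values need not match the output above.

```r
# Hypothetical helper: rank-based, upper-tail-only tail coefficient sketch
tail_coeff_sketch <- function(x, y, k = floor(length(x)^0.4)) {
  n <- length(x)
  stopifnot(length(y) == n, k >= 1, k <= n)
  # Empirical CDF values of y (ranks scaled to (0, 1])
  fy <- rank(y, ties.method = "average") / n
  # Indices of the k largest observations of x
  top_x <- order(x, decreasing = TRUE)[seq_len(k)]
  # Average position of y when x is extreme
  mean(fy[top_x])
}

# Toy check: identical variables give a value near 1,
# reversed variables give a value near 0
x <- 1:100
aligned  <- tail_coeff_sketch(x, x, k = 10)
reversed <- tail_coeff_sketch(x, rev(x), k = 10)
```

A value near 1 means that extremes of the first argument tend to coincide with extremes of the second. An asymmetry between the two directions is what the comparison of `gamma_12` and `gamma_21` exploits.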
We can also run the extremal ancestral search (EASE) algorithm, which is based on the causal tail coefficients (see Gnecco et al. 2019, sec. 3.1). The algorithm estimates a causal order of the DAG from the data.
```r
ease(dat = sem$dataset)
#> [1] 1 2
```
In this case, we can see that the estimated causal order is correct, since the cause $X_1$ is placed before its effect $X_2$.
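To give an intuition for how a causal order can be derived from a matrix of pairwise coefficients, here is a simplified greedy ordering sketch. It is a paraphrase under stated assumptions, not the package's `ease` implementation: the function name `greedy_order_sketch` is hypothetical, and the input is assumed to be a $p \times p$ matrix whose entry $(i, j)$ is close to 1 when $X_i$ causes $X_j$. At each step it picks the remaining variable with the weakest incoming evidence.

```r
# Hypothetical helper: greedy causal ordering from a coefficient matrix
greedy_order_sketch <- function(gamma) {
  p <- nrow(gamma)
  stopifnot(ncol(gamma) == p)
  remaining <- seq_len(p)
  ord <- integer(0)
  while (length(remaining) > 1) {
    sub <- gamma[remaining, remaining, drop = FALSE]
    diag(sub) <- -Inf   # ignore a variable's coefficient with itself
    # For each candidate, the largest coefficient pointing into it
    score <- apply(sub, 2, max)
    # The candidate with the weakest incoming evidence comes next
    pick <- remaining[which.min(score)]
    ord <- c(ord, pick)
    remaining <- setdiff(remaining, pick)
  }
  c(ord, remaining)
}

# Coefficients taken from the example above: gamma_12 and gamma_21
gamma_hat <- matrix(c(NA, 0.4816667,
                      0.9523333, NA), nrow = 2)
order_sketch <- greedy_order_sketch(gamma_hat)
```

With the two coefficients estimated above, the sketch places variable 1 before variable 2, in agreement with the order returned by `ease`.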
Gnecco, Nicola, Nicolai Meinshausen, Jonas Peters, and Sebastian Engelke. 2019. “Causal Discovery in Heavy-Tailed Models.” arXiv Preprint arXiv:1908.05097.