-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.qmd
92 lines (61 loc) · 5.66 KB
/
index.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
# Introduction {.unnumbered}
```{r}
#| child: "_setup.qmd"
```
Omics datasets provide an overview of the content of cells for a specific molecular layer (e.g. transcriptome, proteome, metabolome). By integrating different omics datasets obtained on the same biological samples, we can gain a deeper understanding of the interactions between these molecular layers, and shed light on the regulations occurring both within and between layers.
A number of statistical methods have been developed to extract such information from multi-omics datasets, and many have been implemented as R packages. However, these tools differ conceptually, in terms of the input data they require, the assumptions they make, the statistical approaches they use or even the biological questions they seek to answer. They also differ at a practical level in terms of the format required for data input, the parameters to tune or select, and the format in which the results are returned. These differences render the application of several integration tools to the same multi-omics dataset and the comparison of their results complex and time-consuming.
The `moiraine` package aims at alleviating these issues, by providing a framework to **easily and consistently apply different integration tools to a same multi-omics dataset**, and to **compare the results from different integration tools**. It implements numerous visualisation and reporting functions to facilitate the interpretation of the integration results as well as the comparison of these results across integration methods. In addition, in an effort to make these computations reproducible, `moiraine` heavily relies on the [`targets` package](https://books.ropensci.org/targets/) for the construction of reproducible analysis pipelines.
## The `moiraine` package
The workflow for a typical multi-omics integration analysis handled with `moiraine` includes the following steps:
* Data import: this covers the import of omics measurements as well as associated metadata (i.e. information about the omics features and samples) -- `moiraine` relies on the [`MultiDataSet` package](https://bioconductor.org/packages/release/bioc/html/MultiDataSet.html) to store this information in a consistent format;
* Inspection of the omics datasets: including checking values density distribution, samples overlap between omics datasets, or presence of missing values;
* Preprocessing of the omics datasets: missing values imputation, transformation, and pre-filtering of samples and omics features;
* Integration of the omics datasets by one or more of the supported tools; currently, the following integration methods are covered in `moiraine`:
* sPLS and DIABLO from the `mixOmics` package
* sO2PLS from the `OmicsPLS` package
* MOFA and MEFISTO from the `MOFA2` package
* Interpretation of the integration results using standardised visualisations enriched with features and samples metadata;
* Comparison of the integration results obtained by different methods or pre-processing approaches.
Note that in the `moiraine` package, we refer to the different biological entities measured in a given dataset (e.g. genes, transcripts, metabolic compounds, etc) as **features**, and the observations in a dataset as **samples**. **Samples metadata** and **features metadata** denote information about the samples (such as treatment, collection date, etc) and features (e.g. name, biological function, etc), respectively.
## About this manual
In this manual, we are showcasing the functionalities of the `moiraine` package by walking through an in-depth example of a multi-omics integration analysis. We will be covering not only the functionalities of `moiraine`, but also discuss the different integration tools and provide recommendations for parameters setting and interpretation.
The `moiraine` package was designed to be used in the context of `targets` pipelines, therefore some familiarity with the `targets` package is necessary to follow this manual. Nevertheless, for users preferring R scripts to `targets` pipelines, alternative code will be provided to translate the [target factories](https://wlandau.github.io/targetopia/contributing.html#target-factories) implemented in `moiraine`. To learn about `targets`, we refer readers to the excellent [`targets` manual](https://books.ropensci.org/targets/).
For clarity, throughout this manual, any code that belongs in a `targets` script (e.g. in the targets list in `_targets.R`) will be shown in a blue chunk, e.g.:
::: {.targets-chunk}
```{targets my-first-target}
tar_target(
my_first_target,
c(2 + 2)
)
```
:::
Any code shown in a regular code chunk (see below) is code that should be run in the command line, or saved in a regular R script. It often showcases how to inspect a certain target output after having run `tar_make()` to execute the `targets` pipeline, e.g.:
```{r loading-packages}
library(targets)
tar_read(my_first_target)
```
Note that for readers that prefer to use regular R scripts to `targets` pipelines, the target command in the example target chunk above can be converted to R script-compatible code as follows:
```{r target-translation}
my_first_target <- 2 + 2
```
and the calls to `tar_read()` and `tar_load()` ignored.
Lastly, throughout the manual, the following options are set to improve upon the default colour scales:
```{r ggplo2-colour-options}
#| eval: false
options(
ggplot2.continuous.colour = "viridis",
ggplot2.continuous.fill = "viridis",
ggplot2.discrete.colour = function() {
ggplot2::scale_colour_brewer(
palette = "Paired",
na.value = "grey"
)
} ,
ggplot2.discrete.fill = function() {
ggplot2::scale_fill_brewer(
palette = "Paired",
na.value = "grey"
)
}
)
```