---
output: github_document
always_allow_html: true
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
)
```
[![GitHub Actions Build Status](https://github.com/DataSlingers/clustRviz/workflows/R-CMD-check%20and%20Deploy/badge.svg)](https://github.com/DataSlingers/clustRviz/actions?query=workflow%3A%22R-CMD-check+and+Deploy%22)
[![codecov Coverage Status](https://codecov.io/gh/DataSlingers/clustRviz/branch/develop/graph/badge.svg)](https://codecov.io/gh/DataSlingers/clustRviz/branch/develop)
[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![CRAN\_Status\_Badge](http://www.r-pkg.org/badges/version/clustRviz)](https://cran.r-project.org/package=clustRviz)
[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)
# clustRviz
`clustRviz` aims to enable fast computation and easy visualization of Convex Clustering
solution paths.
## Installation
You can install `clustRviz` from GitHub with:
```{r gh-installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("DataSlingers/clustRviz")
```
Note that `RcppEigen` (which `clustRviz` uses internally) triggers many compiler warnings
(which cannot be suppressed per
[CRAN policies](http://cran.r-project.org/web/packages/policies.html#Source-packages)).
Many of these warnings can be locally suppressed by adding the line `CXX11FLAGS+=-Wno-ignored-attributes`
to your `~/.R/Makevars` file. To install an `R` package from source, you will need
suitable development tools installed including a `C++` compiler and potentially
a Fortran runtime. Details about these toolchains are available on CRAN for
[Windows](https://cran.r-project.org/bin/windows/Rtools/) and [macOS](https://mac.r-project.org/tools/).
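The `Makevars` line mentioned above can be added by hand in any text editor. As a minimal sketch, the base-R helper below appends it only if it is not already present. The function name `add_makevars_line` is hypothetical and not part of `clustRviz`, and the demonstration writes to a temporary file rather than to your real `~/.R/Makevars`:

```{r makevars_helper}
# Hypothetical helper (not part of clustRviz): append a line to a Makevars
# file unless an identical line is already there.
add_makevars_line <- function(line, path = file.path("~", ".R", "Makevars")) {
  path <- path.expand(path)
  # Create the containing directory (e.g. ~/.R) if it does not exist yet
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  if (line %in% existing) {
    return(invisible(FALSE))  # already present, nothing written
  }
  writeLines(c(existing, line), path)
  invisible(TRUE)
}

# Demonstrated on a temporary file so nothing in the home directory changes;
# omit the `path` argument to edit the real ~/.R/Makevars instead.
demo_path <- file.path(tempdir(), "Makevars")
add_makevars_line("CXX11FLAGS+=-Wno-ignored-attributes", path = demo_path)
readLines(demo_path)
```

Calling the helper a second time with the same line leaves the file unchanged, so it is safe to re-run.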
## Quick-Start
There are two main entry points to the `clustRviz` package, the `CARP` and `CBASS`
functions, which perform convex clustering and convex biclustering respectively.
We demonstrate the use of these two functions on a text mining data set,
`presidential_speech`, which measures how often the 44 U.S. presidents used certain
words in their public addresses.
```{r load_data}
library(clustRviz)
data(presidential_speech)
presidential_speech[1:6, 1:6]
```
### Clustering
We begin by clustering this data set, grouping the rows (presidents) into clusters:
```{r carp_example}
carp_fit <- CARP(presidential_speech)
print(carp_fit)
```
The algorithmic regularization technique employed by `CARP` makes computation of
the whole solution path almost immediate.
We can examine the result of `CARP` graphically. We begin with a standard dendrogram,
with three clusters highlighted:
```{r carp_dendro}
plot(carp_fit, type = "dendrogram", k = 3)
```
Examining the dendrogram, we see two clear clusters, consisting of pre-WWII and post-WWII
presidents, with Warren G. Harding as a possible outlier. Harding is generally considered
one of the worst US presidents of all time, so this is perhaps not too surprising.
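To see which presidents fall into each of the three groups, the cluster assignments can be extracted from the fit. This is a minimal sketch, assuming that the installed version of `clustRviz` exports a `get_cluster_labels` accessor taking the same `k` argument as `plot`, and that it returns one label per row in the original row order (check `?get_cluster_labels` to confirm). The chunk is not evaluated here:

```{r carp_labels, eval = FALSE}
# Repeated from above so the chunk can be run on its own
library(clustRviz)
data(presidential_speech)
carp_fit <- CARP(presidential_speech)

# Assumption: get_cluster_labels() accepts `k` in the same way plot() does
carp_labels <- get_cluster_labels(carp_fit, k = 3)

# Number of presidents assigned to each cluster
table(carp_labels)

# Names of the presidents in each cluster
split(rownames(presidential_speech), carp_labels)
```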
A more interesting visualization is the dynamic path visualization, whereby we can
watch the clusters fuse as the regularization level is increased:
```{r carp_dynamic, eval = FALSE}
plot(carp_fit, type = "path", dynamic = TRUE)
```
### BiClustering
The use of `CBASS` for convex biclustering is similar, and we demonstrate it here
with a cluster heatmap, with the regularization set to give 3 observation clusters:
```{r cbass}
cbass_fit <- CBASS(presidential_speech)
plot(cbass_fit, k.row = 3)
```
By default, plotting the result of `CBASS` gives the traditional cluster heatmap,
but we can also get the row or column dendrograms:
```{r cbass_rowdendro}
plot(cbass_fit, type = "row.dendrogram", k.row = 3)
```
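The column (word) dendrogram can be requested in the same way. This sketch assumes that `plot` accepts `type = "col.dendrogram"` with a `k.col` argument mirroring `k.row` (see the `CBASS` plotting help page to confirm), and the value `k.col = 3` is chosen only for illustration. The chunk is not evaluated here:

```{r cbass_coldendro, eval = FALSE}
# Repeated from above so the chunk can be run on its own
library(clustRviz)
data(presidential_speech)
cbass_fit <- CBASS(presidential_speech)

# Assumption: "col.dendrogram" and k.col mirror "row.dendrogram" and k.row
plot(cbass_fit, type = "col.dendrogram", k.col = 3)
```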
By default, if a regularization level is specified, all plotting functions in `clustRviz`
will plot the clustered data. If the regularization level is not specified, the
raw data will be plotted instead:
```{r cbass_heatmap2}
plot(cbass_fit, type = "heatmap")
```
More details about the use and mathematical formulation of `CARP` and `CBASS`
may be found in the package documentation.