Metagenomic Feature Select

R/Bioconductor script for hierachical clustering by various DNA sequence features and measuring its similarity with reference clustering.

How to Run

Install the latest stable version of R (3.3.0) from CRAN.

Run the latest R and install packages Rcpp, inline, fastcluster and ggtree using Bioconductor installer.

source("https://bioconductor.org/biocLite.R")
biocLite(c("Rcpp", "inline", "fastcluster", "ggtree"))

Download the script feature_select.R from GitHub into current working folder and load it into R.
```
source("feature_select.R")
```

Use functions load_features, eval_all_feature_comb, find_best_features and plot_tree to run the analysis and visualize the results. The most time consuming function eval_all_feature_comb can be run in parallel mode (set parallel option to TRUE). See the following example:

features <- load_features("features.csv")
res <- eval_all_feature_comb(features, min_features = 2, max_features = 6, parallel = TRUE)
best_euc <- find_best_features(res, 'euc') # for euclidean distance
best_cos <- find_best_features(res, 'cos') # for cosine dissimilarity
plot_tree(best_euc$tree, features$Organism, "tree_euc.pdf")
plot_tree(best_cos$tree, features$Organism, "tree_cos.pdf")

Feel free to inspect the structure of best_euc and best_cos variables. It is just a list of three items: features contains a vector of used features, k is a cutoff value (number of clusters) of the hierarchical clustering tree, that gave the best Variantion of Information (VI) with reference clustering and vi contains the best VI value.

You can also plot a 2D projection of the clustering for comparison with reference:

ref_cls <- features$Organism
plot_clustering_2d(features$HP1_aktivita, features$HP2_mobilita, ref_cls, "2d_ref.pdf")
best_cls <- cutree(best_euc$tree, best_euc$k)
plot_clustering_2d(features$HP1_aktivita, features$HP2_mobilita, best_cls, "2d_euc.pdf")

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
fig		fig
.gitignore		.gitignore
README.md		README.md
feature_select.R		feature_select.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Metagenomic Feature Select

How to Run

About

Releases

Packages

Languages

jirihon/mg-feature-select

Folders and files

Latest commit

History

Repository files navigation

Metagenomic Feature Select

How to Run

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages