<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta http-equiv="X-UA-Compatible" content="IE=EDGE" />
<title>AnnotationHub: Access the AnnotationHub Web Service</title>
<script src="site_libs/header-attrs-2.14/header-attrs.js"></script>
<script src="site_libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="site_libs/bootstrap-3.3.5/css/bootstrap.min.css" rel="stylesheet" />
<script src="site_libs/bootstrap-3.3.5/js/bootstrap.min.js"></script>
<script src="site_libs/bootstrap-3.3.5/shim/html5shiv.min.js"></script>
<script src="site_libs/bootstrap-3.3.5/shim/respond.min.js"></script>
<style>h1 {font-size: 34px;}
h1.title {font-size: 38px;}
h2 {font-size: 30px;}
h3 {font-size: 24px;}
h4 {font-size: 18px;}
h5 {font-size: 16px;}
h6 {font-size: 12px;}
code {color: inherit; background-color: rgba(0, 0, 0, 0.04);}
pre:not([class]) { background-color: white }</style>
<script src="site_libs/jqueryui-1.11.4/jquery-ui.min.js"></script>
<link href="site_libs/tocify-1.9.1/jquery.tocify.css" rel="stylesheet" />
<script src="site_libs/tocify-1.9.1/jquery.tocify.js"></script>
<script src="site_libs/navigation-1.1/tabsets.js"></script>
<link href="site_libs/highlightjs-9.12.0/textmate.css" rel="stylesheet" />
<script src="site_libs/highlightjs-9.12.0/highlight.js"></script>
<link href="site_libs/font-awesome-5.1.0/css/all.css" rel="stylesheet" />
<link href="site_libs/font-awesome-5.1.0/css/v4-shims.css" rel="stylesheet" />
<link href="site_libs/ionicons-2.0.1/css/ionicons.min.css" rel="stylesheet" />
<style type="text/css">
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
</style>
<style type="text/css">code{white-space: pre;}</style>
<script type="text/javascript">
if (window.hljs) {
hljs.configure({languages: []});
hljs.initHighlightingOnLoad();
if (document.readyState && document.readyState === "complete") {
window.setTimeout(function() { hljs.initHighlighting(); }, 0);
}
}
</script>
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css" type="text/css" />
<style type = "text/css">
.main-container {
max-width: 940px;
margin-left: auto;
margin-right: auto;
}
img {
max-width:100%;
}
.tabbed-pane {
padding-top: 12px;
}
.html-widget {
margin-bottom: 20px;
}
button.code-folding-btn:focus {
outline: none;
}
summary {
display: list-item;
}
details > summary > p:only-child {
display: inline;
}
pre code {
padding: 0;
}
</style>
<style type="text/css">
.dropdown-submenu {
position: relative;
}
.dropdown-submenu>.dropdown-menu {
top: 0;
left: 100%;
margin-top: -6px;
margin-left: -1px;
border-radius: 0 6px 6px 6px;
}
.dropdown-submenu:hover>.dropdown-menu {
display: block;
}
.dropdown-submenu>a:after {
display: block;
content: " ";
float: right;
width: 0;
height: 0;
border-color: transparent;
border-style: solid;
border-width: 5px 0 5px 5px;
border-left-color: #cccccc;
margin-top: 5px;
margin-right: -10px;
}
.dropdown-submenu:hover>a:after {
border-left-color: #adb5bd;
}
.dropdown-submenu.pull-left {
float: none;
}
.dropdown-submenu.pull-left>.dropdown-menu {
left: -100%;
margin-left: 10px;
border-radius: 6px 0 6px 6px;
}
</style>
<script type="text/javascript">
// manage active state of menu based on current page
$(document).ready(function () {
// active menu anchor
href = window.location.pathname
href = href.substr(href.lastIndexOf('/') + 1)
if (href === "")
href = "index.html";
var menuAnchor = $('a[href="' + href + '"]');
// mark it active
menuAnchor.tab('show');
// if it's got a parent navbar menu mark it active as well
menuAnchor.closest('li.dropdown').addClass('active');
// Navbar adjustments
var navHeight = $(".navbar").first().height() + 15;
var style = document.createElement('style');
var pt = "padding-top: " + navHeight + "px; ";
var mt = "margin-top: -" + navHeight + "px; ";
var css = "";
// offset scroll position for anchor links (for fixed navbar)
for (var i = 1; i <= 6; i++) {
css += ".section h" + i + "{ " + pt + mt + "}\n";
}
style.innerHTML = "body {" + pt + "padding-bottom: 40px; }\n" + css;
document.head.appendChild(style);
});
</script>
<!-- tabsets -->
<style type="text/css">
.tabset-dropdown > .nav-tabs {
display: inline-table;
max-height: 500px;
min-height: 44px;
overflow-y: auto;
border: 1px solid #ddd;
border-radius: 4px;
}
.tabset-dropdown > .nav-tabs > li.active:before {
content: "";
font-family: 'Glyphicons Halflings';
display: inline-block;
padding: 10px;
border-right: 1px solid #ddd;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open > li.active:before {
content: "";
border: none;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open:before {
content: "";
font-family: 'Glyphicons Halflings';
display: inline-block;
padding: 10px;
border-right: 1px solid #ddd;
}
.tabset-dropdown > .nav-tabs > li.active {
display: block;
}
.tabset-dropdown > .nav-tabs > li > a,
.tabset-dropdown > .nav-tabs > li > a:focus,
.tabset-dropdown > .nav-tabs > li > a:hover {
border: none;
display: inline-block;
border-radius: 4px;
background-color: transparent;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open > li {
display: block;
float: none;
}
.tabset-dropdown > .nav-tabs > li {
display: none;
}
</style>
<!-- code folding -->
<style type="text/css">
#TOC {
margin: 25px 0px 20px 0px;
}
@media (max-width: 768px) {
#TOC {
position: relative;
width: 100%;
}
}
@media print {
.toc-content {
/* see https://github.com/w3c/csswg-drafts/issues/4434 */
float: right;
}
}
.toc-content {
padding-left: 30px;
padding-right: 40px;
}
div.main-container {
max-width: 1200px;
}
div.tocify {
width: 20%;
max-width: 260px;
max-height: 85%;
}
@media (min-width: 768px) and (max-width: 991px) {
div.tocify {
width: 25%;
}
}
@media (max-width: 767px) {
div.tocify {
width: 100%;
max-width: none;
}
}
.tocify ul, .tocify li {
line-height: 20px;
}
.tocify-subheader .tocify-item {
font-size: 0.90em;
}
.tocify .list-group-item {
border-radius: 0px;
}
.tocify-subheader {
display: inline;
}
.tocify-subheader .tocify-item {
font-size: 0.95em;
}
</style>
</head>
<body>
<div class="container-fluid main-container">
<!-- setup 3col/9col grid for toc_float and main content -->
<div class="row">
<div class="col-xs-12 col-sm-4 col-md-3">
<div id="TOC" class="tocify">
</div>
</div>
<div class="toc-content col-xs-12 col-sm-8 col-md-9">
<div class="navbar navbar-inverse navbar-fixed-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-bs-toggle="collapse" data-target="#navbar" data-bs-target="#navbar">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="index.html">seandavi(s12): Courses and Tutorials</a>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<li>
<a href="index.html">Home</a>
</li>
<li>
<a href="about.html">About</a>
</li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li>
<a href="setup.html">
<span class="fa fa-cogs"></span>
setup
</a>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" data-bs-toggle="dropdown" aria-expanded="false">
<span class="ion ion-easel"></span>
Slides
<span class="caret"></span>
</a>
<ul class="dropdown-menu" role="menu">
<li>
<a href="motivation_for_R_slides.html">Motivation for using R</a>
</li>
<li>
<a href="http://bit.ly/bioc_cshl_2019">Introduction to Bioconductor</a>
</li>
<li>
<a href="https://drive.google.com/file/d/1txUz-a84VVxiB1ouv24ujL2DSTfxgblL/view?usp=sharing">Advanced Bioconductor Overview</a>
</li>
<li>
<a href="MachineLearning.html">Machine Learning hands-on</a>
</li>
<li>
<a href="https://docs.google.com/presentation/d/1PKP39ze3kATKCXxx-AUuDdI4FUpA85UQJxDMhXIK3Mk/edit?usp=sharing">Machine Learning Intro</a>
</li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" data-bs-toggle="dropdown" aria-expanded="false">
<span class="fa fa-question fa-lg"></span>
Misc.
<span class="caret"></span>
</a>
<ul class="dropdown-menu" role="menu">
<li>
<a href="further_resources.html">Further resources</a>
</li>
<li>
<a href="https://github.com/seandavi/ITR">Source code for this site</a>
</li>
<li>
<a href="https://github.com/seandavi/ITR/archive/master.zip">Download materials</a>
</li>
</ul>
</li>
</ul>
</div><!--/.nav-collapse -->
</div><!--/.container -->
</div><!--/.navbar -->
<div id="header">
<h1 class="title toc-ignore">AnnotationHub: Access the AnnotationHub Web Service</h1>
</div>
<script type="text/javascript">
document.addEventListener("DOMContentLoaded", function() {
document.querySelector("h1").className = "title";
});
</script>
<script type="text/javascript">
document.addEventListener("DOMContentLoaded", function() {
var links = document.links;
for (var i = 0, linksLength = links.length; i < linksLength; i++)
if (links[i].hostname != window.location.hostname)
links[i].target = '_blank';
});
</script>
<div id="introduction" class="section level1">
<h1>Introduction</h1>
<p>Finding and using public genomics data, such as browser or ChIP-seq tracks; annotation for genes, exons, and transcripts; and gene ontology and functional gene information, often requires quite a bit of work. Bioconductor has done some of this work already by</p>
<ol style="list-style-type: decimal">
<li>Finding and curating popular genomic resources</li>
<li>Using “recipes” to create R object versions of these resources</li>
<li>Making those resources available as a web service that is accessible from R</li>
</ol>
<p>The <code>AnnotationHub</code> server provides easy <em>R / Bioconductor</em> access to large collections of publicly available whole genome resources, e.g., Ensembl genome FASTA or GTF files, UCSC chain resources, ENCODE data tracks at UCSC, etc.</p>
<p>To get started, make sure that you have the <code>AnnotationHub</code> package installed:</p>
<pre class="r"><code>BiocManager::install('AnnotationHub')</code></pre>
</div>
<div id="annotationhub-objects" class="section level1">
<h1>AnnotationHub objects</h1>
<p>The <em><a href="https://bioconductor.org/packages/3.15/AnnotationHub">AnnotationHub</a></em> package provides a client interface to resources stored at the AnnotationHub web service.</p>
<pre class="r"><code>library(AnnotationHub)</code></pre>
<p>The <em><a href="https://bioconductor.org/packages/3.15/AnnotationHub">AnnotationHub</a></em> package is straightforward to use. Start by creating an <code>AnnotationHub</code> object:</p>
<pre class="r"><code>ah = AnnotationHub()</code></pre>
<pre><code>## snapshotDate(): 2022-04-21</code></pre>
<p>At this point you have done everything you need in order to start retrieving annotations. For most operations, using the <code>AnnotationHub</code> object should feel a lot like working with a familiar <code>list</code> or <code>data.frame</code>.</p>
<p>Let’s take a minute to look at the show method for the hub object <code>ah</code>:</p>
<pre class="r"><code>ah</code></pre>
<pre><code>## AnnotationHub with 64948 records
## # snapshotDate(): 2022-04-21
## # $dataprovider: Ensembl, BroadInstitute, UCSC, ftp://ftp.ncbi.nlm.nih.gov/g...
## # $species: Homo sapiens, Mus musculus, Drosophila melanogaster, Bos taurus,...
## # $rdataclass: GRanges, TwoBitFile, BigWigFile, EnsDb, Rle, OrgDb, ChainFile...
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["AH5012"]]'
##
## title
## AH5012 | Chromosome Band
## AH5013 | STS Markers
## AH5014 | FISH Clones
## AH5015 | Recomb Rate
## AH5016 | ENCODE Pilot
## ... ...
## AH104604 | Zonotrichia_albicollis.Zonotrichia_albicollis-1.0.1.ncrna.2bit
## AH104605 | Zosterops_lateralis_melanops.ASM128173v1.cdna.all.2bit
## AH104606 | Zosterops_lateralis_melanops.ASM128173v1.dna_rm.toplevel.2bit
## AH104607 | Zosterops_lateralis_melanops.ASM128173v1.dna_sm.toplevel.2bit
## AH104608 | Zosterops_lateralis_melanops.ASM128173v1.ncrna.2bit</code></pre>
<p>The output gives you an idea of the different types of data present inside the hub. You can see where the data comes from (<code>dataprovider</code>), which species are represented (<code>species</code>), and what kinds of R data objects can be returned (<code>rdataclass</code>). We can take a closer look at all the data providers that are available by looking at the contents of <code>dataprovider</code> as if it were a column of a <code>data.frame</code> object, like this:</p>
<pre class="r"><code>unique(ah$dataprovider)</code></pre>
<pre><code>## [1] "UCSC"
## [2] "Ensembl"
## [3] "RefNet"
## [4] "Inparanoid8"
## [5] "NHLBI"
## [6] "ChEA"
## [7] "Pazar"
## [8] "NIH Pathway Interaction Database"
## [9] "Haemcode"
## [10] "BroadInstitute"
## [11] "PRIDE"
## [12] "Gencode"
## [13] "CRIBI"
## [14] "Genoscope"
## [15] "MISO, VAST-TOOLS, UCSC"
## [16] "UWashington"
## [17] "Stanford"
## [18] "dbSNP"
## [19] "BioMart"
## [20] "GeneOntology"
## [21] "KEGG"
## [22] "URGI"
## [23] "EMBL-EBI"
## [24] "MicrosporidiaDB"
## [25] "FungiDB"
## [26] "TriTrypDB"
## [27] "ToxoDB"
## [28] "AmoebaDB"
## [29] "PlasmoDB"
## [30] "PiroplasmaDB"
## [31] "CryptoDB"
## [32] "TrichDB"
## [33] "GiardiaDB"
## [34] "The Gene Ontology Consortium"
## [35] "ENCODE Project"
## [36] "SchistoDB"
## [37] "NCBI/UniProt"
## [38] "GENCODE"
## [39] "http://www.pantherdb.org"
## [40] "RMBase v2.0"
## [41] "snoRNAdb"
## [42] "tRNAdb"
## [43] "NCBI"
## [44] "DrugAge, DrugBank, Broad Institute"
## [45] "DrugAge"
## [46] "DrugBank"
## [47] "Broad Institute"
## [48] "HMDB, EMBL-EBI, EPA"
## [49] "STRING"
## [50] "OMA"
## [51] "OrthoDB"
## [52] "PathBank"
## [53] "EBI/EMBL"
## [54] "NCBI,DBCLS"
## [55] "FANTOM5,DLRP,IUPHAR,HPRD,STRING,SWISSPROT,TREMBL,ENSEMBL,CELLPHONEDB,BADERLAB,SINGLECELLSIGNALR,HOMOLOGENE"
## [56] "WikiPathways"
## [57] "UCSC Jaspar"
## [58] "VAST-TOOLS"
## [59] "pyGenomeTracks "
## [60] "NA"
## [61] "UoE"
## [62] "mitra.stanford.edu/kundaje/akundaje/release/blacklists/"
## [63] "ENCODE"
## [64] "TargetScan,miRTarBase,USCS,ENSEMBL"
## [65] "TargetScan"
## [66] "QuickGO"
## [67] "ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/"</code></pre>
<p>In the same way, you can also see data from different species inside the hub by looking at the contents of species like this:</p>
<pre class="r"><code>head(unique(ah$species))</code></pre>
<pre><code>## [1] "Homo sapiens" "Vicugna pacos" "Dasypus novemcinctus"
## [4] "Otolemur garnettii" "Papio hamadryas" "Papio anubis"</code></pre>
<p>This also works for any of the other types of metadata present. You can learn which kinds of metadata are available by pressing the tab key after you type <code>ah$</code>. In this way you can explore for yourself what kinds of data are present in the hub right from the command line. This interface also allows you to access the hub programmatically to extract data that matches a particular set of criteria.</p>
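<p>When tab completion is not available (for example, inside a script), the metadata column names can also be listed directly. This is a minimal sketch that assumes the <code>ah</code> object created above; the variable name <code>metadata_columns</code> is only illustrative, and the names returned should largely correspond to what tab completion offers.</p>
<pre class="r"><code>## list the metadata columns stored for each hub record
metadata_columns <- names(mcols(ah))
metadata_columns</code></pre>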
<p>Another valuable type of metadata to pay attention to is the <code>rdataclass</code>.</p>
<pre class="r"><code>head(unique(ah$rdataclass))</code></pre>
<pre><code>## [1] "GRanges" "data.frame" "Inparanoid8Db" "TwoBitFile"
## [5] "ChainFile" "SQLiteConnection"</code></pre>
<p>The rdataclass allows you to see which kinds of R objects the hub will return to you. This kind of information is valuable both as a means to filter results and also as a means to explore and learn about some of the kinds of annotation objects that are widely available for the project. Right now this is a pretty short list, but over time it should grow as we support more of the different kinds of annotation objects via the hub.</p>
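<p>To get a feel for how the records are distributed across these classes, the column can be tabulated like any other character vector. This is a small sketch using only base R on the <code>ah</code> object from above; <code>class_counts</code> is an illustrative name, and the counts will depend on the hub snapshot in use.</p>
<pre class="r"><code>## count hub records by the R class they are returned as,
## with the most common classes first
class_counts <- sort(table(ah$rdataclass), decreasing = TRUE)
head(class_counts)</code></pre>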
<p>Now let’s try getting the chain files from UCSC, using the <code>query</code> and <code>subset</code> methods to selectively pare down the hub based on specific criteria. The <code>query</code> method lets you search rows for specific strings, returning an <code>AnnotationHub</code> instance with just the rows matching the query.</p>
<p>From the show method, one can easily see that one of the <code>dataprovider</code> values is UCSC and that one of the <code>rdataclass</code> values is ChainFile.</p>
<p>One can get chain files for Drosophila melanogaster from UCSC with:</p>
<pre class="r"><code>dm <- query(ah, c("ChainFile", "UCSC", "Drosophila melanogaster"))
dm</code></pre>
<pre><code>## AnnotationHub with 45 records
## # snapshotDate(): 2022-04-21
## # $dataprovider: UCSC
## # $species: Drosophila melanogaster
## # $rdataclass: ChainFile
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["AH15102"]]'
##
## title
## AH15102 | dm3ToAnoGam1.over.chain.gz
## AH15103 | dm3ToApiMel3.over.chain.gz
## AH15104 | dm3ToDm2.over.chain.gz
## AH15105 | dm3ToDm6.over.chain.gz
## AH15106 | dm3ToDp3.over.chain.gz
## ... ...
## AH15142 | dm2ToDroVir3.over.chain.gz
## AH15143 | dm2ToDroWil1.over.chain.gz
## AH15144 | dm2ToDroYak1.over.chain.gz
## AH15145 | dm2ToDroYak2.over.chain.gz
## AH15146 | dm1ToDm2.over.chain.gz</code></pre>
<p>Query has worked and you can now see that the only species present is Drosophila melanogaster.</p>
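<p>The <code>subset</code> method mentioned above offers another way to express a selection, as a logical expression evaluated against the metadata columns. The sketch below assumes the same <code>ah</code> object; <code>dm_subset</code> is an illustrative name. Because <code>query</code> matches patterns across the metadata while this expression tests exact equality, the two approaches are not guaranteed to return identical sets of records.</p>
<pre class="r"><code>## select records by exact values in specific metadata columns
dm_subset <- subset(ah,
                    rdataclass == "ChainFile" &
                    dataprovider == "UCSC" &
                    species == "Drosophila melanogaster")
length(dm_subset)</code></pre>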
<p>You can retrieve the metadata underlying this hub object with <code>mcols()</code>:</p>
<pre class="r"><code>df <- mcols(dm)
# what is df?
class(df)</code></pre>
<pre><code>## [1] "DFrame"
## attr(,"package")
## [1] "S4Vectors"</code></pre>
<pre class="r"><code>head(df[,1:5])</code></pre>
<pre><code>## DataFrame with 6 rows and 5 columns
## title dataprovider species taxonomyid
## <character> <character> <character> <integer>
## AH15102 dm3ToAnoGam1.over.ch.. UCSC Drosophila melanogas.. 7227
## AH15103 dm3ToApiMel3.over.ch.. UCSC Drosophila melanogas.. 7227
## AH15104 dm3ToDm2.over.chain.gz UCSC Drosophila melanogas.. 7227
## AH15105 dm3ToDm6.over.chain.gz UCSC Drosophila melanogas.. 7227
## AH15106 dm3ToDp3.over.chain.gz UCSC Drosophila melanogas.. 7227
## AH15107 dm3ToDp4.over.chain.gz UCSC Drosophila melanogas.. 7227
## genome
## <character>
## AH15102 dm3
## AH15103 dm3
## AH15104 dm3
## AH15105 dm3
## AH15106 dm3
## AH15107 dm3</code></pre>
<p>By default the show method will only display the first 5 and last 5 rows. There are already thousands of records present in the hub.</p>
<pre class="r"><code>length(ah)</code></pre>
<pre><code>## [1] 64948</code></pre>
<p>Let’s look at another example, where we use <code>query</code> to pull down only Inparanoid8 data from the hub and narrow it further to a single species (here, the giant panda, <em>Ailuropoda melanoleuca</em>).</p>
<pre class="r"><code>ahs <- query(ah, c('inparanoid8', 'ailuropoda'))
ahs</code></pre>
<pre><code>## AnnotationHub with 1 record
## # snapshotDate(): 2022-04-21
## # names(): AH10451
## # $dataprovider: Inparanoid8
## # $species: Ailuropoda melanoleuca
## # $rdataclass: Inparanoid8Db
## # $rdatadateadded: 2014-03-31
## # $title: hom.Ailuropoda_melanoleuca.inp8.sqlite
## # $description: Inparanoid 8 annotations about Ailuropoda melanoleuca
## # $taxonomyid: 9646
## # $genome: inparanoid8 genomes
## # $sourcetype: Inparanoid
## # $sourceurl: http://inparanoid.sbc.su.se/download/current/Orthologs/A.melan...
## # $sourcesize: NA
## # $tags: c("Inparanoid", "Gene", "Homology", "Annotation")
## # retrieve record with 'object[["AH10451"]]'</code></pre>
<p>We can also look at the <code>AnnotationHub</code> object in a browser using the <code>display()</code> function. We can then filter the <code>AnnotationHub</code> object for <em>ChainFile</em> by using either the global search field in the top right corner of the page or the in-column search field for <code>rdataclass</code>.</p>
<pre class="r"><code>d <- display(ah)</code></pre>
<p><img src="images/annotationHub.png" /> Displaying and filtering the Annotation Hub object in a browser</p>
<p>By default, 1000 entries are displayed per page. We can change this using the filter at the top of the page, or navigate through different pages using the page scrolling feature at the bottom of the page.</p>
<p>We can also select the rows of interest to us and send them back to the R session using the ‘Return rows to R session’ button; this sets a filter internally which filters the <code>AnnotationHub</code> object. The names of the selected AnnotationHub elements are displayed at the top of the page.</p>
</div>
<div id="using-annotationhub-to-retrieve-data" class="section level1">
<h1>Using <code>AnnotationHub</code> to retrieve data</h1>
<p>Looking back at our chain file example, if we are interested in the file dm1ToDm2.over.chain.gz, we can get its metadata using</p>
<pre class="r"><code>dm</code></pre>
<pre><code>## AnnotationHub with 45 records
## # snapshotDate(): 2022-04-21
## # $dataprovider: UCSC
## # $species: Drosophila melanogaster
## # $rdataclass: ChainFile
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["AH15102"]]'
##
## title
## AH15102 | dm3ToAnoGam1.over.chain.gz
## AH15103 | dm3ToApiMel3.over.chain.gz
## AH15104 | dm3ToDm2.over.chain.gz
## AH15105 | dm3ToDm6.over.chain.gz
## AH15106 | dm3ToDp3.over.chain.gz
## ... ...
## AH15142 | dm2ToDroVir3.over.chain.gz
## AH15143 | dm2ToDroWil1.over.chain.gz
## AH15144 | dm2ToDroYak1.over.chain.gz
## AH15145 | dm2ToDroYak2.over.chain.gz
## AH15146 | dm1ToDm2.over.chain.gz</code></pre>
<pre class="r"><code>dm["AH15146"]</code></pre>
<pre><code>## AnnotationHub with 1 record
## # snapshotDate(): 2022-04-21
## # names(): AH15146
## # $dataprovider: UCSC
## # $species: Drosophila melanogaster
## # $rdataclass: ChainFile
## # $rdatadateadded: 2014-12-15
## # $title: dm1ToDm2.over.chain.gz
## # $description: UCSC liftOver chain file from dm1 to dm2
## # $taxonomyid: 7227
## # $genome: dm1
## # $sourcetype: Chain
## # $sourceurl: http://hgdownload.cse.ucsc.edu/goldenpath/dm1/liftOver/dm1ToDm...
## # $sourcesize: NA
## # $tags: c("liftOver", "chain", "UCSC", "genome", "homology")
## # retrieve record with 'object[["AH15146"]]'</code></pre>
<p>We can download the file using</p>
<pre class="r"><code>dm[["AH15146"]]</code></pre>
<pre><code>## loading from cache</code></pre>
<pre><code>## require("rtracklayer")</code></pre>
<pre><code>## Chain of length 11
## names(11): chr2L chr2R chr3L chr3R chr4 chrX chrU chr2h chr3h chrXh chrYh</code></pre>
<p>Each file is retrieved from the AnnotationHub server and is also cached locally, so the next time you need it, it is loaded from the local cache rather than downloaded again.</p>
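<p>As a rough illustration of what can be done once a resource has been retrieved, the sketch below shows where the cache lives and then uses the chain with <code>liftOver()</code> from <em>rtracklayer</em>. It assumes the <code>ah</code> and <code>dm</code> objects from above; the genomic interval is made up purely for demonstration, and the names <code>chain</code>, <code>gr_dm1</code>, and <code>gr_dm2</code> are illustrative.</p>
<pre class="r"><code>library(rtracklayer)
library(GenomicRanges)

## directory where downloaded hub resources are cached
hubCache(ah)

## retrieve the chain (read from the cache after the first download)
chain <- dm[["AH15146"]]

## a made-up interval on chr2L in dm1 coordinates, for illustration only
gr_dm1 <- GRanges("chr2L", IRanges(start = 100000, width = 1000))

## liftOver() returns a GRangesList with one element per input range
gr_dm2 <- liftOver(gr_dm1, chain)
gr_dm2</code></pre>
<p>An input range that cannot be mapped yields an empty element in the result, so it is worth checking the lengths of the returned list before relying on the converted coordinates.</p>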
</div>
<div id="accessing-genome-scale-data" class="section level1">
<h1>Accessing Genome-Scale Data</h1>
<div id="non-model-organism-gene-annotations" class="section level2">
<h2>Non-model organism gene annotations</h2>
<p><em>Bioconductor</em> offers pre-built <code>org.*</code> annotation packages for model organisms, with their use described in the <a href="http://bioconductor.org/help/workflows/annotation/Annotation_Resources/#OrgDb">OrgDb</a> section of the Annotation workflow. Here we discover available <code>OrgDb</code> objects for non-model organisms as well.</p>
<p>The <code>query()</code> interface allows us to do full-text searching of <em>AnnotationHub</em> objects. How many OrgDb packages are represented in <em>AnnotationHub</em>?</p>
<pre class="r"><code>library(AnnotationHub)
## create the annotationhub object
ah <- AnnotationHub()</code></pre>
<pre><code>## snapshotDate(): 2022-04-21</code></pre>
<pre class="r"><code>## Query the annotationhub metadata
query(ah, "OrgDb")</code></pre>
<pre><code>## AnnotationHub with 1830 records
## # snapshotDate(): 2022-04-21
## # $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
## # $species: Escherichia coli, greater Indian_fruit_bat, Zootoca vivipara, Zo...
## # $rdataclass: OrgDb
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["AH100399"]]'
##
## title
## AH100399 | org.Ag.eg.db.sqlite
## AH100400 | org.At.tair.db.sqlite
## AH100401 | org.Bt.eg.db.sqlite
## AH100402 | org.Cf.eg.db.sqlite
## AH100403 | org.Gg.eg.db.sqlite
## ... ...
## AH102596 | org.Lobosporangium_transversale.eg.sqlite
## AH102597 | org.Sulfolobus_acidocaldarius.eg.sqlite
## AH102598 | org.Penicillium_rugulosum.eg.sqlite
## AH102599 | org.Talaromyces_rugulosus.eg.sqlite
## AH102600 | org.Metallosphaera_sedula.eg.sqlite</code></pre>
<p>Let’s assume that we are working with yeast (<em>Saccharomyces cerevisiae</em>) and get the OrgDb package.</p>
<pre class="r"><code>sub_ah = query(ah, c("OrgDb", "cerevisiae"))
sub_ah</code></pre>
<pre><code>## AnnotationHub with 1 record
## # snapshotDate(): 2022-04-21
## # names(): AH100415
## # $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
## # $species: Saccharomyces cerevisiae
## # $rdataclass: OrgDb
## # $rdatadateadded: 2022-04-18
## # $title: org.Sc.sgd.db.sqlite
## # $description: NCBI gene ID based annotations about Saccharomyces cerevisiae
## # $taxonomyid: 559292
## # $genome: NCBI genomes
## # $sourcetype: NCBI/ensembl
## # $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.ensembl.org/p...
## # $sourcesize: NA
## # $tags: c("NCBI", "Gene", "Annotation")
## # retrieve record with 'object[["AH100415"]]'</code></pre>
<pre class="r"><code>orgdb <- query(sub_ah, "OrgDb")[[1]]</code></pre>
<pre><code>## loading from cache</code></pre>
<pre><code>## Loading required package: AnnotationDbi</code></pre>
<pre><code>## Loading required package: Biobase</code></pre>
<pre><code>## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.</code></pre>
<pre><code>##
## Attaching package: 'Biobase'</code></pre>
<pre><code>## The following object is masked from 'package:AnnotationHub':
##
## cache</code></pre>
<p>Look at the <code>orgdb</code> object.</p>
<pre class="r"><code>orgdb</code></pre>
<pre><code>## OrgDb object:
## | DBSCHEMAVERSION: 2.1
## | Db type: OrgDb
## | Supporting package: AnnotationDbi
## | DBSCHEMA: YEAST_DB
## | ORGANISM: Saccharomyces cerevisiae
## | SPECIES: Yeast
## | YGSOURCENAME: Yeast Genome
## | YGSOURCEURL: http://sgd-archive.yeastgenome.org
## | YGSOURCEDATE: 2019-Oct25
## | CENTRALID: ORF
## | TAXID: 559292
## | KEGGSOURCENAME: KEGG GENOME
## | KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
## | KEGGSOURCEDATE: 2011-Mar15
## | GOSOURCENAME: Gene Ontology
## | GOSOURCEURL: http://current.geneontology.org/ontology/go-basic.obo
## | GOSOURCEDATE: 2022-03-10
## | EGSOURCEDATE: 2022-Mar17
## | EGSOURCENAME: Entrez Gene
## | EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
## | ENSOURCEDATE: 2021-Dec21
## | ENSOURCENAME: Ensembl
## | ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
## | UPSOURCENAME: Uniprot
## | UPSOURCEURL: http://www.UniProt.org/
## | UPSOURCEDATE: Fri Apr 1 15:07:54 2022</code></pre>
<pre><code>##
## Please see: help('select') for usage information</code></pre>
<p>The <code>orgdb</code> object works like a little database. We can look at the columns in the database.</p>
<pre class="r"><code>columns(orgdb)</code></pre>
<pre><code>## [1] "ALIAS" "COMMON" "DESCRIPTION" "ENSEMBL" "ENSEMBLPROT"
## [6] "ENSEMBLTRANS" "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL"
## [11] "GENENAME" "GO" "GOALL" "INTERPRO" "ONTOLOGY"
## [16] "ONTOLOGYALL" "ORF" "PATH" "PFAM" "PMID"
## [21] "REFSEQ" "SGD" "SMART" "UNIPROT"</code></pre>
<p>Not all columns can always be used for “lookup”, though. We need to know which columns represent <code>keys</code> that we can use to retrieve data.</p>
<pre class="r"><code>keytypes(orgdb)</code></pre>
<pre><code>## [1] "ALIAS" "COMMON" "DESCRIPTION" "ENSEMBL" "ENSEMBLPROT"
## [6] "ENSEMBLTRANS" "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL"
## [11] "GENENAME" "GO" "GOALL" "INTERPRO" "ONTOLOGY"
## [16] "ONTOLOGYALL" "ORF" "PATH" "PFAM" "PMID"
## [21] "REFSEQ" "SGD" "SMART" "UNIPROT"</code></pre>
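<p>One way to check which columns cannot serve as lookup keys is to compare the two listings directly. This is only a sketch and assumes the <code>orgdb</code> object from above is still in the session. For this yeast <code>OrgDb</code> the two listings printed above are identical, so the difference should be empty, but that need not hold for other annotation objects.</p>
<pre class="r"><code># Columns that can be retrieved but cannot be used as a keytype
setdiff(columns(orgdb), keytypes(orgdb))

# TRUE when every keytype is also a retrievable column
all(keytypes(orgdb) %in% columns(orgdb))</code></pre>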
<p>You can examine the values available for lookup by using the <code>keys()</code> method.</p>
<pre class="r"><code>head(keys(orgdb, keytype="ORF"))</code></pre>
<pre><code>## [1] "AIP5" "ARS1001" "ARS1002" "ARS1003" "ARS1004" "ARS1005"</code></pre>
<pre class="r"><code>head(keys(orgdb, keytype="GO"))</code></pre>
<pre><code>## [1] "GO:0003674" "GO:0005199" "GO:0005575" "GO:0009277" "GO:0030437"
## [6] "GO:0031505"</code></pre>
<p>Notice that two columns look useful for the <code>derisi</code> data: <code>ORF</code> and <code>GENENAME</code>. Suppose we had been given only the ORF, not the gene name, as part of the <code>derisi</code> data. We can use the <code>select()</code> interface to retrieve the gene names for the five ORF IDs below, just to get a sense of how <code>select()</code> retrieves information from an <code>OrgDb</code> object.</p>
<pre class="r"><code>orfids = c("YAL001C", "YAL002W", "YAL003W", "YAL004W", "YAL005C")
select(orgdb, keys = orfids, columns = "GENENAME", keytype="ORF")</code></pre>
<pre><code>## 'select()' returned 1:1 mapping between keys and columns</code></pre>
<pre><code>## ORF SGD GENENAME
## 1 YAL001C S000000001 TFC3
## 2 YAL002W S000000002 VPS8
## 3 YAL003W S000000003 EFB1
## 4 YAL004W S000002136 <NA>
## 5 YAL005C S000000004 SSA1</code></pre>
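<p>When only a single column is needed, <code>mapIds()</code> from AnnotationDbi may be more convenient than <code>select()</code>, since it returns a named vector rather than a data frame. The sketch below assumes the <code>orgdb</code> and <code>orfids</code> objects defined above; the variable name <code>genenames</code> is introduced here purely for illustration.</p>
<pre class="r"><code># mapIds() returns a vector named by the input keys, one element per key
genenames <- mapIds(orgdb, keys = orfids, column = "GENENAME", keytype = "ORF")
genenames

# Look up the gene name for a single ORF by its key
genenames[["YAL005C"]]</code></pre>
<p>By default <code>mapIds()</code> keeps only the first match for each key, so it suits 1:1 lookups like this one better than 1:many lookups such as GO annotations.</p>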
<pre class="r"><code>select(orgdb, keys = orfids, columns = "GO", keytype="ORF")</code></pre>
<pre><code>## 'select()' returned 1:many mapping between keys and columns</code></pre>
<pre><code>## ORF SGD GO EVIDENCE ONTOLOGY
## 1 YAL001C S000000001 GO:0000127 IBA CC
## 2 YAL001C S000000001 GO:0000127 IDA CC
## 3 YAL001C S000000001 GO:0001002 IDA MF
## 4 YAL001C S000000001 GO:0001003 IDA MF
## 5 YAL001C S000000001 GO:0003677 IEA MF
## 6 YAL001C S000000001 GO:0005634 IEA CC
## 7 YAL001C S000000001 GO:0005739 HDA CC
## 8 YAL001C S000000001 GO:0005739 IEA CC
## 9 YAL001C S000000001 GO:0006383 IDA BP
## 10 YAL001C S000000001 GO:0006384 IBA BP
## 11 YAL001C S000000001 GO:0008301 IDA MF
## 12 YAL001C S000000001 GO:0042791 IBA BP
## 13 YAL001C S000000001 GO:0042791 IDA BP
## 14 YAL001C S000000001 GO:0042791 IMP BP
## 15 YAL001C S000000001 GO:0071168 IMP BP
## 16 YAL002W S000000002 GO:0005770 IBA CC
## 17 YAL002W S000000002 GO:0005770 IDA CC
## 18 YAL002W S000000002 GO:0005794 IEA CC
## 19 YAL002W S000000002 GO:0005795 IEA CC
## 20 YAL002W S000000002 GO:0006623 IBA BP
## 21 YAL002W S000000002 GO:0006623 IMP BP
## 22 YAL002W S000000002 GO:0006886 IEA BP
## 23 YAL002W S000000002 GO:0015031 IEA BP
## 24 YAL002W S000000002 GO:0016020 IDA CC
## 25 YAL002W S000000002 GO:0016192 IEA BP
## 26 YAL002W S000000002 GO:0030897 IBA CC
## 27 YAL002W S000000002 GO:0032511 IMP BP
## 28 YAL002W S000000002 GO:0033263 IDA CC
## 29 YAL002W S000000002 GO:0034058 IBA BP
## 30 YAL002W S000000002 GO:0043495 IMP MF
## 31 YAL002W S000000002 GO:0046872 IEA MF
## 32 YAL002W S000000002 GO:0051020 IDA MF
## 33 YAL002W S000000002 GO:0051020 IPI MF
## 34 YAL003W S000000003 GO:0003746 IEA MF
## 35 YAL003W S000000003 GO:0005085 IBA MF
## 36 YAL003W S000000003 GO:0005085 IDA MF
## 37 YAL003W S000000003 GO:0005829 IBA CC
## 38 YAL003W S000000003 GO:0005853 IEA CC
## 39 YAL003W S000000003 GO:0005853 IMP CC
## 40 YAL003W S000000003 GO:0006412 IEA BP
## 41 YAL003W S000000003 GO:0006414 IBA BP
## 42 YAL003W S000000003 GO:0006414 IEA BP
## 43 YAL003W S000000003 GO:0006414 IMP BP
## 44 YAL003W S000000003 GO:0006449 IGI BP
## 45 YAL003W S000000003 GO:0032232 IDA BP
## 46 YAL003W S000000003 GO:1990145 IMP BP
## 47 YAL004W S000002136 <NA> <NA> <NA>
## 48 YAL005C S000000004 GO:0000049 IDA MF
## 49 YAL005C S000000004 GO:0000166 IEA MF
## 50 YAL005C S000000004 GO:0000209 IDA BP
## 51 YAL005C S000000004 GO:0000329 IDA CC
## 52 YAL005C S000000004 GO:0002181 IMP BP
## 53 YAL005C S000000004 GO:0005524 IBA MF
## 54 YAL005C S000000004 GO:0005524 IEA MF
## 55 YAL005C S000000004 GO:0005576 IEA CC
## 56 YAL005C S000000004 GO:0005618 IEA CC
## 57 YAL005C S000000004 GO:0005634 HDA CC
## 58 YAL005C S000000004 GO:0005634 IBA CC
## 59 YAL005C S000000004 GO:0005634 IDA CC
## 60 YAL005C S000000004 GO:0005737 HDA CC
## 61 YAL005C S000000004 GO:0005737 IBA CC
## 62 YAL005C S000000004 GO:0005737 IDA CC
## 63 YAL005C S000000004 GO:0005737 IEA CC
## 64 YAL005C S000000004 GO:0005829 HDA CC
## 65 YAL005C S000000004 GO:0005829 IBA CC
## 66 YAL005C S000000004 GO:0005844 IDA CC
## 67 YAL005C S000000004 GO:0005886 HDA CC
## 68 YAL005C S000000004 GO:0005886 IBA CC
## 69 YAL005C S000000004 GO:0006457 IDA BP
## 70 YAL005C S000000004 GO:0006606 IDA BP
## 71 YAL005C S000000004 GO:0006606 IGI BP
## 72 YAL005C S000000004 GO:0006616 IBA BP
## 73 YAL005C S000000004 GO:0006616 IDA BP
## 74 YAL005C S000000004 GO:0006626 IMP BP
## 75 YAL005C S000000004 GO:0006986 IBA BP
## 76 YAL005C S000000004 GO:0009277 IDA CC
## 77 YAL005C S000000004 GO:0016192 IBA BP
## 78 YAL005C S000000004 GO:0016887 IBA MF
## 79 YAL005C S000000004 GO:0016887 IDA MF
## 80 YAL005C S000000004 GO:0031072 IBA MF
## 81 YAL005C S000000004 GO:0034620 IBA BP
## 82 YAL005C S000000004 GO:0035617 IDA BP
## 83 YAL005C S000000004 GO:0042026 IBA BP
## 84 YAL005C S000000004 GO:0042026 IDA BP
## 85 YAL005C S000000004 GO:0043161 IGI BP
## 86 YAL005C S000000004 GO:0043161 IMP BP
## 87 YAL005C S000000004 GO:0044183 IBA MF
## 88 YAL005C S000000004 GO:0051082 IBA MF
## 89 YAL005C S000000004 GO:0051082 IDA MF
## 90 YAL005C S000000004 GO:0051085 IBA BP
## 91 YAL005C S000000004 GO:0051787 IBA MF
## 92 YAL005C S000000004 GO:0071470 IDA BP
## 93 YAL005C S000000004 GO:0072318 IDA BP
## 94 YAL005C S000000004 GO:0072671 IMP BP
## 95 YAL005C S000000004 GO:0090344 IMP BP</code></pre>
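<p>A 1:many result like this one has one row per combination of ORF, GO term, and evidence code, so the same GO term can appear more than once for an ORF. As a small illustration in base <em>R</em>, the sketch below rebuilds a handful of rows copied from the output above (rows 1, 2, 3, 16 and 47) and counts the distinct GO terms per ORF. The names <code>go</code> and <code>n_terms</code> are introduced here for illustration only.</p>
<pre class="r"><code># A few rows copied from the select() output above
go <- data.frame(
  ORF = c("YAL001C", "YAL001C", "YAL001C", "YAL002W", "YAL004W"),
  GO = c("GO:0000127", "GO:0000127", "GO:0001002", "GO:0005770", NA),
  EVIDENCE = c("IBA", "IDA", "IDA", "IBA", NA),
  stringsAsFactors = FALSE
)

# Number of distinct GO terms per ORF, ignoring missing annotations
n_terms <- tapply(go$GO, go$ORF, function(x) length(unique(x[!is.na(x)])))
n_terms</code></pre>
<p>The same <code>tapply()</code> call can be applied to the full data frame returned by <code>select()</code>.</p>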
<p><em>Exercise:</em> Use the <code>derisi</code> ORF column to look up the associated gene names.</p>
<p><em>Exercise:</em> Use the <code>derisi</code> ORF column to look up the gene ontology <a href="http://geneontology.org/"><code>GO</code></a> annotations for the ORFs. What is the general relationship between <code>GO</code> terms and ORFs?</p>
<p><em>Bonus Exercise:</em> Read the documentation associated with the GO.db package <a href="https://bioconductor.org/packages/release/bioc/vignettes/AnnotationDbi/inst/doc/IntroToAnnotationPackages.pdf">here</a>. Use the GO.db package and the <code>select()</code> interface to look up the term names and definitions for the GO identifiers from the previous exercise.</p>
</div>
<div id="roadmap-epigenomics-project" class="section level2">
<h2>Roadmap Epigenomics Project</h2>
<p>All Roadmap Epigenomics files are hosted <a href="http://egg2.wustl.edu/roadmap/data/byFileType/">here</a>. To download these files by hand, one would navigate through the web interface to find useful files, then use something like the following <em>R</em> code.</p>
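<pre class="r"><code>library(rtracklayer)  # provides import()

url <- "http://egg2.wustl.edu/roadmap/data/byFileType/peaks/consolidated/broadPeak/E001-H3K4me1.broadPeak.gz"
filename <- basename(url)
download.file(url, destfile=filename, mode="wb")  # binary mode for the .gz file
# broadPeak is BED6 plus three numeric columns, declared here via extraCols
if (file.exists(filename))
    data <- import(filename, format="bed",
                   extraCols=c(signalValue="numeric", pValue="numeric", qValue="numeric"))</code></pre>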
<p>This would have to be repeated for all files, and the onus would lie on the user to identify, download, import, and manage the local disk location of these files.</p>
<p><em><a href="https://bioconductor.org/packages/3.15/AnnotationHub">AnnotationHub</a></em> reduces this task to just a few lines of <em>R</em> code:</p>
<pre class="r"><code>library(AnnotationHub)
ah = AnnotationHub()</code></pre>
<pre><code>## snapshotDate(): 2022-04-21</code></pre>
<pre class="r"><code>epiFiles <- query(ah, "EpigenomeRoadMap")</code></pre>
<p>Printing <code>epiFiles</code> shows that 18248 Roadmap resources are available via <em><a href="https://bioconductor.org/packages/3.15/AnnotationHub">AnnotationHub</a></em>. Additional information about the files is also available, e.g., where the files came from (dataprovider), genome, species, sourceurl, and sourcetype.</p>
<pre class="r"><code>epiFiles</code></pre>
<pre><code>## AnnotationHub with 18248 records
## # snapshotDate(): 2022-04-21
## # $dataprovider: BroadInstitute
## # $species: Homo sapiens
## # $rdataclass: BigWigFile, GRanges, data.frame
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["AH28856"]]'
##
## title
## AH28856 | E001-H3K4me1.broadPeak.gz
## AH28857 | E001-H3K4me3.broadPeak.gz
## AH28858 | E001-H3K9ac.broadPeak.gz
## AH28859 | E001-H3K9me3.broadPeak.gz
## AH28860 | E001-H3K27me3.broadPeak.gz
## ... ...
## AH49540 | E058_mCRF_FractionalMethylation.bigwig
## AH49541 | E059_mCRF_FractionalMethylation.bigwig
## AH49542 | E061_mCRF_FractionalMethylation.bigwig
## AH49543 | E081_mCRF_FractionalMethylation.bigwig
## AH49544 | E082_mCRF_FractionalMethylation.bigwig</code></pre>
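<p>The metadata fields listed in the printout can be inspected directly. The sketch below assumes the <code>epiFiles</code> object created above and uses only the field names shown in its printed summary.</p>
<pre class="r"><code># Each metadata field is accessible with `$`, one value per record
head(epiFiles$sourceurl, 3)

# mcols() returns all metadata as a DataFrame; show a few columns
mcols(epiFiles)[1:3, c("title", "genome", "sourcetype")]</code></pre>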
<p>A good sanity check that the query returned only Roadmap Epigenomics files is to confirm that every record in the smaller hub object comes from <em>Homo sapiens</em> and the hg19 genome:</p>
<pre class="r"><code>unique(epiFiles$species)</code></pre>
<pre><code>## [1] "Homo sapiens"</code></pre>
<pre class="r"><code>unique(epiFiles$genome)</code></pre>
<pre><code>## [1] "hg19"</code></pre>
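<p>In a script, the same check can be written as an assertion so that an unexpected species or genome stops the analysis instead of going unnoticed. This is a minimal sketch, assuming the <code>epiFiles</code> object from above.</p>
<pre class="r"><code># Stop with an error unless every record matches the expected species and genome
stopifnot(
  all(epiFiles$species == "Homo sapiens"),
  all(epiFiles$genome == "hg19")
)</code></pre>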
<p>Broadly, one can get an idea of the different files from this project by looking at the sourcetype:</p>
<pre class="r"><code>table(epiFiles$sourcetype)</code></pre>
<pre><code>##
## BED BigWig GTF tab Zip
## 8298 9932 3 1 14</code></pre>
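<p>The same metadata can be used to narrow the hub object to one kind of file. The sketch below assumes the <code>epiFiles</code> object from above and keeps only the records whose source file is in BED format; the name <code>bedFiles</code> is introduced here for illustration.</p>
<pre class="r"><code># Subset the hub object with a logical vector, one element per record
bedFiles <- epiFiles[epiFiles$sourcetype == "BED"]

# The count should agree with the BED entry in the table above
length(bedFiles)</code></pre>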
<p>To get a more descriptive idea of these different files one can use:</p>
<pre class="r"><code>sort(table(epiFiles$description), decreasing=TRUE)</code></pre>
<pre><code>##
## Bigwig File containing -log10(p-value) signal tracks from EpigenomeRoadMap Project
## 6881
## Bigwig File containing fold enrichment signal tracks from EpigenomeRoadMap Project
## 2947
## Narrow ChIP-seq peaks for consolidated epigenomes from EpigenomeRoadMap Project
## 2894
## Broad ChIP-seq peaks for consolidated epigenomes from EpigenomeRoadMap Project
## 2534
## Gapped ChIP-seq peaks for consolidated epigenomes from EpigenomeRoadMap Project
## 2534
## Narrow DNasePeaks for consolidated epigenomes from EpigenomeRoadMap Project
## 131
## 15 state chromatin segmentations from EpigenomeRoadMap Project
## 127
## Broad domains on enrichment for DNase-seq for consolidated epigenomes from EpigenomeRoadMap Project
## 78
## RRBS fractional methylation calls from EpigenomeRoadMap Project
## 51
## Whole genome bisulphite fractional methylation calls from EpigenomeRoadMap Project
## 37
## MeDIP/MRE(mCRF) fractional methylation calls from EpigenomeRoadMap Project
## 16
## GencodeV10 gene/transcript coordinates and annotations corresponding to hg19 version of the human genome
## 3
## RNA-seq read count matrix for intronic protein-coding RNA elements
## 2
## RNA-seq read counts matrix for ribosomal gene exons
## 2
## RPKM expression matrix for ribosomal gene exons
## 2
## Metadata for EpigenomeRoadMap Project
## 1
## RNA-seq read counts matrix for non-coding RNAs
## 1
## RNA-seq read counts matrix for protein coding exons
## 1
## RNA-seq read counts matrix for protein coding genes
## 1
## RNA-seq read counts matrix for ribosomal genes
## 1
## RPKM expression matrix for non-coding RNAs
## 1
## RPKM expression matrix for protein coding exons