-
Notifications
You must be signed in to change notification settings - Fork 0
/
Part2_Phyloseq_moretrim (1).Rmd
1015 lines (752 loc) · 51.7 KB
/
Part2_Phyloseq_moretrim (1).Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: "5XFAD Microbiome Part2 More Trim"
author: "Madelyn Houser"
date: "9/22/2020"
output:
html_document:
toc: true
word_document:
toc: true
---
```{r packages}
library("tidyverse")
packageVersion("tidyverse")
library("phyloseq")
packageVersion("phyloseq")
library("decontam")
packageVersion("decontam")
library("Hmisc")
packageVersion("Hmisc")
library("corrplot")
packageVersion("corrplot")
library("ggpubr")
packageVersion("ggpubr")
library("DESeq2")
packageVersion("DESeq2")
library("apeglm")
packageVersion("apeglm")
```
```{r setseed}
# Set seed so that results are fully reproducible.
set.seed(100)
```
# Identify and Remove Contaminants
## Check library sizes
We would like to remove sequence data associated with any contaminants in the samples, and these can be identified based on presence in the negative control samples and on their frequency in samples with different library sizes.
```{r librarysize, fig.height=2, fig.width=4}
# Load phyloseq object
ps <- readRDS("F:/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/phyloseqobject_moretrim.RDS")
# Look at library sizes for each sample
sample_sums(ps)
samdf <- as.data.frame(sample_data(ps))
samdf
samdf$Library <- sample_sums(ps)
samdf <- samdf[order(samdf$Library),]
samdf$Index <- seq(nrow(samdf))
ggplot(data=samdf, aes(x=Index, y=Library, color=is.ctrl)) + geom_point()
```
*The negative control samples have few reads compared with the postive controls and the true samples, as expected, with the exception of the one sample identified in the first part of the analysis as being of poor quality. The library sizes for the true samples range from 95,186 to 203,676, a 2.14X size difference.*
## Identify contaminants - frequency method
We will use the ```decontam``` package to identify contaminants.
```{r decontamFreq}
# Remove positive control samples from phyloseq object
ps1 <- ps %>% subset_samples(Subject != "Positive-1") %>% subset_samples(Subject != "Positive-2")
# Identify contaminants using the frequency method (contaminant sequences more frequent in samples with less DNA to start with)
contamdf.freq <- isContaminant(ps1, method="frequency", conc="Concentration", neg="is.neg", threshold=0.1)
table(contamdf.freq$contaminant) # Number of contaminants identified (true) by this method
which(contamdf.freq$contaminant) # The sequence numbers identified as contaminants by this method
# Compare the relationship between DNA quantity and frequency for sequences identified as true vs. contaminants
# Sequences 1-7 were not identified as contaminants while the other sequences were.
plot_frequency(ps1, taxa_names(ps1)[c(1, 2, 3, 4, 5, 6, 7)], conc="Concentration", neg="is.neg") + xlab("DNA Concentration") + ggtitle("True Sequences")
plot_frequency(ps1, taxa_names(ps1)[which(contamdf.freq$contaminant)], conc="Concentration", neg="is.neg") + xlab("DNA Concentration") + ggtitle("Identified Contaminants")
```
*The frequency method identified 13 sequences as contaminants. These sequences were not detected in the majority of samples, not detected in the negative control samples, and the linear relationships are not very convincing.*
## Identify contaminants - prevalence method
We can also identify contaminants based on their presence in the negative control samples.
```{r decontamPrev}
contamdf.prev <- isContaminant(ps1, method="prevalence", neg="is.neg")
table(contamdf.prev$contaminant)
which(contamdf.prev$contaminant)
# We can visualize the prevalence in negative control and true samples
ps.pa <- transform_sample_counts(ps1, function(abund) 1*(abund>0))
ps.pa.neg <- prune_samples(sample_data(ps.pa)$is.neg == "TRUE", ps.pa)
ps.pa.pos <- prune_samples(sample_data(ps.pa)$is.neg == "FALSE", ps.pa)
df.pa <- data.frame(pa.pos=taxa_sums(ps.pa.pos), pa.neg=taxa_sums(ps.pa.neg), contaminant=contamdf.prev$contaminant)
ggplot(data=df.pa, aes(x=pa.neg, y=pa.pos, color=contaminant)) +
geom_point() +
xlab("Prevalence (Negative Controls)") +
ylab("Prevalence (True Samples)")
# Compare the relationship between DNA quantity and frequency for sequences identified as true vs. contaminants by the prevalence method
plot_frequency(ps1, taxa_names(ps1)[which(contamdf.prev$contaminant)], conc="Concentration") + xlab("DNA Concentration") + ggtitle("Identified Contaminants")
```
*The prevalence method with standard threshold (0.1) identified 10 sequences as contaminants, and it identified those that were abundant in negative controls and rarely identified in samples. There was no overlap in sequences identified between methods.*
## Check decontam thresholds
```{r decontamThresh}
# Graph the decontam scores (lower scores indicate likely contaminants) for sequences found in 3 or more samples
contamdf <- rbind(cbind(contamdf.freq, Method="Frequency"), cbind(contamdf.prev, Method="Prevalence")) %>% filter(prev >=3)
contamdf$Prevalence <- cut(contamdf$prev, c(0, 3, 6, 10, 9999), labels=c("3", "4-6", "7-10", "11+"))
ggplot(contamdf, aes(p, fill=Prevalence)) +
geom_histogram(binwidth=0.01) +
labs(x = "decontam Score", y="Number ASVs") +
facet_wrap(~Method)
```
*These graphs confirm that the number of likely contaminants is very low. The default threshold (decontam score<0.1) is likely to be appropriate. We will remove the contaminants identified by the prevalence method from the data.*
## Remove contaminants
```{r decontamRemove}
# Remove contaminants identified by the prevalence method from the data
psdecon <- prune_taxa(!contamdf.prev$contaminant, ps)
psdecon
rm(contamdf, contamdf.freq, contamdf.prev, df.pa, ps.pa, ps.pa.neg, ps.pa.pos)
detach("package:decontam", unload=TRUE)
```
# Alpha Diversity
For our diversity measurements, we want to remove the positive and negative control samples from the dataset.
```{r prune}
# Remove negative and positive control samples
sample_names(psdecon)
samdfdecon <- sample_data(psdecon)
ps2 <- prune_samples(!samdfdecon$is.ctrl, psdecon)
# Check read counts of the true samples
sample_sums(ps2)
# We see that 359-1-25-18 has a low number of reads compared with the other samples. This was the sample that had a much smaller percentage of reads passing the original filtering thresholds. We will remove it.
# Remove poor quality sample
ps3 <- prune_samples(sample_names(ps2) != "359-1-25-18", ps2)
ps3
sample_names(ps3)
```
We'll add the alpha diversity values to our sample data.
```{r alphadivdf}
# Get alpha diversity values for each sample
ad <- data.frame(estimate_richness(psdecon, measures=c("Shannon", "InvSimpson")))
ad$Subject <- rownames(ad)
# Add alpha diversity values to sample data
samdfdecon <- data.frame(samdfdecon)
samdfdecon <- merge(samdfdecon, ad)
write_csv(samdfdecon, "F:/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/SampleDataAlphaDiv.csv")
rownames(samdfdecon) <- samdfdecon$Subject
samdfdecon <- sample_data(samdfdecon)
psdecon <- merge_phyloseq(psdecon, samdfdecon)
# Remove negative and positive control samples
ps2 <- prune_samples(!samdfdecon$is.ctrl, psdecon)
# Remove poor quality sample
ps3 <- prune_samples(sample_names(ps2) != "359-1-25-18", ps2)
```
**Then we'll plot alpha diversity.**
```{r alphdivmergeplot}
df <- data.frame(sample_data(ps3))
ggplot(df, aes(x=Diet, y=Shannon, fill=Treatment)) +
scale_shape_manual(values=c(21, 24)) +
geom_jitter(aes(fill=Treatment, shape=Genotype), position=position_jitterdodge(jitter.width=0.05, dodge.width=0.9), size=3) +
scale_fill_manual(values=c("grey47", "white")) +
guides(fill = guide_legend(override.aes = list(shape = 21))) +
theme_classic() +
theme(text=element_text(family="Arial", size=12), axis.title.x = element_blank()) +
stat_summary(
fun.ymax=function(i) mean(i) + (sd(i)/sqrt(length(i))),
fun.ymin=function(i) mean(i) - (sd(i)/sqrt(length(i))),
geom="errorbar", width=0.2, position=position_dodge(width=0.9))
```
```{r alphadiv, fig.width=6, fig.height=8}
# Plot alpha-diversity
SI <- ggplot(df, aes(x=Treatment, y=Shannon)) +
geom_jitter(aes(shape=Genotype, color=Treatment), size=3, width=0.2) +
theme_bw() +
scale_color_grey(start = 0, end = 0.8) +
ggtitle("Alpha Diversity - Shannon Index") +
facet_wrap(~ Diet) +
stat_compare_means(method=t.test, label.x=1.4)
ISI <- ggplot(df, aes(x=Treatment, y=InvSimpson)) +
geom_jitter(aes(shape=Genotype, color=Treatment), size=3, width=0.2) +
theme_bw() +
scale_color_grey(start = 0, end = 0.8) +
ggtitle("Alpha Diversity - Inverse Simpson Index") +
facet_wrap(~ Diet) +
stat_compare_means(method=t.test, label.x=1.4)
ggarrange(SI, ISI, ncol=1, nrow=2)
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByTreatment/AlphaDiversityByTreatment.pdf")
```
### Results: Alpha diversity and treatment
#### *Alpha diversity reflects the richness and evenness of taxa in the environment. More diverse populations have higher alpha diversity indices. The Shannon index is influenced by rare taxa while the inverse Simpson index typically reflects the contribution of dominant taxa. There is a significant reduction (Wilcoxon rank-sum test) in both of these indices with XPro treatment in control diet as well as HFHF diet mice, indicating a reduction in bacterial diversity. There appear to be few differences in alpha diversity by genotype with the possible exception of XPro-treated control diet mice, in which Tg mice had lower alpha diversity values than Non-Tg. There are, however, very few mice in this category (x=2 Non-Tg, x=3 Tg), so this could be a chance occurrence.*
```{r alphadiv2, fig.width=6, fig.height=8}
# Plot alpha-diversity
SI2 <- ggplot(df, aes(x=Diet, y=Shannon)) +
geom_jitter(aes(shape=Genotype, color=Diet), size=3, width=0.2) +
theme_bw() +
ggtitle("Alpha Diversity - Shannon Index") +
facet_wrap(~ Treatment) +
stat_compare_means(method=t.test, label.x=1.4)
ISI2 <- ggplot(df, aes(x=Diet, y=InvSimpson)) +
geom_jitter(aes(shape=Genotype, color=Diet), size=3, width=0.2) +
theme_bw() +
ggtitle("Alpha Diversity - Inverse Simpson Index") +
facet_wrap(~ Treatment) +
stat_compare_means(method=t.test, label.x=1.4)
ggarrange(SI2, ISI2, ncol=1, nrow=2)
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByDiet/AlphaDiversityByDiet.pdf")
```
### Results: Alpha diversity and diet
#### *We confirm that diet does not produce significant changes in alpha diversity in these mice.*
## Alpha diversity regression
We can evaluate factors influencing alpha diversity another way, using a linear regression.
```{r lmad}
summary(lm(Shannon ~ Treatment + Diet + Genotype, df))
summary(lm(InvSimpson ~ Treatment + Diet + Genotype, df))
```
### Results: Alpha diversity and all factors
#### *This analysis evaluates which factors are associated with the alpha diversity measures and finds that XPro treatment reduces alpha diversity significantly when diet and genotype are taken into account, and that diet and genotype are not associated with alpha diversity.*
```{r endad}
# Clean up objects not needed for further analyses
rm(ad, ISI, ISI2, ps, SI, SI2)
```
# Beta Diversity
**We will also evaluate community dissimilarity among the samples using beta diversity measures (i.e. how distinct are the bacterial communities in different samples?).**
## Beta diversity - Bray-Curtis method
*The Bray-Curtis dissimilarity method evaluates beta diversity based on read counts.*
```{r ord}
# transform data to proportions to account for differences in sampling depth/library size
ps.prop <- transform_sample_counts(ps3, function(otu) otu/sum(otu))
```
```{r bray, fig.width=6}
# Ordinate with Bray-Curtis method using non-metric multidimensional scaling
# This metric takes abundance into account
ord.nmds.bray <- ordinate(ps.prop, method="NMDS", distance="bray")
ord.nmds.bray
# 20 iterations were sufficient to reach a convergent solution, and the stress is below 0.2 (at or above would be considered suspect), so two dimensions should be sufficient
# Plot ordination
plot_ordination(ps3, ord.nmds.bray, color="Diet", shape="Genotype", title="Bray-Curtis") +
scale_shape_manual(values=c(21, 24)) +
scale_color_manual(values=c("#E69F00", "#56B4E9")) +
geom_point(aes(fill=Treatment), size=3, stroke=1.5) +
scale_fill_manual(values=c("grey47", "white")) +
guides(fill = guide_legend(override.aes = list(shape = 21)))
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/BetaDiversity_BrayCurtis.pdf")
# Test for significant differences by factor
BC.dist <- vegan::vegdist(otu_table(ps.prop), method="bray")
vegan::adonis2(BC.dist ~ Treatment + Diet + Genotype, data=df, permutations=999, method="bray", by="margin")
```
### Results: Beta diversity - Bray Curtis
#### *It is apparent visually from the plot and statistically from the permutation test that there are significant differences in bacterial communities based on XPro treatment and based on diet. Genotype does not have a significant effect on beta diversity.*
## Beta diversity - UniFrac method
*The UniFrac (unique fraction) metric evaluates beta diversity based on the phylogenetic relatedness of bacterial taxa, and it does so based on the presence/absence of taxa. It relies on the phylogenetic tree generated in Part 1 of this analysis.*
```{r Unifrac, fig.width=6}
# Set seed for reproducibility (selected based on ordination converging within 20 iterations)
set.seed(5)
# Ordinate with unweighted UniFrac method using non-metric multidimensional scaling
ord.nmds.unifrac <- ordinate(ps.prop, method="NMDS", distance="unifrac")
ord.nmds.unifrac
# 20 iterations were not sufficient to reach a convergent solution, but the stress is below 0.2 and in line with other ordination methods in this analysis, so two dimensions are sufficient.
# Plot ordination
plot_ordination(ps3, ord.nmds.unifrac, color="Diet", shape="Genotype", title="Unweighted UniFrac", axes=c(1,2)) +
scale_shape_manual(values=c(21, 24)) +
scale_color_manual(values=c("#E69F00", "#56B4E9")) +
geom_point(aes(fill=Treatment), size=3, stroke=1.5) +
scale_fill_manual(values=c("grey47", "white")) +
guides(fill = guide_legend(override.aes = list(shape = 21)))
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/BetaDiversity_UniFrac.pdf")
# Test for significant differences by factor
Uni.dist <- UniFrac(ps.prop)
vegan::adonis2(Uni.dist ~ Treatment + Diet + Genotype, data=df, permutations=999, method="unifrac", by="margin")
```
### Results: Beta diversity - UniFrac
#### *It is clear that the bacterial communities in different experimental groups are not as dissimilar based on the presence/absence of related taxa as they are based on sequence counts. There is less separation between XPro-treated and saline-treated mice and between control and HFHF diet mice by this metric than there was by the Bray-Curtis method. Because UniFrac is based on presence/absence of taxa, it is sensitive to changes in low-abundance taxa, so the lack of separation by this method suggests that the differences observed by the Bray-Curtis method were likely driven predominantly by abundant taxa. We still find that there is a significant difference in the bacterial communities of XPro- and saline-treated mice by the UniFrac method, but no differences based on diet or genotype.*
## Beta diversity - Weighted UniFrac method
*The weighted UniFrac metric evaluates beta diversity based on the phylogenetic relatedness of bacterial taxa, and it takes into account the relative abundances of the taxa. It relies on the phylogenetic tree generated in Part 1 of this analysis.*
```{r wUnifrac, fig.width=6}
# Set seed for reproducibility
set.seed(100)
# Ordinate with weighted UniFrac method using non-metric multidimensional scaling
ord.nmds.wunifrac <- ordinate(ps.prop, method="NMDS", distance="wunifrac")
ord.nmds.wunifrac
# 20 iterations were sufficient to reach a convergent solution, and the stress is below 0.2 (at or above would be considered suspect), so two dimensions are sufficient.
# Plot ordination
plot_ordination(ps3, ord.nmds.wunifrac, color="Diet", shape="Genotype", title="Weighted UniFrac") +
scale_shape_manual(values=c(21, 24)) +
scale_color_manual(values=c("#E69F00", "#56B4E9")) +
geom_point(aes(fill=Treatment), size=3, stroke=1.5) +
scale_fill_manual(values=c("grey47", "white")) +
guides(fill = guide_legend(override.aes = list(shape = 21)))
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/BetaDiversity_WeightedUniFrac.pdf")
# Test for significant differences by factor
wUni.dist <- UniFrac(ps.prop, weighted=TRUE, normalized=FALSE)
vegan::adonis2(wUni.dist ~ Treatment + Diet + Genotype, data=df, permutations=999, method="wunifrac", by="margin")
```
### Results: Beta diversity - Weighted UniFrac
#### *There is clear clustering of bacterial communities according to treatment and diet, but not genotype, and this is confirmed statistically. Because weighted uniFrac takes into account relative abundances, this helps to confirm that the differences associated with diet and treatment are being driven by abundant taxa.*
```{r bdcleanup}
rm(ord.nmds.bray, ord.nmds.unifrac, ord.nmds.wunifrac, BC.dist, Uni.dist, wUni.dist)
```
# Specific Bacterial Taxa
## Most abundant OTUs
**We can visualize the most abundant OTUs in our samples at different taxonomic levels.**
```{r otufamily, fig.height=8, fig.width=10}
# Create bar plots for most abundant OTUs at the family level
# Because the previous analyses found no significant effects of genotype, we will separate only diet and treatment groups
top <- names(sort(taxa_sums(ps3), decreasing=TRUE))[1:19]
ps.top <- prune_taxa(top, ps3)
pf <- plot_bar(ps.top, x="Subject", fill="Family") + facet_grid(~Treatment+Diet, scales="free_x") +
xlab(label="Subject") +
labs(title="Most Abundant OTUs") +
theme(plot.title = element_text(hjust=0.5)) +
geom_bar(aes(fill=Family), stat="identity", position="stack")
pf
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/MostAbundantFamilies.pdf")
```
```{r otugenus, fig.height=8, fig.width=10}
topg <- names(sort(taxa_sums(ps.prop), decreasing=TRUE))[1:10]
ps.topg <- prune_taxa(top, ps.prop)
# Many bacteria could not be idenfified at the genus level, but this is not very informative for our graph. We can remove the NA genus for graphing
topgtax <- data.frame(tax_table(ps.top))
ps.topg <- prune_taxa(topgtax$Genus != "NA", ps.top)
# Create bar plots for top OTUs at the genus level (other than NA)
pg <- plot_bar(ps.topg, x="Subject", fill="Genus") +
facet_grid(~Treatment+Diet, scales="free_x") +
xlab(label="Subject") +
labs(title="Most Abundant OTUs") +
theme(plot.title = element_text(hjust=0.5)) +
geom_bar(aes(fill=Genus), stat="identity", position="stack")
pg
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/MostAbundantGenera.pdf")
```
### Results: Most Abundant OTUs
#### *Visual evaluation of these plots suggests that* Bacteroides *taxa are far more abundant in XPro-treated mice.*
## Differential Abundance (DESeq2)
**We can determine whether the relative abundances of individual taxa differ by experimental group. We will utilize the ```DESeq2``` package in accordance with the practices outlined in McMurdie and Holmes et al 2014 (PMID: 24699258).**
### Differential Abundance by Treatment
```{r deseqdataset}
# Remove taxa with counts of 5 or fewer in the whole data set to avoid making too many comparisons which aren't really meaningful.
ps3p <- prune_taxa(taxa_sums(ps3) > 5, ps3)
# Ensure that Treatment is a factor with two levels, Saline and XPro, and that XPro is considered the "treatment" for reporting fold changes
# Ensure that Diet is a factor with two levels, CD and HFHF, in that order.
sample_data(ps3p)$Treatment <- factor(sample_data(ps3p)$Treatment, levels=c("Saline", "XPro"))
sample_data(ps3p)$Diet <- factor(sample_data(ps3p)$Diet, levels=c("CD", "HFHS"))
# Convert phyloseq object to deseq data set
# The last variable in the formula is the one that will be used to calculate fold changes
# Variables to the left of the last one will be taken into account when calculating effects of the primary variable
# Again, we will leave out genotype due to limited representation of both genotypes in all experimental groups and lack of significant effects of genotype found earlier in the analysis
dds <- phyloseq_to_deseq2(ps3p, ~ Diet + Treatment)
# Ensure that Saline is considered the reference level for reporting fold changes
dds$Treatment <- relevel(dds$Treatment, ref = "Saline")
dds$Diet <- relevel(dds$Treatment, ref = "CD")
dds
```
```{r deseq2}
# Differential abundance testing
dsdds <- DESeq(dds)
# Generate results
res = results(dsdds, alpha=0.05)
# Add annotation from taxonomy table
resan <- cbind(as.data.frame(res), as.matrix(tax_table(ps3p)[rownames(res),]))
# Sort by p-value
resan <- resan[order(resan$padj),]
# Remove row names
resan2 <- resan
rownames(resan2) = NULL
# Select taxa with padj < 0.05
sigtab <- resan2[which(resan2$padj < 0.05),]
dim(sigtab)
summary(sigtab)
# Export table of significant results
write.csv(sigtab, "C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByTreatment/DESEQ2_TaxaSig05_Treatment_accntDiet.csv")
```
*We will visualize the changes in the taxa identified as differing significantly in abundance with XPro treatment. Each subdivision of a bar represents a different OTU (asv) within the genus or family, and the plots are ordered top to bottom according to the adjusted p value with the lowest at the top of the plot.*
```{r deseq2barplotgenus, fig.height=9, fig.width=8}
# Reverse order the taxa according to adjusted p-value so they will be in the right order in the flipped plot
sigtab$Genus <- as.character(sigtab$Genus)
sigtab$Genus <- factor(sigtab$Genus, levels=unique(sigtab$Genus))
sigtab$Genus <- fct_rev(sigtab$Genus)
# Plot the log2 fold changes in taxa which differ significantly between the treatments.
ggplot(sigtab[!is.na(sigtab$Genus),], aes(x=Genus, y=log2FoldChange, fill=Genus)) +
geom_bar(stat="identity", color="black") +
coord_flip() +
ggtitle("Differentially Abundant Bacteria by Treatment") +
theme(plot.title = element_text(hjust = 0.5)) +
theme(legend.position="none") +
geom_hline(yintercept=0, color="black", size=1)
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByTreatment/DESEQ2_TaxaSig05Genus_Treatment_accntDiet_barplot.pdf", width=8, height=9)
```
*We can plot by family as well. This is not quite as informative as plotting by genus because often, taxa within a family change in abundance in both directions by treatment. We will include it, however, because there are several taxa for which a genus could not be identified, and those are left off of the image plotted by genus.*
```{r deseq2barplotfamily, fig.height=5.5, fig.width=8}
# Reverse order the taxa according to adjusted p-value so they will be in the right order in the flipped plot
sigtab$Family <- as.character(sigtab$Family)
sigtab$Family <- factor(sigtab$Family, levels=unique(sigtab$Family))
sigtab$Family <- fct_rev(sigtab$Family)
# Plot the log2 fold changes in taxa which differ significantly between the treatments.
ggplot(sigtab[!is.na(sigtab$Family),], aes(x=Family, y=log2FoldChange, fill=Family)) +
geom_bar(stat="identity", color="black") +
coord_flip() +
ggtitle("Differentially Abundant Bacteria by Treatment") +
theme(plot.title = element_text(hjust = 0.5)) +
theme(legend.position="none") +
geom_hline(yintercept=0, color="black", size=1)
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByTreatment/DESEQ2_TaxaSig05Family_Treatment_accntDiet_barplot.pdf", width=8, height=5.5)
```
*Let's hone in on some of those phyla with significant changes in both directions using a phylogenetic tree. Each point beside a limb tip represents an individual subject. This will provide us with a view of the phylogenetic relatedness of particular OTUs within each family.*
```{r deseq2tree, fig.width=13, fig.height=9}
# Let's prune our phyloseq object to the taxa identified by differential abundance testing
ps4 <- prune_taxa(taxa_names(ps3p) %in% rownames(resan[which(resan$padj < 0.05),]), ps3p)
ps4
# transform data to proportions to show relative abundance
ps4.prop <- transform_sample_counts(ps4, function(otu) otu/sum(otu))
ps4Bac <- subset_taxa(ps4.prop, Family=="Bacteroidaceae")
plot_tree(ps4Bac, shape="Treatment", color="Diet", size="abundance", label.tips="Species", base.spacing=0.03, title="Bacteroidaceae Family (Genus Bacteroides)")
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByTreatment/DESEQ2_Bacteroidaceae_Treatmentphytree.pdf", width=8, height=4)
ps4Rum <- subset_taxa(ps4.prop, Family=="Ruminococcaceae")
plot_tree(ps4Rum, shape="Treatment", color="Diet", size="abundance", label.tips="Genus", base.spacing=0.02, title="Ruminococcaceae Family ")
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByTreatment/DESEQ2_Ruminococcaceae_Treatmentphytree.pdf", width=13, height=7)
ps4Mur <- subset_taxa(ps4.prop, Family=="Muribaculaceae")
plot_tree(ps4Mur, shape="Treatment", color="Diet", size="abundance", label.tips="Genus", base.spacing=0.1, title="Muribaculaceae Family ")
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByTreatment/DESEQ2_Muribaculaceae_Treatmentphytree.pdf", width=8, height=5)
ps4Lach <- subset_taxa(ps4.prop, Family=="Lachnospiraceae")
plot_tree(ps4Lach, shape="Treatment", color="Diet", size="abundance", label.tips="Genus", base.spacing=0.08, title="Lachnospiraceae Family")
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByTreatment/DESEQ2_Lachnospiraceae_Treatmentphytree.pdf", width=8, height=9)
ps4Clost <- subset_taxa(ps4.prop, Family=="Clostridiales_vadinBB60_group")
plot_tree(ps4Clost, shape="Treatment", color="Diet", size="abundance", base.spacing=0.02, title="Clostridiales_vadinBB60_group Family")
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByTreatment/DESEQ2_ClostridialesBB60_Treatmentphytree.pdf", width=8, height=9)
```
*We will plot the relative abundance of individual genera identified as differing significantly with XPro treatment (with diet accounted for). Each subdivision of the bar represents a different sample, and the height of each subdivision represents the relative abundance of the genus within that sample.*
```{r deseq2barplotv}
# Agglomerate taxa by genus
ps4g <- tax_glom(ps4.prop, taxrank=rank_names(ps4.prop)[6])
# Because there are so many taxa to plot, separate them to create images of reasonable size
# Because Bacteroides are so abundant compared to everything else, they get an image all to themselves
siggensort <- names(sort(taxa_sums(ps4g), decreasing=TRUE))
most1 <- siggensort[1]
more10 <- siggensort[2:11]
less11 <- siggensort[12:22]
least11 <- siggensort[23:33]
ps4g1 <- prune_taxa(most1, ps4g)
ps4g2 <- prune_taxa(more10, ps4g)
ps4g3 <- prune_taxa(less11, ps4g)
ps4g4 <- prune_taxa(least11, ps4g)
# Plot the relative abundance of the taxa
plot_bar(ps4g1, "Treatment", fill="Diet") +
facet_grid(~Genus) +
ggtitle("Differentially Abundant Bacteria") +
theme(plot.title = element_text(hjust = 0.5), legend.position="right")
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByTreatment/DESEQ2_Bacteroides_Treatment_accntDiet_barplotvert.pdf", height=5, width=3)
plot_bar(ps4g2, "Treatment", fill="Diet") +
facet_grid(~Genus) +
ggtitle("Differentially Abundant Bacteria") +
theme(plot.title = element_text(hjust = 0.5), legend.position="right")
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByTreatment/DESEQ2_Genus2_Treatment_accntDiet_barplotvert.pdf", height=5, width=23)
plot_bar(ps4g3, "Treatment", fill="Diet") +
facet_grid(~Genus) +
ggtitle("Differentially Abundant Bacteria") +
theme(plot.title = element_text(hjust = 0.5), legend.position="right")
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByTreatment/DESEQ2_Genus3_Treatment_accntDiet_barplotvert.pdf", height=5, width=23)
plot_bar(ps4g4, "Treatment", fill="Diet") +
facet_grid(~Genus) +
ggtitle("Differentially Abundant Bacteria") +
theme(plot.title = element_text(hjust = 0.5), legend.position="right")
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByTreatment/DESEQ2_Genus4_Treatment_accntDiet_barplotvert.pdf", height=5, width=24)
```
### Results: Differential Abundance by Treatment
#### *After accounting for diet, the relative abundance of 124 OTUs (asvs actually, but people are more familiar with the term OTU) belonging to 32 classified genera and 18 classified families were found to differ significantly with XPro treatment. Some notable differences in XPro-treated mice included greater abundances of 3 OTUS in genus* Bacteroides *and 2 OTUs in* Delsulfovibrio *and lesser abundances of numerous OTUs within* Ruminococcaceae *UCG-014 and* Lachnospiraceae *NK4A136 group. Other OTUs within the Lachnospiraceae family were more abundant in XPro-treated mice. Two OTUs in the Muribaculaceae family were more abundant in XPro-treated mice, while the majority were less abundant. Phylogenetic trees indicate that more closely related OTUs were more likely to be similarly regulated by XPro treatment.* Bifidobacterium *was less abundant in XPro-treated mice, but this appears to be driven by mice on control diet, as abundance of this genus in HFHF diet was low.*
### Differential Abundance by Diet
```{r deseqdataset2}
# Convert phyloseq object to deseq data set
# The last variable in the formula is the one that will be used to calculate fold changes
# Variables to the left of the last one will be taken into account when calculating effects of the primary variable
# Again, we will leave out genotype due to limited representation of both genotypes in all experimental groups and lack of significant effects of genotype found earlier in the analysis
dds <- phyloseq_to_deseq2(ps3p, ~ Treatment + Diet)
dds
```
```{r deseq22}
# Differential abundance testing
dsdds <- DESeq(dds)
# Generate results
res = results(dsdds, alpha=0.05)
# Add annotation from taxonomy table
resan <- cbind(as.data.frame(res), as.matrix(tax_table(ps3p)[rownames(res),]))
# Sort by p-value
resan <- resan[order(resan$padj),]
# Remove row names
resan2 <- resan
rownames(resan2) = NULL
# Select taxa with padj < 0.05
sigtab <- resan2[which(resan2$padj < 0.05),]
dim(sigtab)
# Export table of significant results
write.csv(sigtab, "C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByDiet/DESEQ2_TaxaSig05_Diet_accntTreatment.csv")
```
*We will visualize the changes in the taxa identified as differing significantly in abundance with HFHF diet. Each subdivision of a bar represents a different OTU (asv) within the genus or family, and the plots are ordered top to bottom according to the adjusted p value with the lowest at the top of the plot.*
```{r deseq2barplotgenus2, fig.height=6, fig.width=8}
# Reverse order the taxa according to adjusted p-value so they will be in the right order in the flipped plot
sigtab$Genus <- as.character(sigtab$Genus)
sigtab$Genus <- factor(sigtab$Genus, levels=unique(sigtab$Genus))
sigtab$Genus <- fct_rev(sigtab$Genus)
# Plot the log2 fold changes in taxa which differ significantly between the treatments.
ggplot(sigtab[!is.na(sigtab$Genus),], aes(x=Genus, y=log2FoldChange, fill=Genus)) +
geom_bar(stat="identity", color="black") +
coord_flip() +
ggtitle("Differentially Abundant Bacteria by Diet") +
theme(plot.title = element_text(hjust = 0.5)) +
theme(legend.position="none") +
geom_hline(yintercept=0, color="black", size=1)
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByDiet/DESEQ2_TaxaSig05Genus_Diet_accntTreatment_barplot.pdf", width=8, height=6)
```
*We can plot by family as well. This is often not quite as informative as plotting by genus because often, taxa within a family change in abundance in both directions by treatment. We will include it, however, because there are several taxa primarily belonging to the Muribaculaceae and Lachnospiraceae families for which a genus could not be identified, and those are left off of the image plotted by genus.*
```{r deseq2barplotfamily2, fig.height=4.6, fig.width=8}
# Reverse order the taxa according to adjusted p-value so they will be in the right order in the flipped plot
sigtab$Family <- as.character(sigtab$Family)
sigtab$Family <- factor(sigtab$Family, levels=unique(sigtab$Family))
sigtab$Family <- fct_rev(sigtab$Family)
# Plot the log2 fold changes in taxa which differ significantly between the treatments.
ggplot(sigtab[!is.na(sigtab$Family),], aes(x=Family, y=log2FoldChange, fill=Family)) +
geom_bar(stat="identity", color="black") +
coord_flip() +
ggtitle("Differentially Abundant Bacteria by Diet") +
theme(plot.title = element_text(hjust = 0.5)) +
theme(legend.position="none") +
geom_hline(yintercept=0, color="black", size=1)
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByDiet/DESEQ2_TaxaSig05Family_Diet_accntTreatment_barplot.pdf", width=8, height=4.6)
```
*Let's hone in on some of those taxa with significant changes in both directions using a phylogenetic tree. Each point beside a limb tip represents an individual subject. This will provide us with a view of the phylogenetic relatedness of particular OTUs within each family.*
```{r deseq2tree2, fig.width=10, fig.height=7}
# Let's prune our phyloseq object to the taxa identified by differential abundance testing
ps4 <- prune_taxa(taxa_names(ps3p) %in% rownames(resan[which(resan$padj < 0.05),]), ps3p)
ps4
ps4Lach <- subset_taxa(ps4, Family=="Lachnospiraceae")
plot_tree(ps4Lach, shape="Treatment", color="Diet", size="abundance", label.tips="Genus", base.spacing=0.05, title="Lachnospiraceae Family ")
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByDiet/DESEQ2_Lachnospiraceae_Dietphytree.pdf", width=8, height=7)
ps4Rum <- subset_taxa(ps4, Family=="Ruminococcaceae")
plot_tree(ps4Rum, shape="Treatment", color="Diet", size="abundance", label.tips="Genus", base.spacing=0.02, title="Ruminococcaceae Family ")
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByDiet/DESEQ2_Ruminococcaceae_Dietphytree.pdf", width=10, height=5)
ps4Mur <- subset_taxa(ps4, Family=="Muribaculaceae")
plot_tree(ps4Mur, shape="Treatment", color="Diet", size="abundance", label.tips="Genus", base.spacing=0.15, title="Muribaculaceae Family ")
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByDiet/DESEQ2_Muribaculaceae_Dietphytree.pdf", width=7, height=4)
```
*We will plot the relative abundance of individual genera identified as differing significantly with HFHF diet (with treatment accounted for). Each subdivision of the bar represents a different sample, and the height of each subdivision represents the relative abundance of the genus within that sample.*
```{r deseq2barplotv2}
# Agglomerate taxa by genus
ps4g <- tax_glom(ps4, taxrank=rank_names(ps4)[6])
# Because there are so many taxa to plot, separate them to create images of reasonable size
siggensort <- names(sort(taxa_sums(ps4g), decreasing=TRUE))
more10 <- siggensort[1:10]
less10 <- siggensort[11:20]
ps4g1 <- prune_taxa(more10, ps4g)
ps4g2 <- prune_taxa(less10, ps4g)
# Plot the relative abundance of the taxa
plot_bar(ps4g1, "Treatment", fill="Diet") +
facet_grid(~Genus) +
ggtitle("Differentially Abundant Bacteria") +
theme(plot.title = element_text(hjust = 0.5), legend.position="right")
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByDiet/DESEQ2_Genus1_Diet_accntTreatment_barplotvert.pdf", height=5, width=23)
plot_bar(ps4g2, "Treatment", fill="Diet") +
facet_grid(~Genus) +
ggtitle("Differentially Abundant Bacteria") +
theme(plot.title = element_text(hjust = 0.5), legend.position="right")
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByDiet/DESEQ2_Genus2_Diet_accntTreatment_barplotvert.pdf", height=5, width=22)
# Desulfovibrionaceae is not annotated at the genus level. Let's plot it at the family level.
ps4Des <- subset_taxa(ps4, Family=="Desulfovibrionaceae")
plot_bar(ps4Des, "Treatment", fill="Diet") +
facet_grid(~Family) +
ggtitle("Differentially Abundant Bacteria") +
theme(plot.title = element_text(hjust = 0.5), legend.position="right")
ggsave("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/Part2_Phyloseq_figs/DifferByDiet/DESEQ2_Desulfovibrionaceae_Diet_accntTreatment_barplotvert.pdf", height=5, width=3)
```
### Results: Differential Abundance by Diet
#### *After accounting for treatment, the relative abundance of 57 OTUs (asvs actually, but people are more familiar with the term OTU) belonging to 20 classified genera and 14 classified phyla were found to differ significantly with HFHF diet. Some notable differences in HFHF diet mice included reduced relative abundances of Desulfovibrionaceae (primarily in Saline-treated mice) and* Bifidobacterium *and increased abundance of* Akkermansia*. There were substantial changes in relative abundances of taxa in the Lachnospiraceae family, with some OTUs increasing and others decreasing in abundance. Many taxa in the Muribaculaceae family decreased in abundance with HFHF diet while* Muribaculum *increased. In the Ruminococcaceae family, two OTUs including* Flavonifractor *increased in abundance with HFHF diet while many others decreased. Phylogenetic trees indicate that more closely related OTUs were more likely to be similarly regulated by HFHF diet.*
```{r deseqcleanup}
rm(dds, dsdds, pf, pg, ps.top, ps.topg, ps3p, ps4, ps4Bac, ps4Des, ps4g, ps4g1, ps4g2, ps4g3, ps4g4, ps4Lach, ps4Mur, ps4Rum, res, resan, resan2, sigtab, topgtax, least11, less10, less11, more10, most1, siggensort, top, topg)
```
# Functional Predictions from 16S Sequencing Data
**We submitted an OTU abundance table and a representative sequence file (generated in part 1 of this analysis) to the online tool Piphillin which predicts genomic function based on 16S sequencing data (https://doi.org/10.1371/journal.pone.0166104). The results include a raw KEGG ortholog pathway abundance table, a raw KEGG ortholog abundance table (based on the Oct18 KEGG reference database, 99% identity threshold), and a raw rxn abundance table (based on the BioCyc 22.5 reference database, 99% identity threshold). We will evaluate these data to determine whether any features are differentially abundant between the two timepoints in this study.**
First, we need to get our data tables into the appropriate format for analysis with DESeq2 (built for differential expression analysis based on negative binomial distribution).
```{r pitblfrmt}
ctspko <- read.delim("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/PiphillinResults/20200810201312__Madelyn_Houser_keggPiphillin/ko_pathway_abund_table_unnorm.txt", row.names="Pathway")
ctsko <- read.delim("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/PiphillinResults/20200810201312__Madelyn_Houser_keggPiphillin/ko_abund_table_unnorm.txt", row.names="feature")
ctsrxn <- read.delim("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/PiphillinResults/20200810201312__Madelyn_Houser_biocycPiphillin/rxn_abund_table_unnorm.txt", row.names="feature")
# Make sample names consistent with those in the phyloseq sample data document
names(ctspko) <- names(ctspko) %>% str_replace("X", "") %>% str_replace_all("\\.", "-")
names(ctsko) <- names(ctsko) %>% str_replace("X", "") %>% str_replace_all("\\.", "-")
names(ctsrxn) <- names(ctsrxn) %>% str_replace("X", "") %>% str_replace_all("\\.", "-")
# Remove negative and positive controls from tables
# Remove poor quality sample from tables
which(names(ctspko)=="Positive-1")
ctspko <- ctspko[,-c(33:36)] %>% select(-(contains("359-1-25-18")))
ctsko <- ctsko[,-c(33:36)] %>% select(-(contains("359-1-25-18")))
ctsrxn <- ctsrxn[,-c(33:36)] %>% select(-(contains("359-1-25-18")))
# Round number to nearest integer because DESeq2 takes count data as input
ctspko <- round(ctspko, digits=0)
ctsko <- round(ctsko, digits=0)
ctsrxn <- round(ctsrxn, digits=0)
# Convert to matrix
ctspko <- as.matrix(ctspko)
ctsko <- as.matrix(ctsko)
ctsrxn <- as.matrix(ctsrxn)
```
We also need to create a metadata table with information about the samples. The sample names must be in the same order as they are in the counts matrix.
```{r picoldata}
coldata <- data.frame(sample_data(ps3)[,1:4])
coldata$Treatment <- factor(coldata$Treatment, levels=c("Saline", "XPro"))
coldata$Diet <- factor(coldata$Diet, levels=c("CD", "HFHS"))
all(rownames(coldata) == colnames(ctspko))
all(rownames(coldata) == colnames(ctsko))
all(rownames(coldata) == colnames(ctsrxn))
```
## Differential Functional Predictions by Treatment
Now we create DESeqDataSets to compare treatments while taking diet into account.
```{r pideseqdataset}
ddspko <- DESeqDataSetFromMatrix(countData = ctspko,
colData = coldata,
design = ~ Diet + Treatment)
ddsko <- DESeqDataSetFromMatrix(countData = ctsko,
colData = coldata,
design = ~ Diet + Treatment)
ddsrxn <- DESeqDataSetFromMatrix(countData = ctsrxn,
colData = coldata,
design = ~ Diet + Treatment)
ddspko
ddsko
ddsrxn
```
We will do some minor pre-filtering to exclude features/pathways that were only identified in a very few samples and at very low counts.
```{r deseq2prefilter}
ddspko <- ddspko[rowSums(counts(ddspko)) >= 10,]
ddsko <- ddsko[rowSums(counts(ddsko)) >= 10,]
ddsrxn <- ddsrxn[rowSums(counts(ddsrxn)) >= 10,]
ddspko
ddsko
ddsrxn
```
Now we can identify differentially abundant features/pathways between the treatments.
```{r pideseq2}
# differential abundance testing
ddspko <- DESeq(ddspko)
ddsko <- DESeq(ddsko)
ddsrxn <- DESeq(ddsrxn)
# Results generation
respko <- results(ddspko, alpha=0.05)
resko <- results(ddsko, alpha=0.05)
resrxn <- results(ddsrxn, alpha=0.05)
respko
resko
resrxn
# Sort by p-value
respko <- respko[order(respko$pvalue),]
resko <- resko[order(resko$pvalue),]
resrxn <- resrxn[order(resrxn$pvalue),]
# Summarize results
summary(respko)
summary(resko)
summary(resrxn)
```
#### *Significant (adjusted p-value < 0.05) increases in 26 KEGG pathways and decreases in 31 KEGG pathways were predicted with XPro treatment. Significant increases in 186 KEGG orthologs and a significant decrease in 534 KEGG orthologs were predicted with XPro treatment. A significant increase in 327 reactions from the BioCyc database and a significant decrease in 204 reactions from the BioCyc database was predicted with XPro treatment.*
We will export the list of features that differ between the treatments.
```{r pideseq2sigdif}
respkoSig <- subset(respko, padj < 0.05)
reskoSig <- subset(resko, padj < 0.05)
resrxnSig <- subset(resrxn, padj < 0.05)
write.csv(as.data.frame(respkoSig), "C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/PiphillinResults/20200810201312__Madelyn_Houser_keggPiphillin/DifferByTreatment/PKO_Sig05.csv")
write.csv(as.data.frame(reskoSig), "C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/PiphillinResults/20200810201312__Madelyn_Houser_keggPiphillin/DifferByTreatment/KO_Sig05.csv")
write.csv(as.data.frame(resrxnSig), "C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/PiphillinResults/20200810201312__Madelyn_Houser_biocycPiphillin/DifferByTreatment/RXN_Sig05.csv")
# KEGG pathways in exported document were identified using the KEGG PATHWAY database (https://www.genome.jp/kegg/pathway.html).
# KEGG orthologs in exported document were identified using the KEGG ORTHOLOGY database (https://www.genome.jp/kegg/ko.html).
## KOs were copied from the site into a column of the spreadsheet, KO numbers removed, and [] around the phrase "acyl-carrier-protein" and "NiFe" were replaced with parentheses.
# Detailed information on BioCyc reactions can be found in the BioCyc Database Collection (https://biocyc.org/).
# Import significant KO list with KOs annotated.
reskoSig1 <- read.csv("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/PiphillinResults/20200810201312__Madelyn_Houser_keggPiphillin/DifferByTreatment/KO_Sig05_an.csv", row.names=1)
# Improve formatting of annotation for KO list
reskoSig1$X.1 <- as.character(reskoSig1$X.1)
reskoSig1$Name <- sapply(strsplit(reskoSig1$X.1, ";"), "[", 1)
reskoSig1$Definition <- sapply(strsplit(reskoSig1$X.1, "; "), "[", 2)
reskoSig1$Enzyme_Class <- sapply(strsplit(reskoSig1$Definition, "\\["), "[", 2)
reskoSig1$Definition <- sapply(strsplit(reskoSig1$Definition, "\\["), "[", 1)
reskoSig1$Enzyme_Class <- reskoSig1$Enzyme_Class %>% str_replace("EC:", "") %>% str_replace("\\]", "")
reskoSig1 <- reskoSig1 %>% select(-X.1)
write.csv(as.data.frame(reskoSig1), "C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/PiphillinResults/20200810201312__Madelyn_Houser_keggPiphillin/DifferByTreatment/KO_Sig05_Annotated.csv")
```
We can visualize the differentially abundant features.
```{r piMAplots, fig.width=5}
# Plot the log2 fold changes attributable to a given variable over the mean of counts for all the samples
# Red color indicates adjusted p-value < 0.05
# Points which fall out of the window are plotted as open triangles pointing either up or down.
plotMA(respko, alpha=0.05, ylim=c(-3,4), xlab="Mean Counts", main="Differentially Abundant KEGG Pathways by Treatment")
plotMA(resko, alpha=0.05, ylim=c(-10,8), xlab="Mean Counts", main="Differentially Abundant KEGG Orthologs by Treatment")
plotMA(resrxn, alpha=0.05, ylim=c(-8,8), xlab="Mean Counts", main="Differentially Abundant BioCyc Reactions by Treatment")
```
## Differential Functional Predictions by Diet
Now we create DESeqDataSets to compare diets while taking treatment into account.
```{r pideseqdataset2}
ddspko <- DESeqDataSetFromMatrix(countData = ctspko,
colData = coldata,
design = ~ Treatment + Diet)
ddsko <- DESeqDataSetFromMatrix(countData = ctsko,
colData = coldata,
design = ~ Treatment + Diet)
ddsrxn <- DESeqDataSetFromMatrix(countData = ctsrxn,
colData = coldata,
design = ~ Treatment + Diet)
ddspko
ddsko
ddsrxn
```
We will do some minor pre-filtering to exclude features/pathways that were only identified in a very few samples and at very low counts.
```{r deseq2prefilter2}
ddspko <- ddspko[rowSums(counts(ddspko)) >= 10,]
ddsko <- ddsko[rowSums(counts(ddsko)) >= 10,]
ddsrxn <- ddsrxn[rowSums(counts(ddsrxn)) >= 10,]
ddspko
ddsko
ddsrxn
```
Now we can identify differentially abundant features/pathways between the diets.
```{r pideseq22}
# differential abundance testing
ddspko <- DESeq(ddspko)
ddsko <- DESeq(ddsko)
ddsrxn <- DESeq(ddsrxn)
# Results generation
respko <- results(ddspko, alpha=0.05)
resko <- results(ddsko, alpha=0.05)
resrxn <- results(ddsrxn, alpha=0.05)
respko
resko
resrxn
# Sort by p-value
respko <- respko[order(respko$pvalue),]
resko <- resko[order(resko$pvalue),]
resrxn <- resrxn[order(resrxn$pvalue),]
# Summarize results
summary(respko)
summary(resko)
summary(resrxn)
```
#### *Significant (adjusted p-value < 0.05) increases in 16 KEGG pathways and decreases in 12 KEGG pathways were predicted with HFHF diet. Significant increases in 79 KEGG orthologs and a significant decrease in 146 KEGG orthologs were predicted with HFHF diet. A significant increase in 157 reactions from the BioCyc database and a significant decrease in 169 reactions from the BioCyc database was predicted with HFHF diet.*
We will export the list of features that differ between the treatments.
```{r pideseq2sigdif2}
respkoSig <- subset(respko, padj < 0.05)
reskoSig <- subset(resko, padj < 0.05)
resrxnSig <- subset(resrxn, padj < 0.05)
write.csv(as.data.frame(respkoSig), "C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/PiphillinResults/20200810201312__Madelyn_Houser_keggPiphillin/DifferByDiet/PKO_Sig05.csv")
write.csv(as.data.frame(reskoSig), "C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/PiphillinResults/20200810201312__Madelyn_Houser_keggPiphillin/DifferByDiet/KO_Sig05.csv")
write.csv(as.data.frame(resrxnSig), "C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/PiphillinResults/20200810201312__Madelyn_Houser_biocycPiphillin/DifferByDiet/RXN_Sig05.csv")
# KEGG pathways in exported document were identified using the KEGG PATHWAY database (https://www.genome.jp/kegg/pathway.html).
# KEGG orthologs in exported document were identified using the KEGG ORTHOLOGY database (https://www.genome.jp/kegg/ko.html).
## KOs were copied from the site into a column of the spreadsheet, KO numbers removed, and [] around the phrases "acyl-carrier protein", "protein-PII", and "glutamine synthetase" were replaced with parentheses.
# Detailed information on BioCyc reactions can be found in the BioCyc Database Collection (https://biocyc.org/).
# Import significant KO list with KOs annotated.
reskoSig1 <- read.csv("C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/PiphillinResults/20200810201312__Madelyn_Houser_keggPiphillin/DifferByDiet/KO_Sig05_an.csv", row.names=1)
# Improve formatting of annotation for KO list
reskoSig1$X.1 <- as.character(reskoSig1$X.1)
reskoSig1$Name <- sapply(strsplit(reskoSig1$X.1, ";"), "[", 1)
reskoSig1$Definition <- sapply(strsplit(reskoSig1$X.1, "; "), "[", 2)
reskoSig1$Enzyme_Class <- sapply(strsplit(reskoSig1$Definition, "\\["), "[", 2)
reskoSig1$Definition <- sapply(strsplit(reskoSig1$Definition, "\\["), "[", 1)
reskoSig1$Enzyme_Class <- reskoSig1$Enzyme_Class %>% str_replace("EC:", "") %>% str_replace("\\]", "")
reskoSig1 <- reskoSig1 %>% select(-X.1)
write.csv(as.data.frame(reskoSig1), "C:/Users/mecho/Documents/TANSEY_LAB/RAnalyses/Mary/5xFAD microbiome/Analysis/MoreTrim/PiphillinResults/20200810201312__Madelyn_Houser_keggPiphillin/DifferByDiet/KO_Sig05_Annotated.csv")
```
We can visualize the differentially abundant features.
```{r piMAplots2, fig.width=5}