Finish updating supp figs in README
huddlej committed Aug 28, 2024
1 parent 7a42428 commit 559c690
Showing 1 changed file with 34 additions and 11 deletions.
45 changes: 34 additions & 11 deletions README.md
@@ -114,17 +114,40 @@ Line colors represent the clade membership of the most ancestral node in the pai
Line thickness in the embeddings scales by the square root of the number of leaves descending from a given node in the phylogeny.
Clade labels appear in the tree at the earliest ancestral node of the tree for each clade.
Clade labels appear in each embedding at the average position on the x and y axis for sequences in a given clade.
- [S7 Fig. **MDS embeddings for late (2018–2020) influenza H3N2 HA sequences showing all three components.**](https://blab.github.io/cartography/flu-2018-2020-mds-by-clade.html) Line segments in each embedding reflect phylogenetic relationships with internal node positions calculated from the mean positions of their immediate descendants in each dimension (see Methods). Line colors represent the clade membership of the most ancestral node in the pair of nodes connected by the segment. Line thickness scales by the square root of the number of leaves descending from a given node in the phylogeny.
- [S9 Fig. **Embeddings influenza H3N2 HA-only (left) and combined HA/NA (right) showing the effects of additional NA genetic information on the
placement of reassortment events detected by TreeKnit (MCCs).**](https://blab.github.io/cartography/flu-2016-2018-ha-na-all-embeddings-by-mcc.html)
- [S10 Fig. **PCA embeddings for influenza H3N2 HA sequences only (top row) and HA/NA sequences combined (bottom row) showing the HA trees colored by clusters identified in each embedding (left) and the corresponding embeddings colored by cluster (right).**](https://blab.github.io/cartography/flu-2016-2018-ha-na-pca-by-cluster.html)
- [S11 Fig. **MDS embeddings for influenza H3N2 HA sequences only (top row) and HA/NA sequences combined (bottom row) showing the HA trees colored by clusters identified in each embedding (left) and the corresponding embeddings colored by cluster (right).**](https://blab.github.io/cartography/flu-2016-2018-ha-na-mds-by-cluster.html)
- [S12 Fig. **t-SNE embeddings for influenza H3N2 HA sequences only (top row) and HA/NA sequences combined (bottom row) showing the HA trees colored by clusters identified in each embedding (left) and the corresponding embeddings colored by cluster (right).**](https://blab.github.io/cartography/flu-2016-2018-ha-na-tsne-by-cluster.html)
- [S13 Fig. **UMAP embeddings for influenza H3N2 HA sequences only (top row) and HA/NA sequences combined (bottom row) showing the HA trees colored by clusters identified in each embedding (left) and the corresponding embeddings colored by cluster (right).**](https://blab.github.io/cartography/flu-2016-2018-ha-na-umap-by-cluster.html)
- [S14 Fig. **MDS embeddings for early SARS-CoV-2 sequences showing all three components.**](https://blab.github.io/cartography/sarscov2-mds-by-Nextstrain_clade-clade.html) Line segments in each embedding reflect phylogenetic relationships with internal node positions calculated from the mean positions of their immediate descendants in each dimension (see Methods). Line thickness scales by the square root of the number of leaves descending from a given node in the phylogeny.
- [S15 Fig. **Phylogeny of early (2020–2022) SARS-CoV-2 sequences plotted by number of nucleotide substitutions from the most recent common ancestor on the x-axis (top) and low-dimensional embeddings of the same sequences by PCA (middle left), MDS (middle right), t-SNE (bottom left), and UMAP (bottom right).**](https://blab.github.io/cartography/sarscov2-embeddings-by-Nextclade_pango_collapsed-clade.html) Tips in the tree and embeddings are colored by their collapsed Nextclade pango lineage assignment. Line segments in each embedding reflect phylogenetic relationships with internal node positions calculated from the mean positions of their immediate descendants in each dimension (see Methods). Line thickness scales by the square root of the number of leaves descending from a given node in the phylogeny.
- [S17 Fig. **Phylogenetic trees (left) and embeddings (right) of early (2020–2022) SARS-CoV-2 sequences colored by HDBSCAN cluster.**](https://blab.github.io/cartography/sarscov2-embeddings-by-cluster-vs-Nextclade_pango_collapsed.html) Normalized VI values per embedding reflect the distance between clusters and known genetic groups (collapsed Nextclade pango lineages). Line segments in each embedding reflect phylogenetic relationships with internal node positions calculated from the mean positions of their immediate descendants in each dimension (see Methods). Line thickness scales by the square root of the number of leaves descending from a given node in the phylogeny.
- [S19 Fig. **Phylogenetic trees (left) and embeddings (right) of late (2022–2023) SARS-CoV-2 sequences colored by HDBSCAN cluster.**](https://blab.github.io/cartography/sarscov2-test-embeddings-by-cluster-vs-Nextclade_pango_collapsed.html) Normalized VI values per embedding reflect the distance between clusters and known genetic groups (collapsed Nextclade pango lineages).
- [S8 Fig. **MDS embeddings for late (2018--2020) influenza H3N2 HA sequences showing all three components.**](https://blab.github.io/cartography/flu-2018-2020-mds-by-clade.html) Line segments in each embedding reflect phylogenetic relationships with internal node positions calculated from the mean positions of their immediate descendants in each dimension (see Methods).
Line colors represent the clade membership of the most ancestral node in the pair of nodes connected by the segment.
Line thickness in the embeddings scales by the square root of the number of leaves descending from a given node in the phylogeny.
Clade labels appear in the tree at the earliest ancestral node of the tree for each clade.
Clade labels appear in each embedding at the average position on the x and y axis for sequences in a given clade.
- [S11 Fig. **Embeddings of influenza H3N2 HA-only (left) and combined HA/NA (right) showing the effects of additional NA genetic information on the placement of reassortment events detected by TreeKnit (MCCs).**](https://blab.github.io/cartography/flu-2016-2018-ha-na-all-embeddings-by-mcc.html) Sequences from MCCs with fewer than 10 sequences are colored as "unassigned".
Normalized VI values quantify the degree to which the combination of HA and NA sequences in an embedding reduces the distance of embedding clusters to TreeKnit reassortment groups represented by MCCs.
MCC labels for larger pairs of reassortment events appear in each embedding at the average position on the x and y axis for sequences in a given MCC.
MCCs 14 and 11 represent a previously published reassortment event within Nextstrain clade A2 ([Potter et al. 2019](https://doi.org/10.1093/ve/vez046)).
Labels for MCC 14 represent the sequences from clade A2.
- [S12 Fig. **PCA embeddings for influenza H3N2 HA sequences only (top row) and HA/NA sequences combined (bottom row) showing the HA trees colored by clusters identified in each embedding (left) and the corresponding embeddings colored by cluster (right).**](https://blab.github.io/cartography/flu-2016-2018-ha-na-pca-by-cluster.html) Normalized VI values quantify the degree to which the combination of HA and NA sequences in an embedding reduces the distance of embedding clusters to TreeKnit reassortment groups represented by MCCs.
- [S13 Fig. **MDS embeddings for influenza H3N2 HA sequences only (top row) and HA/NA sequences combined (bottom row) showing the HA trees colored by clusters identified in each embedding (left) and the corresponding embeddings colored by cluster (right).**](https://blab.github.io/cartography/flu-2016-2018-ha-na-mds-by-cluster.html) Normalized VI values quantify the degree to which the combination of HA and NA sequences in an embedding reduces the distance of embedding clusters to TreeKnit reassortment groups represented by MCCs.
- [S14 Fig. **t-SNE embeddings for influenza H3N2 HA sequences only (top row) and HA/NA sequences combined (bottom row) showing the HA trees colored by clusters identified in each embedding (left) and the corresponding embeddings colored by cluster (right).**](https://blab.github.io/cartography/flu-2016-2018-ha-na-tsne-by-cluster.html) Normalized VI values quantify the degree to which the combination of HA and NA sequences in an embedding reduces the distance of embedding clusters to TreeKnit reassortment groups represented by MCCs.
- [S16 Fig. **UMAP embeddings for influenza H3N2 HA sequences only (top row) and HA/NA sequences combined (bottom row) showing the HA trees colored by clusters identified in each embedding (left) and the corresponding embeddings colored by cluster (right).**](https://blab.github.io/cartography/flu-2016-2018-ha-na-umap-by-cluster.html) Normalized VI values quantify the degree to which the combination of HA and NA sequences in an embedding reduces the distance of embedding clusters to TreeKnit reassortment groups represented by MCCs.
- [S17 Fig. **MDS embeddings for early SARS-CoV-2 sequences showing all three components.**](https://blab.github.io/cartography/sarscov2-mds-by-Nextstrain_clade-clade.html) Line segments in each embedding reflect phylogenetic relationships with internal node positions calculated from the mean positions of their immediate descendants in each dimension (see Methods).
Line thickness in the embeddings scales by the square root of the number of leaves descending from a given node in the phylogeny.
Clade labels in the tree and embeddings highlight larger clades.
- [S18 Fig. **Phylogeny of early (2020--2022) SARS-CoV-2 sequences plotted by number of nucleotide substitutions from the most recent common ancestor on the x-axis (top) and low-dimensional embeddings of the same sequences by PCA (middle left), MDS (middle right), t-SNE (bottom left), and UMAP (bottom right).**](https://blab.github.io/cartography/sarscov2-embeddings-by-Nextclade_pango_collapsed-clade.html) Tips in the tree and embeddings are colored by their Pango lineage assignment.
Line segments in each embedding reflect phylogenetic relationships with internal node positions calculated from the mean positions of their immediate descendants in each dimension (see Methods).
Line thickness in the embeddings scales by the square root of the number of leaves descending from a given node in the phylogeny.
Clade labels in the tree and embeddings highlight larger Pango lineages.
- [S20 Fig. **Phylogenetic trees (left) and embeddings (right) of early (2020--2022) SARS-CoV-2 sequences colored by HDBSCAN cluster.**](https://blab.github.io/cartography/sarscov2-embeddings-by-cluster-vs-Nextclade_pango_collapsed.html) Normalized VI values per embedding reflect the distance between clusters and known genetic groups (Pango lineages).
Line segments in each embedding reflect phylogenetic relationships with internal node positions calculated from the mean positions of their immediate descendants in each dimension (see Methods).
- [S21 Fig. **Phylogeny of late (2022--2023) SARS-CoV-2 sequences plotted by number of nucleotide substitutions from the most recent common ancestor on the x-axis (top) and low-dimensional embeddings of the same sequences by PCA (middle left), MDS (middle right), t-SNE (bottom left), and UMAP (bottom right).**](https://blab.github.io/cartography/sarscov2-test-embeddings-by-Nextstrain_clade-clade.html) Tips in the tree and embeddings are colored by their Nextstrain clade assignment.
Tips that could not be assigned to a predefined Nextstrain clade due to recombination were colored as "recombinant".
Line segments in each embedding reflect phylogenetic relationships with internal node positions calculated from the mean positions of their immediate descendants in each dimension (see Methods).
Line thickness in the embeddings scales by the square root of the number of leaves descending from a given node in the phylogeny.
Clade labels in the tree and embeddings highlight larger clades.
- [S22 Fig. **Phylogeny of late (2022--2023) SARS-CoV-2 sequences plotted by number of nucleotide substitutions from the most recent common ancestor on the x-axis (top) and low-dimensional embeddings of the same sequences by PCA (middle left), MDS (middle right), t-SNE (bottom left), and UMAP (bottom right).**](https://blab.github.io/cartography/sarscov2-test-embeddings-by-Nextclade_pango_collapsed-clade.html) Tips in the tree and embeddings are colored by their Pango lineage assignment.
Line segments in each embedding reflect phylogenetic relationships with internal node positions calculated from the mean positions of their immediate descendants in each dimension (see Methods).
Line thickness in the embeddings scales by the square root of the number of leaves descending from a given node in the phylogeny.
Clade labels in the tree and embeddings highlight larger Pango lineages.
- [S24 Fig. **Phylogenetic trees (left) and embeddings (right) of late (2022--2023) SARS-CoV-2 sequences colored by HDBSCAN cluster.**](https://blab.github.io/cartography/sarscov2-test-embeddings-by-cluster-vs-Nextclade_pango_collapsed.html) Normalized VI values per embedding reflect the distance between clusters and known genetic groups (Pango lineages).
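
Several captions above describe two drawing rules: internal nodes are positioned at the mean of their immediate descendants in each dimension, and line thickness scales with the square root of the number of leaves descending from a node. The following is a minimal sketch of those two rules on a toy tree, not the code used to produce the figures. The function name `place_internal_nodes`, the `children` mapping, and the tip coordinates are all hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean


def place_internal_nodes(children, tip_positions, root):
    """Place internal nodes at the mean position of their immediate descendants.

    `children` maps each internal node name to a list of its child names
    (hypothetical structure for this sketch). `tip_positions` maps each tip
    name to its embedding coordinates. Returns positions for every node and
    the number of leaves descending from every node.
    """
    positions = dict(tip_positions)
    leaf_counts = {}

    def visit(node):
        kids = children.get(node, [])
        if not kids:
            # A tip counts as a single leaf and keeps its embedding position.
            leaf_counts[node] = 1
            return
        for kid in kids:
            visit(kid)
        # Average the immediate descendants separately in each dimension.
        positions[node] = tuple(
            mean(axis) for axis in zip(*(positions[kid] for kid in kids))
        )
        leaf_counts[node] = sum(leaf_counts[kid] for kid in kids)

    visit(root)
    return positions, leaf_counts


# Toy tree with made-up tip coordinates (assumption, not data from the paper).
children = {"root": ["n1", "C"], "n1": ["A", "B"]}
tip_positions = {"A": (0.0, 0.0), "B": (2.0, 4.0), "C": (5.0, 2.0)}

positions, leaf_counts = place_internal_nodes(children, tip_positions, "root")

# Line thickness proportional to the square root of the descending leaf count.
thickness = {node: sqrt(count) for node, count in leaf_counts.items()}

for node in ("n1", "root"):
    print(node, positions[node], leaf_counts[node], thickness[node])
```

Because each internal node averages its immediate descendants rather than all of its tips, a child with many leaves carries the same weight as a single tip when they are siblings. Which node of a segment determines its thickness is not stated in the captions, so the sketch only computes a per-node value.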

## Supplemental tables
