Dynamically make list of flu clades to plot
Instead of hardcoding a list of flu clades to plot in early and late
data, build the list of clades dynamically from the input table. For the
later flu data, only assign colors to clades with at least 10 samples,
so we can more easily distinguish these larger groups and not waste the
few colors we have on smaller groups.
huddlej committed Dec 20, 2023
1 parent c40e075 commit 74fd863
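
The selection rule described in the commit message can be sketched without the notebook's inputs. This is a minimal, dependency-free sketch that uses `collections.Counter` in place of the notebook's pandas `value_counts()`. The sample clade labels and the `MIN_SAMPLES` name are illustrative assumptions; only the threshold of 10 comes from the commit message.

```python
from collections import Counter

# Hypothetical clade labels standing in for the "clade_membership" column.
clade_membership = ["A1b/135K"] * 12 + ["A2"] * 10 + ["A3"] * 3 + ["3c2"]

# Threshold from the commit message; the constant name is an assumption.
MIN_SAMPLES = 10

# Count samples per clade (the notebook uses pandas value_counts() for this).
clade_counts = Counter(clade_membership)

# Every clade present in the data, sorted for a stable order.
clades_to_plot = sorted(clade_counts)

# Only clades with at least MIN_SAMPLES samples get their own color.
clades_to_plot_with_color = sorted(
    clade for clade, count in clade_counts.items() if count >= MIN_SAMPLES
)

print(clades_to_plot)
print(clades_to_plot_with_color)
```

Keeping both lists separates "what is in the data" from "what earns a color", which is the distinction the commit introduces.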
Showing 11 changed files with 162 additions and 106 deletions.

Large diffs are not rendered by default.

Binary file modified manuscript/figures/flu-2016-2018-ha-embeddings-by-clade.png
2 changes: 1 addition & 1 deletion manuscript/figures/flu-2016-2018-mds-by-clade.html

Binary file modified manuscript/figures/flu-2016-2018-mds-by-clade.png
Binary file modified manuscript/figures/flu-2018-2020-ha-embeddings-by-clade.png
2 changes: 1 addition & 1 deletion manuscript/figures/flu-2018-2020-mds-by-clade.html

Binary file modified manuscript/figures/flu-2018-2020-mds-by-clade.png
140 changes: 97 additions & 43 deletions seasonal-flu-nextstrain-2018-2020/2021-03-09Notebook.ipynb
@@ -67,140 +67,164 @@
 "static_mds_chart = snakemake.output.MDS_Supplement_PNG"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "dc772cba",
+"metadata": {},
+"source": [
+"## Load data"
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "5e135bbc",
+"id": "06532877",
 "metadata": {},
 "outputs": [],
 "source": [
-"clades_to_plot = [\n",
-"    '3c2',\n",
-"    '3c3.A',\n",
-"    'A1',\n",
-"    'A1b/131K',\n",
-"    'A1b/135K',\n",
-"    'A1b/135N',\n",
-"    'A1b/137F',\n",
-"    'A1b/186D',\n",
-"    'A1b/197R',\n",
-"    'A1b/94N',\n",
-"    'A2',\n",
-"    'A3',\n",
-"]\n",
-"domain = clades_to_plot"
+"embeddings_df = pd.read_csv(embeddings_path, sep=\"\\t\")"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "51de9a8a",
+"id": "bc994d18",
 "metadata": {},
 "outputs": [],
 "source": [
-"len(clades_to_plot)"
+"embeddings_df = embeddings_df.rename(\n",
+"    columns={\n",
+"        \"num_date\": \"date\",\n",
+"    }\n",
+")"
 ]
 },
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "53923a14",
+"metadata": {},
+"outputs": [],
+"source": [
+"clade_counts = embeddings_df[\"clade_membership\"].value_counts()"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "2b3a1e7b",
+"metadata": {},
+"outputs": [],
+"source": [
+"clade_counts"
+]
+},
 {
 "cell_type": "markdown",
-"id": "dc772cba",
+"id": "f10ad3f9",
 "metadata": {},
 "source": [
-"## Load data"
+"Only assign colors to clades with at least 10 samples. This approach allows us to clearly see larger clades using fewer colors."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "3b0b80a2",
+"id": "5e4c5882",
 "metadata": {},
 "outputs": [],
 "source": [
-"colors = pd.read_csv(colors_path, sep=\"\\t\", names=[i for i in range(0,101)], nrows=101)"
+"clades_to_plot_with_color = sorted(clade_counts[clade_counts >= 10].index.values)"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "28c2742c",
+"id": "e4570623",
 "metadata": {},
 "outputs": [],
 "source": [
-"clade_color_range = colors.iloc[len(clades_to_plot) - 1].dropna().tolist()"
+"clades_to_plot = sorted(embeddings_df[\"clade_membership\"].drop_duplicates().values)"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "57ada75a",
+"id": "de1a59a2",
 "metadata": {},
 "outputs": [],
 "source": [
-"len(clade_color_range)"
+"clades_to_plot"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "40527d62",
+"id": "5e135bbc",
 "metadata": {},
 "outputs": [],
 "source": [
-"#domain.append(\"other\")"
+"domain = clades_to_plot_with_color"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "75fb5a9d",
+"id": "51de9a8a",
 "metadata": {},
 "outputs": [],
+"source": [
+"len(clades_to_plot)"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "3b0b80a2",
+"metadata": {},
+"outputs": [],
 "source": [
-"#clade_color_range.append(\"#999999\")"
+"colors = pd.read_csv(colors_path, sep=\"\\t\", names=[i for i in range(0,101)], nrows=101)"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "06532877",
+"id": "28c2742c",
 "metadata": {},
 "outputs": [],
 "source": [
-"embeddings_df = pd.read_csv(embeddings_path, sep=\"\\t\")"
+"clade_color_range = colors.iloc[len(clades_to_plot_with_color) - 1].dropna().tolist()"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "bc994d18",
+"id": "57ada75a",
 "metadata": {},
 "outputs": [],
 "source": [
-"embeddings_df = embeddings_df.rename(\n",
-"    columns={\n",
-"        \"num_date\": \"date\",\n",
-"    }\n",
-")"
+"len(clade_color_range)"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "53923a14",
+"id": "40527d62",
 "metadata": {},
 "outputs": [],
 "source": [
-"embeddings_df[\"clade_membership\"].value_counts()"
+"domain.append(\"other\")"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "e4570623",
+"id": "75fb5a9d",
 "metadata": {},
 "outputs": [],
 "source": [
-"embeddings_df[\"clade_membership\"].drop_duplicates().sort_values().tolist()"
+"clade_color_range.append(\"#999999\")"
 ]
 },
 {
@@ -211,7 +235,7 @@
 "outputs": [],
 "source": [
 "embeddings_df[\"clade_membership_color\"] = embeddings_df[\"clade_membership\"].apply(\n",
-"    lambda clade: clade if clade in clades_to_plot else \"other\"\n",
+"    lambda clade: clade if clade in clades_to_plot_with_color else \"other\"\n",
 ")"
 ]
 },
@@ -225,6 +249,36 @@
 "embeddings_df.head()"
 ]
 },
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "dfafbe8e",
+"metadata": {},
+"outputs": [],
+"source": [
+"embeddings_df[\"clade_membership_color\"].value_counts()"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "b6652ac0",
+"metadata": {},
+"outputs": [],
+"source": [
+"domain"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "eba1af54",
+"metadata": {},
+"outputs": [],
+"source": [
+"clade_color_range"
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
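
The notebook cells in this diff pick a palette row by the number of colored clades, then append an "other" entry with a gray color for everything below the threshold. The sketch below mirrors that logic in plain Python under stated assumptions: the `palettes` table, its hex values, and the helper names `build_color_scale` and `color_group` are hypothetical; only the "row n - 1 holds n colors" indexing, the `"other"` label, and `#999999` come from the diff.

```python
# Hypothetical palette table: row i holds i + 1 hex colors, mimicking how the
# notebook indexes the table read from colors_path with iloc[n - 1].
palettes = [
    ["#4C90C0"],
    ["#4C90C0", "#E0A23A"],
    ["#4C90C0", "#AEBD50", "#E0A23A"],
]


def build_color_scale(clades_with_color, palettes,
                      other_label="other", other_color="#999999"):
    """Return (domain, color_range) with a trailing catch-all entry."""
    n = len(clades_with_color)
    if n == 0 or n > len(palettes):
        # Guard: an index of -1 would silently select the last (largest) row.
        raise ValueError(f"no palette row for {n} clades")
    # Copy both lists so the caller's inputs are not mutated by the append.
    domain = list(clades_with_color) + [other_label]
    color_range = list(palettes[n - 1]) + [other_color]
    return domain, color_range


def color_group(clade, clades_with_color, other_label="other"):
    """Map a clade to itself if it has a color, else to the catch-all label."""
    return clade if clade in clades_with_color else other_label


clades_with_color = ["A1b/135K", "A2"]
domain, color_range = build_color_scale(clades_with_color, palettes)
groups = [color_group(c, clades_with_color) for c in ["A2", "A3", "3c2"]]
print(domain, color_range, groups)
```

One design note: in the notebook, `domain = clades_to_plot_with_color` binds the same list object, so `domain.append("other")` also extends `clades_to_plot_with_color`. That appears harmless for the cells shown here, since the palette row is chosen before the append, but copying the list as above avoids depending on cell order.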