diff --git a/docs/listings.json b/docs/listings.json index ee7aebe..449bc9a 100644 --- a/docs/listings.json +++ b/docs/listings.json @@ -14,7 +14,8 @@ "/tutorials/math_stat_1.html", "/tutorials/math_stat_2.html", "/tutorials/math_stat_3.html", - "/tutorials/tidytuesday_05212024.html" + "/tutorials/tidytuesday_05212024.html", + "/tutorials/tidytuesday_06112024.html" ] } ] \ No newline at end of file diff --git a/docs/projects.html b/docs/projects.html index c662570..df3f20b 100644 --- a/docs/projects.html +++ b/docs/projects.html @@ -267,7 +267,7 @@
-
+

diff --git a/docs/projects.xml b/docs/projects.xml index 5c81477..4a66af7 100644 --- a/docs/projects.xml +++ b/docs/projects.xml @@ -10,7 +10,7 @@ quarto-1.4.554 -Tue, 11 Jun 2024 20:30:15 GMT +Tue, 11 Jun 2024 22:11:00 GMT Our World in Emissions | TidyTuesday Mitch Harrison @@ -604,7 +604,7 @@ font-style: inherit;">fill = BG_COLOR) Data Viz TidyTuesday https://mitchellharrison.github.io/projects/tidytuesday_05212024/emissions.html - Tue, 11 Jun 2024 20:30:15 GMT + Tue, 11 Jun 2024 22:11:00 GMT @@ -707,7 +707,7 @@ $icon TidyTuesday Dashboard https://mitchellharrison.github.io/projects/tidytuesday_06042024/cheese.html - Tue, 11 Jun 2024 20:30:15 GMT + Tue, 11 Jun 2024 22:11:00 GMT @@ -720,7 +720,7 @@ $icon

Welcome!

-

Happy pride month! On this fine TidyTuesday afternoon, we will see how different types of colleges and universities handle LGBTQ+ inclusion! The Campus Pride Index tracks safety, inclusivity, and LGBTQ+ policies/programs at universities across the United States. Results are on a 1-5 scale (with higher numbers being most inclusive), and colleges are grouped by various discrete categories. Today, we’ll build a stacked horizontal bar chart to see the distribution of scores for some of those categories. I’ll use the ggchicklet package and some custom fonts for easy aesthetic changes, and we’ll be done! If you want to see a step-by-step tutorial explaining the code, click here.

+

Happy pride month! On this fine TidyTuesday afternoon, we will see how different types of colleges and universities handle LGBTQ+ inclusion! The Campus Pride Index tracks safety, inclusivity, and LGBTQ+ policies/programs at universities across the United States. Results are on a 1-5 scale (with higher numbers being most inclusive), and colleges are grouped by various discrete categories. Today, we’ll build a stacked horizontal bar chart to see the distribution of scores for some of those categories. I’ll use the ggchicklet package and some custom fonts for easy aesthetic changes, and we’ll be done! If you want to see a step-by-step tutorial explaining the code, click here.

Click here for code @@ -1236,7 +1236,7 @@ font-style: inherit;">20)) Data Viz TidyTuesday https://mitchellharrison.github.io/projects/tidytuesday_06112024/pride.html - Tue, 11 Jun 2024 20:30:15 GMT + Tue, 11 Jun 2024 22:11:00 GMT diff --git a/docs/projects/tidytuesday_06112024/pride.html b/docs/projects/tidytuesday_06112024/pride.html index e10a6e1..8b60c79 100644 --- a/docs/projects/tidytuesday_06112024/pride.html +++ b/docs/projects/tidytuesday_06112024/pride.html @@ -184,7 +184,7 @@

Campus Pride Index | TidyTuesday

Welcome!

-

Happy pride month! On this fine TidyTuesday afternoon, we will see how different types of colleges and universities handle LGBTQ+ inclusion! The Campus Pride Index tracks safety, inclusivity, and LGBTQ+ policies/programs at universities across the United States. Results are on a 1-5 scale (with higher numbers being most inclusive), and colleges are grouped by various discrete categories. Today, we’ll build a stacked horizontal bar chart to see the distribution of scores for some of those categories. I’ll use the ggchicklet package and some custom fonts for easy aesthetic changes, and we’ll be done! If you want to see a step-by-step tutorial explaining the code, click here.

+

Happy pride month! On this fine TidyTuesday afternoon, we will see how different types of colleges and universities handle LGBTQ+ inclusion! The Campus Pride Index tracks safety, inclusivity, and LGBTQ+ policies/programs at universities across the United States. Results are on a 1-5 scale (with higher numbers being most inclusive), and colleges are grouped by various discrete categories. Today, we’ll build a stacked horizontal bar chart to see the distribution of scores for some of those categories. I’ll use the ggchicklet package and some custom fonts for easy aesthetic changes, and we’ll be done! If you want to see a step-by-step tutorial explaining the code, click here.

Click here for code diff --git a/docs/search.json b/docs/search.json index 20ec881..3333903 100644 --- a/docs/search.json +++ b/docs/search.json @@ -18,7 +18,7 @@ "href": "tutorials/tidytuesday_05212024.html", "title": "Our World in Emissions | TidyTutorial", "section": "", - "text": "Welcome! If you saw my post for this week’s TidyTuesday, I’m glad you liked it enough to learn from it! If not, you can either scroll to the bottom to see the final product or click here to see it. For this plot, we will use an area plot to visualize the global emissions by type going back to 1900. To start, we will use a bare-bones ggplot2 area chart with no bells or whistles to see what we are working with.\n\n\nClick here for code\nlibrary(tidyverse)\n\n# read data and rename an ugly column ------------------------------------------\nemis<- read_csv(paste0(\n \"https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/\",\n \"data/2024/2024-05-21/emissions.csv\"\n )\n)\n\nemis <- emis |>\n rename(emissions = \"total_emissions_MtCO2e\")\n\nemis |>\n group_by(year, commodity) |>\n summarise(emissions = sum(emissions), .groups = \"drop\") |>\n \n # start of plot --------------------------------------------------------------\n ggplot(aes(x = year, y = emissions, fill = commodity)) +\n geom_area(alpha = 0.9)\n\n\n\n\n\n\n\n\n\nOkay, we’ve learned a lot. First, there are a lot of categories. A good rule of thumb is that once you get to about seven colors, even non-colorblind humans struggle to differentiate. But there is hope! Notice that there are several types of coal production. Let’s aggregate them. Second, there is a long tail on the left because of near-zero data. 
Let’s bring our limit to the right to get a better look.\n\n\nClick here for code\nemis |>\n filter(year >= 1900) |> # get rid of that tail\n mutate(\n # aggregate coal\n commodity = if_else(str_detect(commodity, \"Coal\"), \"Coal\", commodity),\n ) |>\n group_by(year, commodity) |>\n summarise(emissions = sum(emissions), .groups = \"drop\") |>\n \n # start of plot --------------------------------------------------------------\n ggplot(aes(x = year, y = emissions, fill = commodity)) +\n geom_area(alpha = 0.9)\n\n\n\n\n\n\n\n\n\nMuch better! But to me, having the smallest category (cement) on top feels awkward. Let’s reorder the categories! I’ll do so in descending order of emissions in the last year.\n\n\nClick here for code\nLEVS <- c(\"Coal\", \"Oil & NGL\", \"Natural Gas\", \"Cement\") # our desired order\n\nemis |>\n filter(year >= 1900) |>\n mutate(\n commodity = if_else(str_detect(commodity, \"Coal\"), \"Coal\", commodity),\n commodity = factor(commodity, levels = LEVS) # re-order \n ) |>\n group_by(year, commodity) |>\n summarise(emissions = sum(emissions), .groups = \"drop\") |>\n \n # start of plot --------------------------------------------------------------\n ggplot(aes(x = year, y = emissions, fill = commodity)) +\n geom_area(alpha = 0.9)\n\n\n\n\n\n\n\n\n\nNow we’re cooking! It’s time for some style points. I’ll use my favorite aesthetic cheat code: ggthemes. Let’s add a theme and color scheme. I’m going with the FiveThirtyEight theme and a colorblind-friendly palette. I’ll also take this opportunity to adjust the opacity down just a touch. This is a personal choice, but I find it nice to be able to see the grid behind such ink-heavy plots as area plots.\n\n\n\n\n\n\nImportant\n\n\n\nRemember: unless you are making plots for a very small number of people and you know for certain that none are colorblind, making inaccessible plots is inexcusable. 
Of course, we all make mistakes, so if you ever notice an accessibility issue on my site, reach out and let me know on Discord or via a GitHub issue so I can improve for next time!\n\n\n\n\nClick here for code\nLEVS <- c(\"Coal\", \"Oil & NGL\", \"Natural Gas\", \"Cement\") # our desired order\n\nemis |>\n filter(year >= 1900) |>\n mutate(\n commodity = if_else(str_detect(commodity, \"Coal\"), \"Coal\", commodity),\n commodity = factor(commodity, levels = LEVS) # re-order \n ) |>\n group_by(year, commodity) |>\n summarise(emissions = sum(emissions), .groups = \"drop\") |>\n \n # start of plot --------------------------------------------------------------\n ggplot(aes(x = year, y = emissions, fill = commodity)) +\n geom_area(alpha = 0.9) + # drop the opacity just a touch\n\n # add theme and colors (love you, ggthemes) \n ggthemes::scale_fill_colorblind() +\n ggthemes::theme_fivethirtyeight()\n\n\n\n\n\n\n\n\n\nAnd just like that, it feels like we are almost there! Let’s change a few things at once. We will change the background color, add the title/subtitle/axis labels/caption, and format the \\(y\\)-axis to read 30k instead of 30000. That will give us a feel for the final color scheme and how the fonts feel on the page. Because of the subscript “2” in \\(CO_2\\), I will use the latex2exp package use \\(\\LaTeX\\) typesetting in the plot.\n\n\n\n\n\n\nNote\n\n\n\nOne note that is unique to this plot. When we use theme_fivethirtyeight, it removes the \\(y\\)-axis title. 
So, although we normally wouldn’t have to explicitly set the axis title to element_text in the theme function, we will here.\n\n\n\n\nClick here for code\nLEVS <- c(\"Coal\", \"Oil & NGL\", \"Natural Gas\", \"Cement\")\nBG_COLOR <- \"#F0F0F0\" # this will be our background color\n\nemis |>\n filter(year >= 1900) |>\n mutate(\n commodity = if_else(str_detect(commodity, \"Coal\"), \"Coal\", commodity),\n commodity = factor(commodity, levels = LEVS) \n ) |>\n group_by(year, commodity) |>\n summarise(emissions = sum(emissions), .groups = \"drop\") |>\n \n # start of plot --------------------------------------------------------------\n ggplot(aes(x = year, y = emissions, fill = commodity)) +\n geom_area(alpha = 0.9) +\n\n ggthemes::scale_fill_colorblind() +\n ggthemes::theme_fivethirtyeight() +\n\n # abbreviate the y axis labels using the scales package\n scale_y_continuous(label = scales::label_number(scale = 1e-3, suffix = \"k\")) +\n\n # add labels to the plot -----------------------------------------------------\n labs(\n x = element_blank(),\n y = latex2exp::TeX(\"Emissions ($MtCO_2e$)\"), # LaTeX typesetting with TeX()\n title = \"Our World in Emissions\",\n subtitle = latex2exp::TeX(\n paste(\n \"Emissions are measured in Millions of Tons of $CO_2$ equivalent\",\n \"($MtCO_2e$)\"\n )\n ),\n caption = paste(\n \"Made with love by Mitch Harrison\",\n # long blank line to \"hack\" a an annotation in the bottom-left corner\n \" \",\n \"Source: Carbon Majors database and TidyTuesday\"\n )\n ) +\n theme(\n axis.title.y = element_text(size = 10),\n plot.background = element_rect(fill = BG_COLOR) # change background color\n ) \n\n\n\n\n\n\n\n\n\nYou could submit this plot for public consumption without shame, but we can do better! For example, I think we could safely remove the legend by annotating the colors directly on the plot. Let’s use a geom_text to do just that. While this entire process has been creative, we are getting into highly subjective territory here. 
So if you don’t like these changes, do something else! I would love to see your ideas.\nTo make the annotations, I want the text to be right-justified and directly atop one another. To accomplish that, I will give geom_text a single \\(x\\) value but several \\(y\\) values (one for each category).\n\n\nClick here for code\nLEVS <- c(\"Coal\", \"Oil & NGL\", \"Natural Gas\", \"Cement\")\nBG_COLOR <- \"#F0F0F0\"\n\nemis |>\n filter(year >= 1900) |>\n mutate(\n commodity = if_else(str_detect(commodity, \"Coal\"), \"Coal\", commodity),\n commodity = factor(commodity, levels = LEVS) \n ) |>\n group_by(year, commodity) |>\n summarise(emissions = sum(emissions), .groups = \"drop\") |>\n \n # start of plot --------------------------------------------------------------\n ggplot(aes(x = year, y = emissions, fill = commodity)) +\n geom_area(alpha = 0.9) +\n\n ggthemes::scale_fill_colorblind() +\n ggthemes::theme_fivethirtyeight() +\n scale_y_continuous(label = scales::label_number(scale = 1e-3, suffix = \"k\")) +\n\n # add annotation text to replace the legend ----------------------------------\n annotate(\n geom = \"text\",\n color = \"white\",\n x = 2020,\n y = c(1000, 4700, 13000, 26000),\n label = c(\"Cement\", \"Natural Gas\", \"Oil & NGL\", \"Coal\"),\n hjust = \"right\",\n fontface = \"bold\"\n ) +\n\n labs(\n x = element_blank(),\n y = latex2exp::TeX(\"Emissions ($MtCO_2e$)\"),\n title = \"Our World in Emissions\",\n subtitle = latex2exp::TeX(\n paste(\n \"Emissions are measured in Millions of Tons of $CO_2$ equivalent\",\n \"($MtCO_2e$)\"\n )\n ),\n caption = paste(\n \"Made with love by Mitch Harrison\",\n \" \",\n \"Source: Carbon Majors database and TidyTuesday\"\n )\n ) +\n\n theme(\n legend.position = \"none\", # hide the legend\n axis.title.y = element_text(size = 10),\n plot.background = element_rect(fill = BG_COLOR)\n ) \n\n\n\n\n\n\n\n\n\nNailed it. Now, I will happily take criticism here. 
I don’t love that the “Cement” label isn’t entirely encompassed by its data. But I think it’s much cleaner than having a legend drawing our eye away from the plot, so I’ll keep it.\nThe last thing we have to do before we can worry about the big annotation in the middle of the plot is change where the axes break. That is, set the years and emission amount displayed on the x and y axes, respectively. And while I’m at it, I will use a geom_hline to make the \\(x\\)-axis a bit bolder since it melts into the background a little bit too much for my liking.\n\n\nClick here for code\nLEVS <- c(\"Coal\", \"Oil & NGL\", \"Natural Gas\", \"Cement\")\nBG_COLOR <- \"#F0F0F0\"\nGRAY <- \"gray35\"\n\nemis |>\n filter(year >= 1900) |>\n mutate(\n commodity = if_else(str_detect(commodity, \"Coal\"), \"Coal\", commodity),\n commodity = factor(commodity, levels = LEVS) \n ) |>\n group_by(year, commodity) |>\n summarise(emissions = sum(emissions), .groups = \"drop\") |>\n \n # start of plot --------------------------------------------------------------\n ggplot(aes(x = year, y = emissions, fill = commodity)) +\n geom_area(alpha = 0.9) +\n\n ggthemes::scale_fill_colorblind() +\n ggthemes::theme_fivethirtyeight() +\n\n # change where the axis breaks occur -----------------------------------------\n scale_x_continuous(breaks = seq(1900, 2020, 20)) +\n scale_y_continuous(\n breaks = seq(0, 40000, 5000), \n label = scales::label_number(scale = 1e-3, suffix = \"k\")\n ) +\n\n annotate(\n geom = \"text\",\n color = \"white\",\n x = 2020,\n y = c(1000, 4700, 13000, 26000),\n label = c(\"Cement\", \"Natural Gas\", \"Oil & NGL\", \"Coal\"),\n hjust = \"right\",\n fontface = \"bold\"\n ) +\n\n labs(\n x = element_blank(),\n y = latex2exp::TeX(\"Emissions ($MtCO_2e$)\"),\n title = \"Our World in Emissions\",\n subtitle = latex2exp::TeX(\n paste(\n \"Emissions are measured in Millions of Tons of $CO_2$ equivalent\",\n \"($MtCO_2e$)\"\n )\n ),\n caption = paste(\n \"Made with love by Mitch 
Harrison\",\n \" \",\n \"Source: Carbon Majors database and TidyTuesday\"\n )\n ) +\n\n geom_hline(yintercept = 0, linewidth = 0.7, color = GRAY) + # bold axis\n theme(\n legend.position = \"none\", # hide the legend\n axis.title.y = element_text(size = 10),\n plot.background = element_rect(fill = BG_COLOR)\n ) \n\n\n\n\n\n\n\n\n\nOnce I write-in the line breaks, I’ll use the annotate function as before. But that’s not all. By default, there is no background with text annotations, so the grid overlaps the text and decreases legibility. To fix this, I’ll use annotate to put a rectangle the same color as the plot background behind the text, which “removes” the grid lines behind the text.\nFinally, to accomplish the arrow, we will use our final annotate to draw a line segment and put an arrowhead at the end.\n\n\n\n\n\n\nNote\n\n\n\nNormally, the order that we put things in a ggplot2 pipeline doesn’t matter. But here, if you put the background rectangle after the text annotation, it will cover the text, rendering it invisible.\n\n\nBecause this is our last edit, I will take this opportunity to make one very oft-forgotten change: write my alt text. Since you’re here, I know you respect the power of data communication. Alt text lets us communicate with those who sometimes miss out on learning from plots online. As our color palette did for colorblind viewers, we owe it to our non-sighted friends to let them participate.\nAnd finally, I’ll change the aspect ratio of the plot. You may have heard of the golden ratio, which is a ratio that many humans find inherently satisfying to look at. That ratio is approximately 1.618:1. The inverse of that number is 0.618, which will be our horizontal aspect ratio (1.618 is vertical). Because the quarto headers won’t render with the document, my final header is below:\n#| label: plt-final\n#| fig-width: 8\n#| fig-align: \"center\"\n#| fig-asp: 0.618\n#| fig-alt: |\n#| This plot is titled Our World in Emissions. 
It is an area plot that shows\n#| global emissions over time by type. The types are coal, natural gas,\n#| cement, and oil and NGL. The plot notes that in 1995, the UN first met to\n#| discuss the climate threat. The plot shows near-zero emissions from 1900 to\n#| 1920, when a slow increase begins. From there, emission growth seems to be\n#| exponentially increasing, with no decline since the UN first met. Coal is\n#| the largest emitter, then oil and NGL, then natural gas, and finally,\n#| cement.\nNow, let’s see the plot!\n\n\nClick here for code\n# constants for ease of code legibility ----------------------------------------\nLEVS <- c(\"Coal\", \"Oil & NGL\", \"Natural Gas\", \"Cement\")\nBG_COLOR <- \"#F0F0F0\"\nUN_TEXT <- paste(\n \"In 1995, the United Nations\\nConference of the Parties met for\\nthe first\", \n \"time to discuss the looming\\nthreat of climate change. The COP\\nhas\",\n \"met twenty-eight times since.\"\n)\n\n# data cleanup -----------------------------------------------------------------\nemis |>\n filter(year >= 1900) |> # lots of near-zero space without this filter\n mutate(\n commodity = if_else(str_detect(commodity, \"Coal\"), \"Coal\", commodity),\n commodity = factor(commodity, levels = LEVS) # re-order areas\n ) |>\n group_by(year, commodity) |>\n summarise(emissions = sum(emissions), .groups = \"drop\") |>\n \n # start of plot --------------------------------------------------------------\n ggplot(aes(x = year, y = emissions, fill = commodity)) +\n geom_area(alpha = 0.9) +\n \n # UN COP annotation text box -------------------------------------------------\n\n # the arrow\n annotate(\n geom = \"segment\",\n x = 1995,\n xend = 1995,\n y = 35500,\n yend = 20500,\n linetype = \"solid\",\n linejoin = \"round\",\n linewidth = 1,\n color = \"grey35\",\n arrow = arrow(type = \"closed\", length = unit(0.2, \"cm\"))\n ) +\n\n # the background rectangle (must be before the text)\n annotate(\n geom = \"rect\",\n xmin = 1945.5,\n xmax = 
1993.5,\n ymin = 23500,\n ymax = 35800,\n fill = BG_COLOR\n ) +\n\n # annotation text\n annotate(\n geom = \"text\",\n x = 1992,\n y = 30000,\n label = UN_TEXT,\n color = GRAY,\n fontface = \"italic\",\n hjust = \"right\"\n ) +\n \n # replace legend with annotation text ----------------------------------------\n annotate(\n geom = \"text\",\n color = \"white\",\n x = 2020,\n y = c(1000, 4700, 13000, 26000),\n label = c(\"Cement\", \"Natural Gas\", \"Oil & NGL\", \"Coal\"),\n hjust = \"right\",\n fontface = \"bold\"\n ) +\n \n # visual style elements (love you, ggthemes) ---------------------------------\n ggthemes::scale_fill_colorblind() +\n ggthemes::theme_fivethirtyeight() +\n \n # customize axis breaks and labels -------------------------------------------\n scale_x_continuous(breaks = seq(1900, 2020, 20)) +\n scale_y_continuous(\n breaks = seq(0, 40000, 5000), \n label = scales::label_number(scale = 1e-3, suffix = \"k\")\n ) +\n labs(\n x = element_blank(),\n y = latex2exp::TeX(\"Emissions ($MtCO_2e$)\"),\n title = \"Our World in Emissions\",\n subtitle = latex2exp::TeX(\n paste(\n \"Emissions are measured in Millions of Tons of $CO_2$ equivalent\",\n \"($MtCO_2e$)\"\n )\n ),\n caption = paste(\n \"Made with love by Mitch Harrison\",\n \" \",\n \"Source: Carbon Majors database and TidyTuesday\"\n )\n ) + \n \n # theme cleanup --------------------------------------------------------------\n geom_hline(yintercept = 0, linewidth = 0.7, color = GRAY) + # bold axis\n theme(\n legend.position = \"none\", \n axis.title.y = element_text(size = 10),\n plot.background = element_rect(fill = BG_COLOR)\n ) \n\n\n\n\n\n\n\n\n\nNo plot is perfect, but I am happy with what we have accomplished, and I hope you are too! If you have any questions or corrections, feel free to reach out on Discord, and I’ll be happy to help. 
And, of course, if you want to contribute to this effort financially, you are more than welcome to buy me a coffee.\nThanks for sticking around, and good luck with your TidyTuesday adventures!" + "text": "Welcome! If you saw my post for this week’s TidyTuesday, I’m glad you liked it enough to learn from it! If not, you can either scroll to the bottom to see the final product or click here. For this plot, we will use an area plot to visualize the global emissions by type going back to 1900. To start, we will use a bare-bones ggplot2 area chart with no bells or whistles to see what we are working with.\n\n\nClick here for code\nlibrary(tidyverse)\n\n# read data and rename an ugly column ------------------------------------------\nemis<- read_csv(paste0(\n \"https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/\",\n \"data/2024/2024-05-21/emissions.csv\"\n )\n)\n\nemis <- emis |>\n rename(emissions = \"total_emissions_MtCO2e\")\n\nemis |>\n group_by(year, commodity) |>\n summarise(emissions = sum(emissions), .groups = \"drop\") |>\n \n # start of plot --------------------------------------------------------------\n ggplot(aes(x = year, y = emissions, fill = commodity)) +\n geom_area(alpha = 0.9)\n\n\n\n\n\n\n\n\n\nOkay, we’ve learned a lot. First, there are a lot of categories. A good rule of thumb is that once you get to about seven colors, even non-colorblind humans struggle to differentiate. But there is hope! Notice that there are several types of coal production. Let’s aggregate them. Second, there is a long tail on the left because of near-zero data. 
Let’s bring our limit to the right to get a better look.\n\n\nClick here for code\nemis |>\n filter(year >= 1900) |> # get rid of that tail\n mutate(\n # aggregate coal\n commodity = if_else(str_detect(commodity, \"Coal\"), \"Coal\", commodity),\n ) |>\n group_by(year, commodity) |>\n summarise(emissions = sum(emissions), .groups = \"drop\") |>\n \n # start of plot --------------------------------------------------------------\n ggplot(aes(x = year, y = emissions, fill = commodity)) +\n geom_area(alpha = 0.9)\n\n\n\n\n\n\n\n\n\nMuch better! But to me, having the smallest category (cement) on top feels awkward. Let’s reorder the categories! I’ll do so in descending order of emissions in the last year.\n\n\nClick here for code\nLEVS <- c(\"Coal\", \"Oil & NGL\", \"Natural Gas\", \"Cement\") # our desired order\n\nemis |>\n filter(year >= 1900) |>\n mutate(\n commodity = if_else(str_detect(commodity, \"Coal\"), \"Coal\", commodity),\n commodity = factor(commodity, levels = LEVS) # re-order \n ) |>\n group_by(year, commodity) |>\n summarise(emissions = sum(emissions), .groups = \"drop\") |>\n \n # start of plot --------------------------------------------------------------\n ggplot(aes(x = year, y = emissions, fill = commodity)) +\n geom_area(alpha = 0.9)\n\n\n\n\n\n\n\n\n\nNow we’re cooking! It’s time for some style points. I’ll use my favorite aesthetic cheat code: ggthemes. Let’s add a theme and color scheme. I’m going with the FiveThirtyEight theme and a colorblind-friendly palette. I’ll also take this opportunity to adjust the opacity down just a touch. This is a personal choice, but I find it nice to be able to see the grid behind such ink-heavy plots as area plots.\n\n\n\n\n\n\nImportant\n\n\n\nRemember: unless you are making plots for a very small number of people and you know for certain that none are colorblind, making inaccessible plots is inexcusable. 
Of course, we all make mistakes, so if you ever notice an accessibility issue on my site, reach out and let me know on Discord or via a GitHub issue so I can improve for next time!\n\n\n\n\nClick here for code\nLEVS <- c(\"Coal\", \"Oil & NGL\", \"Natural Gas\", \"Cement\") # our desired order\n\nemis |>\n filter(year >= 1900) |>\n mutate(\n commodity = if_else(str_detect(commodity, \"Coal\"), \"Coal\", commodity),\n commodity = factor(commodity, levels = LEVS) # re-order \n ) |>\n group_by(year, commodity) |>\n summarise(emissions = sum(emissions), .groups = \"drop\") |>\n \n # start of plot --------------------------------------------------------------\n ggplot(aes(x = year, y = emissions, fill = commodity)) +\n geom_area(alpha = 0.9) + # drop the opacity just a touch\n\n # add theme and colors (love you, ggthemes) \n ggthemes::scale_fill_colorblind() +\n ggthemes::theme_fivethirtyeight()\n\n\n\n\n\n\n\n\n\nAnd just like that, it feels like we are almost there! Let’s change a few things at once. We will change the background color, add the title/subtitle/axis labels/caption, and format the \\(y\\)-axis to read 30k instead of 30000. That will give us a feel for the final color scheme and how the fonts feel on the page. Because of the subscript “2” in \\(CO_2\\), I will use the latex2exp package use \\(\\LaTeX\\) typesetting in the plot.\n\n\n\n\n\n\nNote\n\n\n\nOne note that is unique to this plot. When we use theme_fivethirtyeight, it removes the \\(y\\)-axis title. 
So, although we normally wouldn’t have to explicitly set the axis title to element_text in the theme function, we will here.\n\n\n\n\nClick here for code\nLEVS <- c(\"Coal\", \"Oil & NGL\", \"Natural Gas\", \"Cement\")\nBG_COLOR <- \"#F0F0F0\" # this will be our background color\n\nemis |>\n filter(year >= 1900) |>\n mutate(\n commodity = if_else(str_detect(commodity, \"Coal\"), \"Coal\", commodity),\n commodity = factor(commodity, levels = LEVS) \n ) |>\n group_by(year, commodity) |>\n summarise(emissions = sum(emissions), .groups = \"drop\") |>\n \n # start of plot --------------------------------------------------------------\n ggplot(aes(x = year, y = emissions, fill = commodity)) +\n geom_area(alpha = 0.9) +\n\n ggthemes::scale_fill_colorblind() +\n ggthemes::theme_fivethirtyeight() +\n\n # abbreviate the y axis labels using the scales package\n scale_y_continuous(label = scales::label_number(scale = 1e-3, suffix = \"k\")) +\n\n # add labels to the plot -----------------------------------------------------\n labs(\n x = element_blank(),\n y = latex2exp::TeX(\"Emissions ($MtCO_2e$)\"), # LaTeX typesetting with TeX()\n title = \"Our World in Emissions\",\n subtitle = latex2exp::TeX(\n paste(\n \"Emissions are measured in Millions of Tons of $CO_2$ equivalent\",\n \"($MtCO_2e$)\"\n )\n ),\n caption = paste(\n \"Made with love by Mitch Harrison\",\n # long blank line to \"hack\" a an annotation in the bottom-left corner\n \" \",\n \"Source: Carbon Majors database and TidyTuesday\"\n )\n ) +\n theme(\n axis.title.y = element_text(size = 10),\n plot.background = element_rect(fill = BG_COLOR) # change background color\n ) \n\n\n\n\n\n\n\n\n\nYou could submit this plot for public consumption without shame, but we can do better! For example, I think we could safely remove the legend by annotating the colors directly on the plot. Let’s use a geom_text to do just that. While this entire process has been creative, we are getting into highly subjective territory here. 
So if you don’t like these changes, do something else! I would love to see your ideas.\nTo make the annotations, I want the text to be right-justified and directly atop one another. To accomplish that, I will give geom_text a single \\(x\\) value but several \\(y\\) values (one for each category).\n\n\nClick here for code\nLEVS <- c(\"Coal\", \"Oil & NGL\", \"Natural Gas\", \"Cement\")\nBG_COLOR <- \"#F0F0F0\"\n\nemis |>\n filter(year >= 1900) |>\n mutate(\n commodity = if_else(str_detect(commodity, \"Coal\"), \"Coal\", commodity),\n commodity = factor(commodity, levels = LEVS) \n ) |>\n group_by(year, commodity) |>\n summarise(emissions = sum(emissions), .groups = \"drop\") |>\n \n # start of plot --------------------------------------------------------------\n ggplot(aes(x = year, y = emissions, fill = commodity)) +\n geom_area(alpha = 0.9) +\n\n ggthemes::scale_fill_colorblind() +\n ggthemes::theme_fivethirtyeight() +\n scale_y_continuous(label = scales::label_number(scale = 1e-3, suffix = \"k\")) +\n\n # add annotation text to replace the legend ----------------------------------\n annotate(\n geom = \"text\",\n color = \"white\",\n x = 2020,\n y = c(1000, 4700, 13000, 26000),\n label = c(\"Cement\", \"Natural Gas\", \"Oil & NGL\", \"Coal\"),\n hjust = \"right\",\n fontface = \"bold\"\n ) +\n\n labs(\n x = element_blank(),\n y = latex2exp::TeX(\"Emissions ($MtCO_2e$)\"),\n title = \"Our World in Emissions\",\n subtitle = latex2exp::TeX(\n paste(\n \"Emissions are measured in Millions of Tons of $CO_2$ equivalent\",\n \"($MtCO_2e$)\"\n )\n ),\n caption = paste(\n \"Made with love by Mitch Harrison\",\n \" \",\n \"Source: Carbon Majors database and TidyTuesday\"\n )\n ) +\n\n theme(\n legend.position = \"none\", # hide the legend\n axis.title.y = element_text(size = 10),\n plot.background = element_rect(fill = BG_COLOR)\n ) \n\n\n\n\n\n\n\n\n\nNailed it. Now, I will happily take criticism here. 
I don’t love that the “Cement” label isn’t entirely encompassed by its data. But I think it’s much cleaner than having a legend drawing our eye away from the plot, so I’ll keep it.\nThe last thing we have to do before we can worry about the big annotation in the middle of the plot is change where the axes break. That is, set the years and emission amount displayed on the x and y axes, respectively. And while I’m at it, I will use a geom_hline to make the \\(x\\)-axis a bit bolder since it melts into the background a little bit too much for my liking.\n\n\nClick here for code\nLEVS <- c(\"Coal\", \"Oil & NGL\", \"Natural Gas\", \"Cement\")\nBG_COLOR <- \"#F0F0F0\"\nGRAY <- \"gray35\"\n\nemis |>\n filter(year >= 1900) |>\n mutate(\n commodity = if_else(str_detect(commodity, \"Coal\"), \"Coal\", commodity),\n commodity = factor(commodity, levels = LEVS) \n ) |>\n group_by(year, commodity) |>\n summarise(emissions = sum(emissions), .groups = \"drop\") |>\n \n # start of plot --------------------------------------------------------------\n ggplot(aes(x = year, y = emissions, fill = commodity)) +\n geom_area(alpha = 0.9) +\n\n ggthemes::scale_fill_colorblind() +\n ggthemes::theme_fivethirtyeight() +\n\n # change where the axis breaks occur -----------------------------------------\n scale_x_continuous(breaks = seq(1900, 2020, 20)) +\n scale_y_continuous(\n breaks = seq(0, 40000, 5000), \n label = scales::label_number(scale = 1e-3, suffix = \"k\")\n ) +\n\n annotate(\n geom = \"text\",\n color = \"white\",\n x = 2020,\n y = c(1000, 4700, 13000, 26000),\n label = c(\"Cement\", \"Natural Gas\", \"Oil & NGL\", \"Coal\"),\n hjust = \"right\",\n fontface = \"bold\"\n ) +\n\n labs(\n x = element_blank(),\n y = latex2exp::TeX(\"Emissions ($MtCO_2e$)\"),\n title = \"Our World in Emissions\",\n subtitle = latex2exp::TeX(\n paste(\n \"Emissions are measured in Millions of Tons of $CO_2$ equivalent\",\n \"($MtCO_2e$)\"\n )\n ),\n caption = paste(\n \"Made with love by Mitch 
Harrison\",\n \" \",\n \"Source: Carbon Majors database and TidyTuesday\"\n )\n ) +\n\n geom_hline(yintercept = 0, linewidth = 0.7, color = GRAY) + # bold axis\n theme(\n legend.position = \"none\", # hide the legend\n axis.title.y = element_text(size = 10),\n plot.background = element_rect(fill = BG_COLOR)\n ) \n\n\n\n\n\n\n\n\n\nOnce I write in the line breaks, I’ll use the annotate function as before. But that’s not all. By default, there is no background with text annotations, so the grid overlaps the text and decreases legibility. To fix this, I’ll use annotate to put a rectangle the same color as the plot background behind the text, which “removes” the grid lines behind the text.\nFinally, to accomplish the arrow, we will use our final annotate to draw a line segment and put an arrowhead at the end.\n\n\n\n\n\n\nNote\n\n\n\nMany pieces of a ggplot2 pipeline (scales, labels, themes) can be added in any order. Layers are different: they are drawn in the order they are added, so if you put the background rectangle after the text annotation, it will cover the text, rendering it invisible.\n\n\nBecause this is our last edit, I will take this opportunity to make one very oft-forgotten change: write my alt text. Since you’re here, I know you respect the power of data communication. Alt text lets us communicate with those who sometimes miss out on learning from plots online. As our color palette did for colorblind viewers, we owe it to our non-sighted friends to let them participate.\nAnd finally, I’ll change the aspect ratio of the plot. You may have heard of the golden ratio, which is a ratio that many humans find inherently satisfying to look at. That ratio is approximately 1.618:1. The inverse of that number is 0.618, which will be our aspect ratio: fig-asp is the height of the figure divided by its width, so 0.618 makes the plot about 1.618 times as wide as it is tall. Because the Quarto chunk options won’t render with the document, my final header is below:\n#| label: plt-final\n#| fig-width: 8\n#| fig-align: \"center\"\n#| fig-asp: 0.618\n#| fig-alt: |\n#| This plot is titled Our World in Emissions. 
It is an area plot that shows\n#| global emissions over time by type. The types are coal, natural gas,\n#| cement, and oil and NGL. The plot notes that in 1995, the UN first met to\n#| discuss the climate threat. The plot shows near-zero emissions from 1900 to\n#| 1920, when a slow increase begins. From there, emission growth seems to be\n#| exponentially increasing, with no decline since the UN first met. Coal is\n#| the largest emitter, then oil and NGL, then natural gas, and finally,\n#| cement.\nNow, let’s see the plot!\n\n\nClick here for code\n# constants for ease of code legibility ----------------------------------------\nLEVS <- c(\"Coal\", \"Oil & NGL\", \"Natural Gas\", \"Cement\")\nBG_COLOR <- \"#F0F0F0\"\nUN_TEXT <- paste(\n \"In 1995, the United Nations\\nConference of the Parties met for\\nthe first\", \n \"time to discuss the looming\\nthreat of climate change. The COP\\nhas\",\n \"met twenty-eight times since.\"\n)\n\n# data cleanup -----------------------------------------------------------------\nemis |>\n filter(year >= 1900) |> # lots of near-zero space without this filter\n mutate(\n commodity = if_else(str_detect(commodity, \"Coal\"), \"Coal\", commodity),\n commodity = factor(commodity, levels = LEVS) # re-order areas\n ) |>\n group_by(year, commodity) |>\n summarise(emissions = sum(emissions), .groups = \"drop\") |>\n \n # start of plot --------------------------------------------------------------\n ggplot(aes(x = year, y = emissions, fill = commodity)) +\n geom_area(alpha = 0.9) +\n \n # UN COP annotation text box -------------------------------------------------\n\n # the arrow\n annotate(\n geom = \"segment\",\n x = 1995,\n xend = 1995,\n y = 35500,\n yend = 20500,\n linetype = \"solid\",\n linejoin = \"round\",\n linewidth = 1,\n color = \"grey35\",\n arrow = arrow(type = \"closed\", length = unit(0.2, \"cm\"))\n ) +\n\n # the background rectangle (must be before the text)\n annotate(\n geom = \"rect\",\n xmin = 1945.5,\n xmax = 
1993.5,\n ymin = 23500,\n ymax = 35800,\n fill = BG_COLOR\n ) +\n\n # annotation text\n annotate(\n geom = \"text\",\n x = 1992,\n y = 30000,\n label = UN_TEXT,\n color = GRAY,\n fontface = \"italic\",\n hjust = \"right\"\n ) +\n \n # replace legend with annotation text ----------------------------------------\n annotate(\n geom = \"text\",\n color = \"white\",\n x = 2020,\n y = c(1000, 4700, 13000, 26000),\n label = c(\"Cement\", \"Natural Gas\", \"Oil & NGL\", \"Coal\"),\n hjust = \"right\",\n fontface = \"bold\"\n ) +\n \n # visual style elements (love you, ggthemes) ---------------------------------\n ggthemes::scale_fill_colorblind() +\n ggthemes::theme_fivethirtyeight() +\n \n # customize axis breaks and labels -------------------------------------------\n scale_x_continuous(breaks = seq(1900, 2020, 20)) +\n scale_y_continuous(\n breaks = seq(0, 40000, 5000), \n label = scales::label_number(scale = 1e-3, suffix = \"k\")\n ) +\n labs(\n x = element_blank(),\n y = latex2exp::TeX(\"Emissions ($MtCO_2e$)\"),\n title = \"Our World in Emissions\",\n subtitle = latex2exp::TeX(\n paste(\n \"Emissions are measured in Millions of Tons of $CO_2$ equivalent\",\n \"($MtCO_2e$)\"\n )\n ),\n caption = paste(\n \"Made with love by Mitch Harrison\",\n \" \",\n \"Source: Carbon Majors database and TidyTuesday\"\n )\n ) + \n \n # theme cleanup --------------------------------------------------------------\n geom_hline(yintercept = 0, linewidth = 0.7, color = GRAY) + # bold axis\n theme(\n legend.position = \"none\", \n axis.title.y = element_text(size = 10),\n plot.background = element_rect(fill = BG_COLOR)\n ) \n\n\n\n\n\n\n\n\n\nNo plot is perfect, but I am happy with what we have accomplished, and I hope you are too! If you have any questions or corrections, feel free to reach out on Discord, and I’ll be happy to help. 
And, of course, if you want to contribute to this effort financially, you are more than welcome to buy me a coffee.\nThanks for sticking around, and good luck with your TidyTuesday adventures!" }, { "objectID": "tutorials/math_stat_2.html", @@ -123,7 +123,7 @@ "href": "tutorials.html", "title": "Tutorials", "section": "", - "text": "Here is a place to browse my tutorials! Feel free to browse or sort by category on the right side of the page. If you have any questions or ideas for new topics, let me know on Discord!\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nHello, statistics. | Mathematical Statistics 0\n\n\n\nStatistics\n\n\nMathematical Statistics\n\n\n\n\n\n\n\nMitch Harrison\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWelcome to Estimators! | Mathematical Statistics 1\n\n\n\nStatistics\n\n\nMathematical Statistics\n\n\n\n\n\n\n\nMitch Harrison\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nThe Bias-Variance Tradeoff | Mathematical Statistics 2\n\n\n\nStatistics\n\n\nMathematical Statistics\n\n\n\n\n\n\n\nMitch Harrison\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nJensen’s Inequality | Mathematical Statistics 3\n\n\n\nStatistics\n\n\nMathematical Statistics\n\n\n\n\n\n\n\nMitch Harrison\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nOur World in Emissions | TidyTutorial\n\n\n\nData Viz\n\n\n\n\n\n\n\nMitch Harrison\n\n\n\n\n\n\n\n\nNo matching items" + "text": "Here is a place to browse my tutorials! Feel free to browse or sort by category on the right side of the page. If you have any questions or ideas for new topics, let me know on Discord!\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nHello, statistics. | Mathematical Statistics 0\n\n\n\nStatistics\n\n\nMathematical Statistics\n\n\n\n\n\n\n\nMitch Harrison\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWelcome to Estimators! 
| Mathematical Statistics 1\n\n\n\nStatistics\n\n\nMathematical Statistics\n\n\n\n\n\n\n\nMitch Harrison\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nThe Bias-Variance Tradeoff | Mathematical Statistics 2\n\n\n\nStatistics\n\n\nMathematical Statistics\n\n\n\n\n\n\n\nMitch Harrison\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nJensen’s Inequality | Mathematical Statistics 3\n\n\n\nStatistics\n\n\nMathematical Statistics\n\n\n\n\n\n\n\nMitch Harrison\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nOur World in Emissions | TidyTutorial\n\n\n\nData Viz\n\n\n\n\n\n\n\nMitch Harrison\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nCampus Pride Index| TidyTutorial\n\n\n\nData Viz\n\n\n\n\n\n\n\nMitch Harrison\n\n\n\n\n\n\n\n\nNo matching items" }, { "objectID": "index.html", @@ -131,5 +131,12 @@ "title": "Hi, I’m Mitch. 👋", "section": "", "text": "After leaving the Navy in 2022, I moved to North Carolina to study data science and political science at Duke University. Finishing my degree cost the military about $60,000 in aid and depleted my savings. Even my parents, who probably thought they could start spending their “put their kid through college” fund on fun things, had to chip in to help. My education was world-class but utterly unattainable for the majority of us. This website is my attempt to turn my course notes, homework, and projects into comprehensible articles for free. If nothing else, future Duke students struggling through their data science degree can look here for a different perspective.\nIf you want to see some of my analysis work, head to the Projects tab! If you are here to learn, the Tutorials tab is for you. Of course, if you are curious about the website’s structure, it is built with R and Quarto, and the code is available by clicking on the GitHub icon at the bottom-left of every page (or click here).\nAccess to information should always be free, so every article here is, and always will be, at no cost. If you want to show financial support, you can buy me a coffee! 
But I won’t ever make donor-exclusive educational content, so don’t feel like you’re missing out by not donating. It’s just one way to show thanks.\nI hope you enjoy the site, and feel free to reach out via GitHub issues to make suggestions for articles. Thanks for reading!" + }, + { + "objectID": "tutorials/tidytuesday_06112024.html", + "href": "tutorials/tidytuesday_06112024.html", + "title": "Campus Pride Index| TidyTutorial", + "section": "", + "text": "Introduction\nIn celebration of Pride Month, this week’s TidyTuesday provides data from the Campus Pride Index, which measures the safety and inclusivity of LGBTQ+ programs across universities in the United States.\nEach university is binned into one or more categories (e.g., military colleges, private/public, and others). What feels natural to me is to see how the Campus Pride Index compares across some of these categories. A proportionate stacked bar chart (where each bar has height equal to 1) is one option, but I would also like to see which types of universities are most common. If there are some categories with worse scores but with much smaller sample sizes, that would be helpful to know. So we’ll use a stacked bar, but not normalize the bar so we can also see how common each type is. 
Also bear in mind that a single university can (and often does) fall into multiple categories.\nLet’s set some global settings so I don’t have to worry about aspect ratio or other trivialities while we work.\n\n\nClick here for code\nknitr::opts_chunk$set(\n fig.width = 10, \n fig.asp = 0.618, # the golden ratio\n fig.align = \"center\" # center align figures\n)\n\n\n\n\nData Wrangling\nTime to load the data.\n\n\nClick here for code\nlibrary(tidyverse)\nlibrary(gglgbtq)\nlibrary(ggchicklet)\nlibrary(ggthemes)\nlibrary(DT)\n\n# load the data ----------------------------------------------------------------\n\npride_schools <- read_csv(paste0(\n \"https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/\",\n \"2024/2024-06-11/pride_index.csv\"\n))\n\npride_tags <- read_csv(paste0(\n \"https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/\",\n \"2024/2024-06-11/pride_index_tags.csv\"\n))\n\ndatatable(left_join(pride_schools, pride_tags))\n\n\n\n\n\n\nFirst, let’s format the data for ease of plotting. Right now, each category has its own column, with TRUE or NA values, where NA means “false” for our purposes. But we want the type of school to be represented in a single column so we can map that column to the color of the bars. To move multiple columns into a single one, we will pivot the data. Since we want to consolidate columns, we will need to make our data longer (i.e., add more rows), where each university now has multiple rows corresponding to TRUE or FALSE. Intuitively, to pivot the data longer, we use the pivot_longer function. 
Notice that once the pivot is complete, we only want to keep the rows where the value is TRUE, since the FALSE rows are just saying that “this university doesn’t fall into this type,” which is useless noise in our dataset.\n\n\nClick here for code\n# format data for plotting -----------------------------------------------------\n\nuni_types <- pride_schools |>\n \n # join both datasets into one\n left_join(pride_tags) |>\n \n # select which columns we want to analyze along with their ratings\n select(rating, public, private, community, liberal_arts, technical,\n religious, military, hbcu, hispanic_serving, aapi_serving,\n other_minority_serving) |> \n \n # replace NA with FALSE\n mutate(across(everything(), ~ replace_na(., FALSE))) |>\n \n # do the pivot\n pivot_longer(cols = !rating, names_to = \"type\") |>\n \n # drop the rows that don't apply\n filter(value == TRUE) |>\n \n # clean up some strings for prettier plotting\n mutate(\n type = str_replace_all(type, \"_\", \" \"),\n type = str_to_title(type),\n type = str_replace(type, \"Aapi\", \"AAPI\")\n )\n\ndatatable(uni_types)\n\n\n\n\n\n\nThat looks just like we wanted it to. Now that our data is formatted, we can work on the plot. Per the data dictionary on the TidyTuesday GitHub repository, we know that fractional scores are possible. A quick call to the unique function told me that the “fractional” scores are only half-stars, not any decimal in between two scores. So 1 and 1.5 are possible scores, but 1.7 is not. We should bin these scores by their leading digit so we have five possible fill values instead of nine. We’ll call these bins rating_levs, or “rating levels.”\nI would also like to order the bars in descending order of the total number of universities of that type. 
To do that, we’ll count how many of each category there are and save the order as a vector uni_levs, or “university levels.”\n\n\nClick here for code\nuni_levs <- uni_types |>\n group_by(type) |>\n summarise(count = n()) |>\n arrange(desc(count)) |>\n pull(type)\n\nrating_levs <- c(\"1 - 1.5\", \"2 - 2.5\", \"3 - 3.5\", \"4 - 4.5\", \"5\")\n\n\nFor our last data wrangling step, we can assign the rating bins to their respective ratings. I’ll create a new column for this and call it score. After that, we can group by type of university and score, and count the number of occurrences of each group. Then, we’ll be ready to plot.\n\n\nClick here for code\nuni_types <- uni_types |>\n \n # assign bins to the score variable\n mutate(\n score = case_when(\n rating < 2 ~ rating_levs[1],\n rating < 3 ~ rating_levs[2],\n rating < 4 ~ rating_levs[3],\n rating < 5 ~ rating_levs[4],\n TRUE ~ rating_levs[5] \n ),\n \n # order the score bins using the rating_levs we made earlier\n score = factor(score, levels = rating_levs)\n ) |>\n \n # count the number of each type/score combination\n group_by(type, score) |>\n summarise(count = n(), .groups = \"drop\") |>\n \n # reorder the universities by descending order of number\n mutate(type = factor(type, levels = rev(uni_levs)))\n\n\n\n\nThe Plot\nNow that we have our data, let’s get the skeleton of the plot going. I’m going to use a favorite “cheat code” of mine for making aesthetically pleasing bar graphs in R: the ggchicklet package. It lets us round the corners of each bar, which gives a much more aesthetic appearance (in my opinion). So, instead of using the geom_col function that is standard, we will use geom_chicklet instead.\nOne small note for geom_chicklet: it prefers to have bar graphs be vertical. But because my category names are long (and you never want to rotate text), I would like the plot to be horizontal. 
So I’ll map the type to the x axis and the counts to the y axis like geom_chicklet prefers, but I’ll use coord_flip afterwards to make it horizontal. This is the same technique that the author of the ggchicklet package uses in his demo on the ggchicklet GitHub repository.\n\n\nClick here for code\nuni_types |>\n ggplot(aes(x = type, y = count, fill = score)) +\n geom_chicklet(position = position_stack(reverse = TRUE), width = 0.6) +\n coord_flip()\n\n\n\n\n\n\n\n\n\nThis is already a great start! We have some aesthetic changes to make, but our bins and bars are in the order that we were hoping. Let’s change some colors.\nFirst, I’ll use my favorite theme function, which comes from the ggthemes package. That theme is theme_fivethirtyeight, which takes its name from the legendary data visualizations of the FiveThirtyEight website.\nI also think it would be appropriate for us to use Pride colors, don’t you? Of course, there is an R package for that: the gglgbtq package, which I imported earlier. We will use the “rainbow” color palette provided by gglgbtq to color our bars.\n\n\nClick here for code\nuni_types |>\n ggplot(aes(x = type, y = count, fill = score)) +\n geom_chicklet(position = position_stack(reverse = TRUE), width = 0.6) +\n coord_flip() +\n \n # change theme and base font size\n theme_fivethirtyeight() +\n \n # change bar colors and put the legend in the right order\n scale_fill_manual(values = palette_lgbtq(\"rainbow\"))\n\n\n\n\n\n\n\n\n\nNow we’re cooking! I think we are safe to add the title and subtitle, and then we can make a few more aesthetic changes before wrapping up. I want the background to be black (personal preference), which means the text needs to be white. I also don’t think that horizontal grid lines are necessary when the y axis is discrete, so we will remove those. 
I love the legend, but I would like it to be stacked and placed vertically in the plot, rather than horizontal and below the plot.\n\n\nClick here for code\nuni_types |>\n ggplot(aes(x = type, y = count, fill = score)) +\n geom_chicklet(position = position_stack(reverse = TRUE), width = 0.6) +\n coord_flip() +\n theme_fivethirtyeight() +\n scale_fill_manual(values = palette_lgbtq(\"rainbow\")) +\n \n # add title and subtitle\n labs(\n title = \"Campus Pride Index Scores\",\n subtitle = \"Higher scores mean increased LGBTQ-inclusive policies/programs\",\n ) +\n \n theme(\n # make all text white\n text = element_text(color = \"white\", family = \"Lato\") ,\n \n # adjust title font size\n plot.title = element_text(),\n \n # make background black\n plot.background = element_rect(fill = \"black\"),\n panel.background = element_rect(fill = \"black\"),\n legend.background = element_rect(fill = \"black\"),\n \n # remove grid lines\n panel.grid.major.y = element_blank(),\n \n # move legend\n legend.direction = \"vertical\",\n legend.position = c(0.9, 0.5),\n )\n\n\n\n\n\n\n\n\n\nMuch better! Only a few small edits left. First, I don’t think the legend needs a title. I also want the higher scores to be higher on the legend, so we can reverse the order of the legend inside of the scale_fill_manual function. The y axis text is a little far from the axis for my liking, so we will shift that in, and we’ll be done, save for one more thing: fonts.\nI’m going to use custom fonts that aren’t shipped with R or ggplot. These fonts come from Google Fonts, and we will need to use two packages to get them to work: sysfonts to load fonts from Google and showtext to get them to work with our plots. 
Once we import them, we can use them like any other font in our ggplot graphs!\n\n\nClick here for code\n# load fonts from Google Fonts into our project\nsysfonts::font_add_google(name = \"Galada\")\nsysfonts::font_add_google(name = \"Lato\")\nshowtext::showtext_auto()\n\nuni_types |>\n ggplot(aes(x = type, y = count, fill = score)) +\n geom_chicklet(position = position_stack(reverse = TRUE), width = 0.6) +\n coord_flip() +\n theme_fivethirtyeight(base_size = 25) +\n scale_fill_manual(\n values = palette_lgbtq(\"rainbow\"),\n guide = guide_legend(reverse = TRUE) # reverse the legend order\n ) +\n labs(\n title = \"Campus Pride Index Scores\",\n subtitle = \"Higher scores mean increased LGBTQ-inclusive policies/programs\",\n ) +\n theme(\n # use Lato font from Google for all text\n text = element_text(color = \"white\", family = \"Lato\"),\n \n # use Galada font from Google just for the title\n plot.title = element_text(family = \"Galada\", size = 43),\n \n plot.background = element_rect(fill = \"black\"),\n panel.background = element_rect(fill = \"black\"),\n panel.grid.major.y = element_blank(),\n legend.background = element_rect(fill = \"black\"),\n legend.title = element_blank(),\n legend.direction = \"vertical\",\n legend.position = c(0.9, 0.5),\n \n # shift y axis text closer to the margin\n axis.text.y = element_text(margin = margin(r = -20))\n )\n\n\n\n\n\n\n\n\n\n\n\nConclusion\nDone! With some data wrangling and some nice themes, we have arrived at a graph that we can be proud of (get it?). I hope this helps you in your own data viz journey, but if you have further questions, feel free to join my Discord server and ask me personally! And if you are feeling grateful for my work (and are financially able to), you can give me a special thanks by buying me a coffee.\nAs always, thanks for reading, and see you next week!" 
} ] \ No newline at end of file diff --git a/docs/sitemap.xml b/docs/sitemap.xml index f2fda0c..af89115 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -6,11 +6,11 @@ https://mitchellharrison.github.io/projects/tidytuesday_06112024/pride.html - 2024-06-11T20:29:38.184Z + 2024-06-11T22:10:50.833Z https://mitchellharrison.github.io/tutorials/tidytuesday_05212024.html - 2024-05-26T03:48:50.010Z + 2024-06-11T22:10:10.600Z https://mitchellharrison.github.io/tutorials/math_stat_2.html @@ -44,4 +44,8 @@ https://mitchellharrison.github.io/index.html 2024-05-26T19:37:00.534Z + + https://mitchellharrison.github.io/tutorials/tidytuesday_06112024.html + 2024-06-11T22:46:41.752Z + diff --git a/docs/tutorials.html b/docs/tutorials.html index 5001ae0..79bcc7d 100644 --- a/docs/tutorials.html +++ b/docs/tutorials.html @@ -174,7 +174,7 @@ +
Categories
All (6)
Data Viz (2)
Mathematical Statistics (4)
Statistics (4)
@@ -324,7 +324,7 @@
-
No matching items diff --git a/docs/tutorials.xml b/docs/tutorials.xml index 03352f3..0cda829 100644 --- a/docs/tutorials.xml +++ b/docs/tutorials.xml @@ -10,7 +10,7 @@ quarto-1.4.554 -Tue, 11 Jun 2024 20:30:14 GMT +Tue, 11 Jun 2024 22:46:54 GMT Hello, statistics. | Mathematical Statistics 0 Mitch Harrison @@ -49,7 +49,7 @@ Statistics Mathematical Statistics https://mitchellharrison.github.io/tutorials/math_stat_0.html - Tue, 11 Jun 2024 20:30:14 GMT + Tue, 11 Jun 2024 22:46:54 GMT @@ -138,7 +138,7 @@ Definition Statistics Mathematical Statistics https://mitchellharrison.github.io/tutorials/math_stat_1.html - Tue, 11 Jun 2024 20:30:14 GMT + Tue, 11 Jun 2024 22:46:54 GMT @@ -620,7 +620,7 @@ font-style: inherit;">element_line() Statistics Mathematical Statistics https://mitchellharrison.github.io/tutorials/math_stat_2.html - Tue, 11 Jun 2024 20:30:14 GMT + Tue, 11 Jun 2024 22:46:54 GMT @@ -685,7 +685,7 @@ Definition Statistics Mathematical Statistics https://mitchellharrison.github.io/tutorials/math_stat_3.html - Tue, 11 Jun 2024 20:30:14 GMT + Tue, 11 Jun 2024 22:46:54 GMT @@ -697,7 +697,7 @@ Definition -

Welcome! If you saw my post for this week’s TidyTuesday, I’m glad you liked it enough to learn from it! If not, you can either scroll to the bottom to see the final product or click here to see it. For this plot, we will use an area plot to visualize the global emissions by type going back to 1900. To start, we will use a bare-bones ggplot2 area chart with no bells or whistles to see what we are working with.

+

Welcome! If you saw my post for this week’s TidyTuesday, I’m glad you liked it enough to learn from it! If not, you can either scroll to the bottom to see the final product or click here. For this plot, we will use an area plot to visualize the global emissions by type going back to 1900. To start, we will use a bare-bones ggplot2 area chart with no bells or whistles to see what we are working with.

Click here for code @@ -2606,8 +2606,969 @@ font-style: inherit;">fill = BG_COLOR) ]]> Data Viz https://mitchellharrison.github.io/tutorials/tidytuesday_05212024.html - Tue, 11 Jun 2024 20:30:14 GMT + Tue, 11 Jun 2024 22:46:54 GMT + + Campus Pride Index| TidyTutorial + Mitch Harrison + https://mitchellharrison.github.io/tutorials/tidytuesday_06112024.html + +

Introduction

+

In celebration of Pride Month, this week’s TidyTuesday provides data from the Campus Pride Index, which measures the safety and inclusivity of LGBTQ+ programs across universities in the United States.

+

Each university is binned into one or more categories (e.g., military colleges, private/public, and others). What feels natural to me is to see how the Campus Pride Index compares across some of these categories. A proportionate stacked bar chart (where each bar has height equal to 1) is one option, but I would also like to see which types of universities are most common. If there are some categories with worse scores but with much smaller sample sizes, that would be helpful to know. So we’ll use a stacked bar, but not normalize the bar so we can also see how common each type is. Also bear in mind that a single university can (and often does) fall into multiple categories.

+

Let’s set some global settings so I don’t have to worry about aspect ratio or other trivialities while we work.

+
+
+Click here for code +
knitr::opts_chunk$set(
+  fig.width = 10,        
+  fig.asp = 0.618,      # the golden ratio
+  fig.align = "center"  # center align figures
+)
+
+
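As a quick sanity check on those settings (my own arithmetic, not something from the chunk above): knitr treats fig.asp as height divided by width and derives the figure height as fig.width * fig.asp. The variable names below are only for illustration.

```r
# knitr computes fig.height as fig.width * fig.asp (asp = height / width)
fig_width <- 10
fig_asp <- 0.618 # inverse of the golden ratio, as set above

fig_height <- fig_width * fig_asp
fig_height # height in inches of each figure
```

So every figure in this tutorial comes out 10 inches wide and a little over 6 inches tall, which is a landscape shape.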
+ +
+

Data Wrangling

+

Time to load the data.

+
+
+Click here for code +
library(tidyverse)
+library(gglgbtq)
+library(ggchicklet)
+library(ggthemes)
+library(DT)
+
+# load the data ----------------------------------------------------------------
+
+pride_schools <- read_csv(paste0(
+  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/",
+  "2024/2024-06-11/pride_index.csv"
+))
+
+pride_tags <- read_csv(paste0(
+  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/",
+  "2024/2024-06-11/pride_index_tags.csv"
+))
+
+datatable(left_join(pride_schools, pride_tags))
+
+
+
+ +
+
+

First, let’s format the data for ease of plotting. Right now, each category has its own column, with TRUE or NA values, where NA means “false” for our purposes. But we want the type of school to be represented in a single column so we can map that column to the color of the bars. To move multiple columns into a single one, we will pivot the data. Since we want to consolidate columns, we will need to make our data longer (i.e., add more rows), where each university now has multiple rows corresponding to TRUE or FALSE. Intuitively, to pivot the data longer, we use the pivot_longer function. Notice that once the pivot is complete, we only want to keep the rows where the value is TRUE, since the FALSE rows are just saying that “this university doesn’t fall into this type,” which is useless noise in our dataset.

+
+
+Click here for code +
# format data for plotting -----------------------------------------------------
+
+uni_types <- pride_schools |>
+  
+  # join both datasets into one
+  left_join(pride_tags) |>
+  
+  # select which columns we want to analyze along with their ratings
+  select(rating, public, private, community, liberal_arts, technical,
+         religious, military, hbcu, hispanic_serving, aapi_serving,
+         other_minority_serving) |> 
+  
+  # replace NA with FALSE
+  mutate(across(everything(), ~ replace_na(., FALSE))) |>
+  
+  # do the pivot
+  pivot_longer(cols = !rating, names_to = "type") |>
+  
+  # drop the rows that don't apply
+  filter(value == TRUE) |>
+  
+  # clean up some strings for prettier plotting
+  mutate(
+    type = str_replace_all(type, "_", " "),
+    type = str_to_title(type),
+    type = str_replace(type, "Aapi", "AAPI")
+  )
+
+datatable(uni_types)
+
+
+
+ +
+
+
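If the pivot is hard to picture, here is a minimal sketch of the same three steps on a made-up table. The toy data frame and its two "schools" are hypothetical and exist only for this illustration; the real data has many more columns.

```r
library(dplyr)
library(tidyr)

# hypothetical toy data: two schools, three category columns (TRUE or NA)
toy <- tibble(
  rating   = c(4.5, 3),
  public   = c(TRUE, NA),
  private  = c(NA, TRUE),
  military = c(TRUE, NA)
)

toy_long <- toy |>
  # NA means "false" here, so make that explicit
  mutate(across(!rating, ~ replace_na(.x, FALSE))) |>
  # one row per school/category pair
  pivot_longer(cols = !rating, names_to = "type") |>
  # keep only the categories each school actually belongs to
  filter(value)

toy_long
```

The first school ends up with two rows (public and military) and the second with one (private), which is exactly the "a single university can appear more than once" behavior we want in the plot.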

That looks just like we wanted it to. Now that our data is formatted, we can work on the plot. Per the data dictionary on the TidyTuesday GitHub repository, we know that fractional scores are possible. A quick call to the unique function told me that the “fractional” scores are only half-stars, not any decimal in between two scores. So 1 and 1.5 are possible scores, but 1.7 is not. We should bin these scores by their leading digit so we have five possible fill values instead of nine. We’ll call these bins rating_levs, or “rating levels.”

+

I would also like to order the bars in descending order of the total number of universities of that type. To do that, we’ll count how many of each category there are and save the order as a vector uni_levs, or “university levels.”

+
+
+Click here for code +
uni_levs <- uni_types |>
+  group_by(type) |>
+  summarise(count = n()) |>
+  arrange(desc(count)) |>
+  pull(type)
+
+rating_levs <- c("1 - 1.5", "2 - 2.5", "3 - 3.5", "4 - 4.5", "5")
+
+
+
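For what it's worth, the group_by/summarise/arrange chain can also be written with dplyr's count and its sort argument. This is just an equivalent shorthand sketched on a made-up vector of types (toy_types is hypothetical), not a change to the code above.

```r
library(dplyr)

# hypothetical toy data: one row per school/type pair
toy_types <- tibble(
  type = c("Public", "Private", "Public", "Military", "Public", "Private")
)

# count() tallies each type; sort = TRUE orders by descending count
toy_levs <- toy_types |>
  count(type, sort = TRUE) |>
  pull(type)

toy_levs
```

Either version gives a character vector of types from most to least common, which is all we need to set the factor levels later.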

For our last data wrangling step, we can assign the rating bins to their respective ratings. I’ll create a new column for this and call it score. After that, we can group by type of university and score, and count the number of occurrences of each group. Then, we’ll be ready to plot.

+
+
+Click here for code +
uni_types <- uni_types |>
+  
+  # assign bins to the score variable
+  mutate(
+    score = case_when(
+      rating < 2 ~ rating_levs[1],
+      rating < 3 ~ rating_levs[2],
+      rating < 4 ~ rating_levs[3],
+      rating < 5 ~ rating_levs[4],
+      TRUE ~ rating_levs[5] 
+    ),
+    
+    # order the score bins using the rating_levs we made earlier
+    score = factor(score, levels = rating_levs)
+  ) |>
+  
+  # count the number of each type/score combination
+  group_by(type, score) |>
+  summarise(count = n(), .groups = "drop") |>
+  
+  # reorder the universities by descending order of number
+  mutate(type = factor(type, levels = rev(uni_levs)))
+
+
+
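Since "bin by leading digit" is the same thing as rounding down, here is a small base R check of the binning on every possible half-star rating. It is a sketch that assumes ratings always fall between 1 and 5 with no missing values; the case_when above is what the tutorial actually uses.

```r
rating_levs <- c("1 - 1.5", "2 - 2.5", "3 - 3.5", "4 - 4.5", "5")

# every possible half-star rating from 1 to 5
ratings <- seq(1, 5, by = 0.5)

# floor() keeps the leading digit, which doubles as an index into rating_levs
bins <- rating_levs[floor(ratings)]

data.frame(rating = ratings, score = bins)
```

Each pair of ratings (1 and 1.5, 2 and 2.5, and so on) lands in the same bin, and 5 gets a bin to itself.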
+
+

The Plot

+

Now that we have our data, let’s get the skeleton of the plot going. I’m going to use a favorite “cheat code” of mine for making aesthetically pleasing bar graphs in R: the ggchicklet package. It lets us round the corners of each bar, which gives a much more aesthetic appearance (in my opinion). So, instead of using the geom_col function that is standard, we will use geom_chicklet instead.

+

One small note for geom_chicklet: it prefers to have bar graphs be vertical. But because my category names are long (and you never want to rotate text), I would like the plot to be horizontal. So I’ll map the type to the x axis and the counts to the y axis like geom_chicklet prefers, but I’ll use coord_flip afterwards to make it horizontal. This is the same technique that the author of the ggchicklet package uses in his demo on the ggchicklet GitHub repository.

+
+
+Click here for code +
uni_types |>
+  ggplot(aes(x = type, y = count, fill = score)) +
+  geom_chicklet(position = position_stack(reverse = TRUE), width = 0.6) +
+  coord_flip()
+

This is already a great start! We have some aesthetic changes to make, but our bins and bars are in the order that we were hoping. Let’s change some colors.

+

First, I’ll use my favorite theme function, which comes from the ggthemes package. That theme is theme_fivethirtyeight, which takes its name from the legendary data visualizations of the FiveThirtyEight website.

+

I also think it would be appropriate for us to use Pride colors, don’t you? Of course, there is an R package for that: the gglgbtq package, which I imported earlier. We will use the “rainbow” color palette provided by gglgbtq to color our bars.

+
+
+Click here for code +
uni_types |>
+  ggplot(aes(x = type, y = count, fill = score)) +
+  geom_chicklet(position = position_stack(reverse = TRUE), width = 0.6) +
+  coord_flip() +
+  
+  # change theme
+  theme_fivethirtyeight() +
+  
+  # change bar colors
+  scale_fill_manual(values = palette_lgbtq("rainbow"))
+
+
+
+
+

+
+
+
+
+

Now we’re cooking! I think we are safe to add the title and subtitle, and then we can make a few more aesthetic changes before wrapping up. I want the background to be black (personal preference), which means the text needs to be white. I also don’t think that horizontal grid lines are necessary when the y axis is discrete, so we will remove those. I love the legend, but I would like it to be stacked and placed vertically in the plot, rather than horizontal and below the plot.

+
+
+Click here for code +
uni_types |>
+  ggplot(aes(x = type, y = count, fill = score)) +
+  geom_chicklet(position = position_stack(reverse = TRUE), width = 0.6) +
+  coord_flip() +
+  theme_fivethirtyeight() +
+  scale_fill_manual(values = palette_lgbtq("rainbow")) +
+  
+  # add title and subtitle
+  labs(
+    title = "Campus Pride Index Scores",
+    subtitle = "Higher scores mean increased LGBTQ-inclusive policies/programs",
+  ) +
+  
+  theme(
+    # make all text white
+    text = element_text(color = "white", family = "Lato"),
+    
+    # placeholder for title styling (set in the final plot)
+    plot.title = element_text(),
+    
+    # make background black
+    plot.background = element_rect(fill = "black"),
+    panel.background = element_rect(fill = "black"),
+    legend.background = element_rect(fill = "black"),
+    
+    # remove grid lines
+    panel.grid.major.y = element_blank(),
+    
+    # move legend
+    legend.direction = "vertical",
+    legend.position = c(0.9, 0.5),
+  )
+
+
+
+
+

+
+
+
+
+

Much better! Only a few small edits left. First, I don’t think the legend needs a title. I also want the higher scores to be higher on the legend, so we can reverse the order of the legend inside of the scale_fill_manual function. The y axis text is a little far from the axis for my liking, so we will shift that in, and we’ll be done, save for one more thing: fonts.

+

I’m going to use custom fonts that aren’t shipped with R or ggplot. These fonts come from Google Fonts, and we will need to use two packages to get them to work: sysfonts to load fonts from Google and showtext to get them to work with our plots. Once we import them, we can use them like any other font in our ggplot graphs!

+
+
+Click here for code +
# load fonts from Google Fonts into our project
+sysfonts::font_add_google(name = "Galada")
+sysfonts::font_add_google(name = "Lato")
+showtext::showtext_auto()
+
+uni_types |>
+  ggplot(aes(x = type, y = count, fill = score)) +
+  geom_chicklet(position = position_stack(reverse = TRUE), width = 0.6) +
+  coord_flip() +
+  theme_fivethirtyeight(base_size = 25) +
+  scale_fill_manual(
+    values = palette_lgbtq("rainbow"),
+    guide = guide_legend(reverse = TRUE) # reverse the legend order
+  ) +
+  labs(
+    title = "Campus Pride Index Scores",
+    subtitle = "Higher scores mean increased LGBTQ-inclusive policies/programs",
+  ) +
+  theme(
+    # use Lato font from Google for all text
+    text = element_text(color = "white", family = "Lato"),
+    
+    # use Galada font from Google just for the title
+    plot.title = element_text(family = "Galada", size = 43),
+    
+    plot.background = element_rect(fill = "black"),
+    panel.background = element_rect(fill = "black"),
+    panel.grid.major.y = element_blank(),
+    legend.background = element_rect(fill = "black"),
+    legend.title = element_blank(),
+    legend.direction = "vertical",
+    legend.position = c(0.9, 0.5),
+    
+    # shift y axis text closer to the margin
+    axis.text.y = element_text(margin = margin(r = -20))
+  )
+
+
+
+
+

+
+
+
+
+
+
+

Conclusion

+

Done! With some data wrangling and some nice themes, we have arrived at a graph that we can be proud of (get it?). I hope this helps you in your own data viz journey, but if you have further questions, feel free to join my Discord server and ask me personally! And if you are feeling grateful for my work (and are financially able to), you can give me a special thanks by buying me a coffee.

+

As always, thanks for reading, and see you next week!

+ + +
+ + ]]>
+ Data Viz + https://mitchellharrison.github.io/tutorials/tidytuesday_06112024.html + Tue, 11 Jun 2024 22:46:54 GMT + +
diff --git a/docs/tutorials/tidytuesday_05212024.html b/docs/tutorials/tidytuesday_05212024.html index c044330..84db8a4 100644 --- a/docs/tutorials/tidytuesday_05212024.html +++ b/docs/tutorials/tidytuesday_05212024.html @@ -210,7 +210,7 @@

Our World in Emissions | TidyTutorial

-

Welcome! If you saw my post for this week’s TidyTuesday, I’m glad you liked it enough to learn from it! If not, you can either scroll to the bottom to see the final product or click here to see it. For this plot, we will use an area plot to visualize the global emissions by type going back to 1900. To start, we will use a bare-bones ggplot2 area chart with no bells or whistles to see what we are working with.

+

Welcome! If you saw my post for this week’s TidyTuesday, I’m glad you liked it enough to learn from it! If not, you can either scroll to the bottom to see the final product or click here. For this plot, we will use an area plot to visualize the global emissions by type going back to 1900. To start, we will use a bare-bones ggplot2 area chart with no bells or whistles to see what we are working with.

Click here for code diff --git a/docs/tutorials/tidytuesday_06112024.html b/docs/tutorials/tidytuesday_06112024.html new file mode 100644 index 0000000..cd0bbd5 --- /dev/null +++ b/docs/tutorials/tidytuesday_06112024.html @@ -0,0 +1,926 @@ + + + + + + + + + + +Mitch’s Website - Campus Pride Index| TidyTutorial + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+ + + + +
+ +
+
+

Campus Pride Index| TidyTutorial

+
+
Data Viz
+
+
+ + + +
+ +
+
Author
+
+

Mitch Harrison

+
+
+ + + +
+ + + +
+ + +
+

Introduction

+

In celebration of Pride Month, this week’s TidyTuesday provides data from the Campus Pride Index, which measures the safety and inclusivity of LGBTQ+ programs across universities in the United States.

+

Each university is binned into one or more categories (e.g., military colleges, private/public, and others). What feels natural to me is to see how the Campus Pride Index compares across some of these categories. A proportionate stacked bar chart (where each bar has height equal to 1) is one option, but I would also like to see which types of universities are most common. If there are some categories with worse scores but with much smaller sample sizes, that would be helpful to know. So we’ll use a stacked bar, but not normalize the bar so we can also see how common each type is. Also bear in mind that a single university can (and often does) fall into multiple categories.

+

Let’s set some global settings so I don’t have to worry about aspect ratio or other trivialities while we work.

+
+
+Click here for code +
knitr::opts_chunk$set(
+  fig.width = 10,        
+  fig.asp = 0.618,      # the golden ratio
+  fig.align = "center"  # center align figures
+)
+
+
+
+
+

Data Wrangling

+

Time to load the data.

+
+
+Click here for code +
library(tidyverse)
+library(gglgbtq)
+library(ggchicklet)
+library(ggthemes)
+library(DT)
+
+# load the data ----------------------------------------------------------------
+
+pride_schools <- read_csv(paste0(
+  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/",
+  "2024/2024-06-11/pride_index.csv"
+))
+
+pride_tags <- read_csv(paste0(
+  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/",
+  "2024/2024-06-11/pride_index_tags.csv"
+))
+
+datatable(left_join(pride_schools, pride_tags))
+
+
+
+ +
+
+

First, let’s format the data for ease of plotting. Right now, each category has its own column, with TRUE or NA values, where NA means “false” for our purposes. But we want the type of school to be represented in a single column so we can map that column to one axis of the plot. To move multiple columns into a single one, we will pivot the data. Since we want to consolidate columns, we will need to make our data longer (i.e., add more rows), where each university now has multiple rows corresponding to TRUE or FALSE. Intuitively, to pivot the data longer, we use the pivot_longer function. Notice that once the pivot is complete, we only want to keep the rows where the value is TRUE, since the FALSE rows are just saying that “this university doesn’t fall into this type,” which is useless noise in our dataset.

+
+
+Click here for code +
# format data for plotting -----------------------------------------------------
+
+uni_types <- pride_schools |>
+  
+  # join both datasets into one
+  left_join(pride_tags) |>
+  
+  # select which columns we want to analyze along with their ratings
+  select(rating, public, private, community, liberal_arts, technical,
+         religious, military, hbcu, hispanic_serving, aapi_serving,
+         other_minority_serving) |> 
+  
+  # replace NA with FALSE
+  mutate(across(everything(), ~ replace_na(., FALSE))) |>
+  
+  # do the pivot
+  pivot_longer(cols = !rating, names_to = "type") |>
+  
+  # drop the rows that don't apply
+  filter(value == TRUE) |>
+  
+  # clean up some strings for prettier plotting
+  mutate(
+    type = str_replace_all(type, "_", " "),
+    type = str_to_title(type),
+    type = str_replace(type, "Aapi", "AAPI")
+  )
+
+datatable(uni_types)
+
+
+
+ +
+
+
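If the pivot still feels abstract, here is a minimal sketch on a made-up three-school data frame. The `schools` data and its values are hypothetical, not taken from the Campus Pride Index. It also shows an alternative to the approach above: `pivot_longer` has a `values_drop_na` argument that, assuming the flags are only ever `TRUE` or `NA`, should let you skip the separate `replace_na` and `filter` steps.

```r
library(tidyr)

# hypothetical toy data: TRUE where a tag applies, NA where it doesn't
schools <- data.frame(
  rating   = c(4.5, 3, 5),
  public   = c(TRUE, NA, TRUE),
  military = c(NA, TRUE, TRUE)
)

# one row per school/tag pair; rows with NA values are dropped during the pivot
long <- pivot_longer(
  schools,
  cols = !rating,
  names_to = "type",
  values_drop_na = TRUE
)

print(long)
```

Notice that the third school shows up twice, once as public and once as military, which is exactly the "one university, multiple categories" situation mentioned in the introduction.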

That looks just like we wanted it to. Now that our data is formatted, we can work on the plot. Per the data dictionary on the TidyTuesday GitHub repository, we know that fractional scores are possible. A quick call to the unique function told me that the “fractional” scores are only half-stars, not any decimal in between two scores. So 1 and 1.5 are possible scores, but 1.7 is not. We should bin these scores by their leading digit so we have five possible fill values instead of ten. We’ll call these bins rating_levs, or “rating levels.”

+

I would also like to order the bars in descending order of the total number of universities of that type. To do that, we’ll count how many of each category there are and save the order as a vector uni_levs, or “university levels.”

+
+
+Click here for code +
uni_levs <- uni_types |>
+  group_by(type) |>
+  summarise(count = n()) |>
+  arrange(desc(count)) |>
+  pull(type)
+
+rating_levs <- c("1 - 1.5", "2 - 2.5", "3 - 3.5", "4 - 4.5", "5")
+
+
+
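To make the ordering step concrete, here is a small sketch on hypothetical data (the `toy` data frame and the `toy_levs` and `toy_type` names are made up for illustration). It uses `dplyr::count()` with `sort = TRUE` as a shorthand for the group/summarise/arrange chain, and then builds the factor with reversed levels, since `coord_flip()` draws the first factor level at the bottom of the plot.

```r
library(dplyr)

# hypothetical toy data: one row per school/type pair
toy <- data.frame(
  type = c("Public", "Private", "Public", "Military", "Public", "Private")
)

# count() with sort = TRUE groups, counts, and sorts in descending order
toy_levs <- toy |>
  count(type, sort = TRUE) |>
  pull(type)

# reversed levels, so the most common type lands at the top after coord_flip()
toy_type <- factor(toy$type, levels = rev(toy_levs))

print(toy_levs)
print(levels(toy_type))
```

Whether you prefer the explicit chain or the `count()` shorthand is a matter of taste; both should give the same ordering as long as there are no ties.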

For our last data wrangling step, we can assign the rating bins to their respective ratings. I’ll create a new column for this and call it score. After that, we can group by type of university and score, and count the number of occurrences of each group. Then, we’ll be ready to plot.

+
+
+Click here for code +
uni_types <- uni_types |>
+  
+  # assign bins to the score variable
+  mutate(
+    score = case_when(
+      rating < 2 ~ rating_levs[1],
+      rating < 3 ~ rating_levs[2],
+      rating < 4 ~ rating_levs[3],
+      rating < 5 ~ rating_levs[4],
+      TRUE ~ rating_levs[5] 
+    ),
+    
+    # order the score bins using the rating_levs we made earlier
+    score = factor(score, levels = rating_levs)
+  ) |>
+  
+  # count the number of each type/score combination
+  group_by(type, score) |>
+  summarise(count = n(), .groups = "drop") |>
+  
+  # reorder the universities by descending order of number
+  mutate(type = factor(type, levels = rev(uni_levs)))
+
+
+
+
+
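As a side note, the same binning can be sketched with base R's `cut()` instead of `case_when`. This is only an alternative under the assumption that every rating is between 1 and 5; the `rating` vector below is made up for illustration. One difference worth knowing: `cut()` returns `NA` for anything below the first break, while the `case_when` version above would put it in the lowest bin.

```r
rating_levs <- c("1 - 1.5", "2 - 2.5", "3 - 3.5", "4 - 4.5", "5")

# hypothetical ratings covering whole and half stars
rating <- c(1, 1.5, 2, 2.5, 4.5, 5)

# right = FALSE makes each bin include its lower bound: [1, 2), [2, 3), ...
score <- cut(
  rating,
  breaks = c(1, 2, 3, 4, 5, Inf),
  labels = rating_levs,
  right = FALSE
)

print(table(score))
```

A nice property of `cut()` here is that the result is already a factor whose levels follow the order of `labels`, so no separate `factor()` call is needed.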

The Plot

+

Now that we have our data, let’s get the skeleton of the plot going. I’m going to use a favorite “cheat code” of mine for making aesthetically pleasing bar graphs in R: the ggchicklet package. It lets us round the corners of each bar, which gives a much more aesthetic appearance (in my opinion). So, instead of using the geom_col function that is standard, we will use geom_chicklet instead.

+

One small note for geom_chicklet: it prefers to have bar graphs be vertical. But because my category names are long (and you never want to rotate text), I would like the plot to be horizontal. So I’ll map the type to the x axis and the counts to the y axis like geom_chicklet prefers, but I’ll use coord_flip afterwards to make it horizontal. This is the same technique that the author of the ggchicklet package uses in his demo on the ggchicklet GitHub repository.

+
+
+Click here for code +
uni_types |>
+  ggplot(aes(x = type, y = count, fill = score)) +
+  geom_chicklet(position = position_stack(reverse = TRUE), width = 0.6) +
+  coord_flip()
+
+
+
+
+

+
+
+
+
+
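A quick aside: if ggchicklet isn't available on your machine (my understanding is that it is installed from GitHub rather than CRAN, so treat that as an assumption), the same layout can be sketched with plain `geom_col`. You lose the rounded corners, but the stacking and the flip work the same way. The `toy_counts` data below is hypothetical and only mimics the shape of `uni_types`.

```r
library(ggplot2)

# hypothetical counts in the same shape as uni_types
toy_counts <- data.frame(
  type  = factor(c("Public", "Public", "Private", "Private"),
                 levels = c("Private", "Public")),
  score = factor(c("4 - 4.5", "5", "4 - 4.5", "5"),
                 levels = c("4 - 4.5", "5")),
  count = c(30, 12, 18, 7)
)

# same mapping as the chicklet version, then flipped to horizontal
p <- ggplot(toy_counts, aes(x = type, y = count, fill = score)) +
  geom_col(position = position_stack(reverse = TRUE), width = 0.6) +
  coord_flip()

print(p)
```

Everything added to the plot later in this tutorial (theme, fill scale, labels) should apply to this version as well, since none of it depends on the geom.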

This is already a great start! We have some aesthetic changes to make, but our bins and bars are in the order that we were hoping. Let’s change some colors.

+

First, I’ll use my favorite theme function, which comes from the ggthemes package. That theme is theme_fivethirtyeight, which takes its name from the legendary data visualizations of the FiveThirtyEight website.

+

I also think it would be appropriate for us to use Pride colors, don’t you? Of course, there is an R package for that: the gglgbtq package, which I imported earlier. We will use the “rainbow” color palette provided by gglgbtq to color our bars.

+
+
+Click here for code +
uni_types |>
+  ggplot(aes(x = type, y = count, fill = score)) +
+  geom_chicklet(position = position_stack(reverse = TRUE), width = 0.6) +
+  coord_flip() +
+  
+  # change theme
+  theme_fivethirtyeight() +
+  
+  # change bar colors
+  scale_fill_manual(values = palette_lgbtq("rainbow"))
+
+
+
+
+

+
+
+
+
+

Now we’re cooking! I think we are safe to add the title and subtitle, and then we can make a few more aesthetic changes before wrapping up. I want the background to be black (personal preference), which means the text needs to be white. I also don’t think that horizontal grid lines are necessary when the y axis is discrete, so we will remove those. I love the legend, but I would like it to be stacked and placed vertically in the plot, rather than horizontal and below the plot.

+
+
+Click here for code +
uni_types |>
+  ggplot(aes(x = type, y = count, fill = score)) +
+  geom_chicklet(position = position_stack(reverse = TRUE), width = 0.6) +
+  coord_flip() +
+  theme_fivethirtyeight() +
+  scale_fill_manual(values = palette_lgbtq("rainbow")) +
+  
+  # add title and subtitle
+  labs(
+    title = "Campus Pride Index Scores",
+    subtitle = "Higher scores mean increased LGBTQ-inclusive policies/programs",
+  ) +
+  
+  theme(
+    # make all text white
+    text = element_text(color = "white", family = "Lato"),
+    
+    # placeholder for title styling (set in the final plot)
+    plot.title = element_text(),
+    
+    # make background black
+    plot.background = element_rect(fill = "black"),
+    panel.background = element_rect(fill = "black"),
+    legend.background = element_rect(fill = "black"),
+    
+    # remove grid lines
+    panel.grid.major.y = element_blank(),
+    
+    # move legend
+    legend.direction = "vertical",
+    legend.position = c(0.9, 0.5),
+  )
+
+
+
+
+

+
+
+
+
+

Much better! Only a few small edits left. First, I don’t think the legend needs a title. I also want the higher scores to be higher on the legend, so we can reverse the order of the legend inside of the scale_fill_manual function. The y axis text is a little far from the axis for my liking, so we will shift that in, and we’ll be done, save for one more thing: fonts.

+

I’m going to use custom fonts that aren’t shipped with R or ggplot. These fonts come from Google Fonts, and we will need to use two packages to get them to work: sysfonts to load fonts from Google and showtext to get them to work with our plots. Once we import them, we can use them like any other font in our ggplot graphs!

+
+
+Click here for code +
# load fonts from Google Fonts into our project
+sysfonts::font_add_google(name = "Galada")
+sysfonts::font_add_google(name = "Lato")
+showtext::showtext_auto()
+
+uni_types |>
+  ggplot(aes(x = type, y = count, fill = score)) +
+  geom_chicklet(position = position_stack(reverse = TRUE), width = 0.6) +
+  coord_flip() +
+  theme_fivethirtyeight(base_size = 25) +
+  scale_fill_manual(
+    values = palette_lgbtq("rainbow"),
+    guide = guide_legend(reverse = TRUE) # reverse the legend order
+  ) +
+  labs(
+    title = "Campus Pride Index Scores",
+    subtitle = "Higher scores mean increased LGBTQ-inclusive policies/programs",
+  ) +
+  theme(
+    # use Lato font from Google for all text
+    text = element_text(color = "white", family = "Lato"),
+    
+    # use Galada font from Google just for the title
+    plot.title = element_text(family = "Galada", size = 43),
+    
+    plot.background = element_rect(fill = "black"),
+    panel.background = element_rect(fill = "black"),
+    panel.grid.major.y = element_blank(),
+    legend.background = element_rect(fill = "black"),
+    legend.title = element_blank(),
+    legend.direction = "vertical",
+    legend.position = c(0.9, 0.5),
+    
+    # shift y axis text closer to the margin
+    axis.text.y = element_text(margin = margin(r = -20))
+  )
+
+
+
+
+

+
+
+
+
+
+
+

Conclusion

+

Done! With some data wrangling and some nice themes, we have arrived at a graph that we can be proud of (get it?). I hope this helps you in your own data viz journey, but if you have further questions, feel free to join my Discord server and ask me personally! And if you are feeling grateful for my work (and are financially able to), you can give me a special thanks by buying me a coffee.

+

As always, thanks for reading, and see you next week!

+ + +
+ +
+ +
+ + + + + + \ No newline at end of file diff --git a/docs/tutorials/tidytuesday_06112024_files/figure-html/plot-1-1.png b/docs/tutorials/tidytuesday_06112024_files/figure-html/plot-1-1.png new file mode 100644 index 0000000..f8fad0e Binary files /dev/null and b/docs/tutorials/tidytuesday_06112024_files/figure-html/plot-1-1.png differ diff --git a/docs/tutorials/tidytuesday_06112024_files/figure-html/plot-2-1.png b/docs/tutorials/tidytuesday_06112024_files/figure-html/plot-2-1.png new file mode 100644 index 0000000..d43b4c5 Binary files /dev/null and b/docs/tutorials/tidytuesday_06112024_files/figure-html/plot-2-1.png differ diff --git a/docs/tutorials/tidytuesday_06112024_files/figure-html/plot-3-1.png b/docs/tutorials/tidytuesday_06112024_files/figure-html/plot-3-1.png new file mode 100644 index 0000000..b5b8e67 Binary files /dev/null and b/docs/tutorials/tidytuesday_06112024_files/figure-html/plot-3-1.png differ diff --git a/docs/tutorials/tidytuesday_06112024_files/figure-html/plot-final-1.png b/docs/tutorials/tidytuesday_06112024_files/figure-html/plot-final-1.png new file mode 100644 index 0000000..7366a03 Binary files /dev/null and b/docs/tutorials/tidytuesday_06112024_files/figure-html/plot-final-1.png differ diff --git a/images/thumbnails/projects/tidytuesday/06112024.png b/images/thumbnails/projects/tidytuesday/06112024.png deleted file mode 100644 index 9f051e0..0000000 Binary files a/images/thumbnails/projects/tidytuesday/06112024.png and /dev/null differ diff --git a/projects/tidytuesday_06112024/pride.qmd b/projects/tidytuesday_06112024/pride.qmd index e01d240..91de009 100644 --- a/projects/tidytuesday_06112024/pride.qmd +++ b/projects/tidytuesday_06112024/pride.qmd @@ -17,7 +17,7 @@ we'll build a stacked horizontal bar chart to see the distribution of scores for some of those categories. I'll use the `ggchicklet` package and some custom fonts for easy aesthetic changes, and we'll be done! 
If you want to see a step-by-step tutorial explaining the code, click -[here](). +[here](../../tutorials/tidytuesday_06112024.qmd). ```{r} @@ -122,7 +122,11 @@ uni_types |> #| echo: false # save the plot (for thumbnails/posting) -# ggsave("images/thumbnails/projects/tidytuesday/06112024.png") +#ggsave( +# "images/thumbnails/projects/tidytuesday/06112024.png", +# width = 9, +# height = 5.5 +#) ``` # Conclusion diff --git a/tutorials/tidytuesday_05212024.qmd b/tutorials/tidytuesday_05212024.qmd index 353726b..cc764ac 100644 --- a/tutorials/tidytuesday_05212024.qmd +++ b/tutorials/tidytuesday_05212024.qmd @@ -8,8 +8,8 @@ image: "../../images/thumbnails/tidytuesday/05212024.png" Welcome! If you saw my post for this week's TidyTuesday, I'm glad you liked it enough to learn from it! If not, you can either scroll to the bottom to see the -final product or click [here](../projects/tidytuesday_05212024/emissions.qmd) -to see it. For this plot, we will use an area plot to visualize the global +final product or click [here](../projects/tidytuesday_05212024/emissions.qmd). +For this plot, we will use an area plot to visualize the global emissions by type going back to 1900. To start, we will use a bare-bones `ggplot2` area chart with no bells or whistles to see what we are working with. 
diff --git a/tutorials/tidytuesday_06112024.qmd b/tutorials/tidytuesday_06112024.qmd new file mode 100644 index 0000000..80b1828 --- /dev/null +++ b/tutorials/tidytuesday_06112024.qmd @@ -0,0 +1,332 @@ +--- +title: "Campus Pride Index| TidyTutorial" +author: "Mitch Harrison" +categories: + - "Data Viz" +image: "../../images/thumbnails/tidytuesday/06112024.jpg" +--- + +# Introduction + +In celebration of Pride Month, this week's +[TidyTuesday](https://github.com/rfordatascience/tidytuesday/tree/master/data/2024/2024-06-11) +provides data from the [Campus Pride Index](https://www.campusprideindex.org), +which measures the safety and inclusivity of LGBTQ+ programs across universities +in the United States. + +Each university is binned into one or more categories (e.g., military colleges, +private/public, and others). What feels natural to me is to see how the Campus +Pride Index compares across some of these categories. A proportionate stacked +bar chart (where each bar has height equal to 1) is one option, but I would +also like to see which types of universities are most common. If there are some +categories with worse scores but with much smaller sample sizes, that would be +helpful to know. So we'll use a stacked bar, but not normalize the bar so we +can also see how common each type is. Also bear in mind that a single university +can (and often does) fall into multiple categories. + +Let's set some global settings so I don't have to worry about aspect ratio or +other trivialities while we work. + +```{r} +#| label: set-theme + +knitr::opts_chunk$set( + fig.width = 10, + fig.asp = 0.618, # the golden ratio + fig.align = "center" # center align figures +) +``` + +# Data Wrangling + +Time to load the data. 
+ +```{r} +#| label: load-libs-and-data + +library(tidyverse) +library(gglgbtq) +library(ggchicklet) +library(ggthemes) +library(DT) + +# load the data ---------------------------------------------------------------- + +pride_schools <- read_csv(paste0( + "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/", + "2024/2024-06-11/pride_index.csv" +)) + +pride_tags <- read_csv(paste0( + "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/", + "2024/2024-06-11/pride_index_tags.csv" +)) + +datatable(left_join(pride_schools, pride_tags)) +``` + +First, let's format the data for ease of plotting. Right now, each category has +its own column, with `TRUE` or `NA` values, where `NA` means "false" for our +purposes. But we want the type of school to be represented in a single column so +we can map that column to one axis of the plot. To move multiple columns into a +single one, we will **pivot** the data. Since we want to consolidate columns, we +will need to make our data *longer* (i.e., add more rows), where each university +now has multiple rows corresponding to `TRUE` or `FALSE`. Intuitively, to pivot +the data longer, we use the `pivot_longer` function. Notice that once the pivot +is complete, we only want to keep the rows where the value is `TRUE`, since the +`FALSE` rows are just saying that "this university doesn't fall into this type," +which is useless noise in our dataset. 
+ +```{r} +#| label: format-data + +# format data for plotting ----------------------------------------------------- + +uni_types <- pride_schools |> + + # join both datasets into one + left_join(pride_tags) |> + + # select which columns we want to analyze along with their ratings + select(rating, public, private, community, liberal_arts, technical, + religious, military, hbcu, hispanic_serving, aapi_serving, + other_minority_serving) |> + + # replace NA with FALSE + mutate(across(everything(), ~ replace_na(., FALSE))) |> + + # do the pivot + pivot_longer(cols = !rating, names_to = "type") |> + + # drop the rows that don't apply + filter(value == TRUE) |> + + # clean up some strings for prettier plotting + mutate( + type = str_replace_all(type, "_", " "), + type = str_to_title(type), + type = str_replace(type, "Aapi", "AAPI") + ) + +datatable(uni_types) +``` + +That looks just like we wanted it to. Now that our data is formatted, we can +work on the plot. Per the data dictionary on the TidyTuesday GitHub repository, we +know that fractional scores are possible. A quick call to the `unique` function +told me that the "fractional" scores are only half-stars, not any decimal in +between two scores. So 1 and 1.5 are possible scores, but 1.7 is not. We should +bin these scores by their leading digit so we have five possible fill values +instead of ten. We'll call these bins `rating_levs`, or "rating levels." + +I would also like to order the bars in descending order of the total number of +universities of that type. To do that, we'll count how many of each category +there are and save the order as a vector `uni_levs`, or "university levels." + +```{r} +#| label: factor-levels + +uni_levs <- uni_types |> + group_by(type) |> + summarise(count = n()) |> + arrange(desc(count)) |> + pull(type) + +rating_levs <- c("1 - 1.5", "2 - 2.5", "3 - 3.5", "4 - 4.5", "5") +``` + +For our last data wrangling step, we can assign the rating bins to their +respective ratings. 
I'll create a new column for this and call it `score`. After +that, we can group by type of university and `score`, and count the number of +occurrences of each group. Then, we'll be ready to plot. + +```{r} +#| label: add_score + +uni_types <- uni_types |> + + # assign bins to the score variable + mutate( + score = case_when( + rating < 2 ~ rating_levs[1], + rating < 3 ~ rating_levs[2], + rating < 4 ~ rating_levs[3], + rating < 5 ~ rating_levs[4], + TRUE ~ rating_levs[5] + ), + + # order the score bins using the rating_levs we made earlier + score = factor(score, levels = rating_levs) + ) |> + + # count the number of each type/score combination + group_by(type, score) |> + summarise(count = n(), .groups = "drop") |> + + # reorder the universities by descending order of number + mutate(type = factor(type, levels = rev(uni_levs))) +``` + +# The Plot + +Now that we have our data, let's get the skeleton of the plot going. I'm going +to use a favorite "cheat code" of mine for making aesthetically pleasing bar +graphs in R: the `ggchicklet` package. It lets us round the corners of each bar, +which gives a much more aesthetic appearance (in my opinion). So, instead of +using the `geom_col` function that is standard, we will use `geom_chicklet` +instead. + +One small note for `geom_chicklet`: it prefers to have bar graphs be vertical. +But because my category names are long (and you *never* want to rotate text), +I would like the plot to be horizontal. So I'll map the type to the `x` axis and +the counts to the `y` axis like `geom_chicklet` prefers, but I'll use +`coord_flip` afterwards to make it horizontal. This is the same technique that +the author of the `ggchicklet` package uses in his demo on the +[`ggchicklet` GitHub repository](https://github.com/hrbrmstr/ggchicklet). 
+ +```{r} +#| label: plot-1 + +uni_types |> + ggplot(aes(x = type, y = count, fill = score)) + + geom_chicklet(position = position_stack(reverse = TRUE), width = 0.6) + + coord_flip() +``` + +This is already a great start! We have some aesthetic changes to make, but our +bins and bars are in the order that we were hoping. Let's change some colors. + +First, I'll use my favorite theme function, which comes from the `ggthemes` +package. That theme is `theme_fivethirtyeight`, which takes its name from the +legendary data visualizations of the +[FiveThirtyEight](https://abcnews.go.com/538) website. + +I also think it would be +appropriate for us to use Pride colors, don't you? Of course, there is an R +package for that: the `gglgbtq` package, which I imported earlier. We will use +the "rainbow" color palette provided by `gglgbtq` to color our bars. + +```{r} +#| label: plot-2 + +uni_types |> + ggplot(aes(x = type, y = count, fill = score)) + + geom_chicklet(position = position_stack(reverse = TRUE), width = 0.6) + + coord_flip() + + + # change theme + theme_fivethirtyeight() + + + # change bar colors + scale_fill_manual(values = palette_lgbtq("rainbow")) +``` + +Now we're cooking! I think we are safe to add the title and subtitle, and then +we can make a few more aesthetic changes before wrapping up. I want the +background to be black (personal preference), which means the text needs to be +white. I also don't think that horizontal grid lines are necessary when the `y` +axis is discrete, so we will remove those. I love the legend, but I would like +it to be stacked and placed vertically in the plot, rather than horizontal and +below the plot. 
+ +```{r} +#| label: plot-3 + +uni_types |> + ggplot(aes(x = type, y = count, fill = score)) + + geom_chicklet(position = position_stack(reverse = TRUE), width = 0.6) + + coord_flip() + + theme_fivethirtyeight() + + scale_fill_manual(values = palette_lgbtq("rainbow")) + + + # add title and subtitle + labs( + title = "Campus Pride Index Scores", + subtitle = "Higher scores mean increased LGBTQ-inclusive policies/programs", + ) + + + theme( + # make all text white + text = element_text(color = "white", family = "Lato"), + + # placeholder for title styling (set in the final plot) + plot.title = element_text(), + + # make background black + plot.background = element_rect(fill = "black"), + panel.background = element_rect(fill = "black"), + legend.background = element_rect(fill = "black"), + + # remove grid lines + panel.grid.major.y = element_blank(), + + # move legend + legend.direction = "vertical", + legend.position = c(0.9, 0.5), + ) +``` + +Much better! Only a few small edits left. First, I don't think the legend needs +a title. I also want the higher scores to be higher on the legend, so we can +reverse the order of the legend inside of the `scale_fill_manual` function. The +`y` axis text is a little far from the axis for my liking, so we will shift that +in, and we'll be done, save for one more thing: fonts. + +I'm going to use custom fonts that aren't shipped with R or `ggplot`. These +fonts come from [Google Fonts](https://fonts.google.com), and we will need to +use two packages to get them to work: `sysfonts` to load fonts from Google and +`showtext` to get them to work with our plots. Once we import them, we can use +them like any other font in our `ggplot` graphs! 
+ +```{r} +#| label: plot-final + +# load fonts from Google Fonts into our project +sysfonts::font_add_google(name = "Galada") +sysfonts::font_add_google(name = "Lato") +showtext::showtext_auto() + +uni_types |> + ggplot(aes(x = type, y = count, fill = score)) + + geom_chicklet(position = position_stack(reverse = TRUE), width = 0.6) + + coord_flip() + + theme_fivethirtyeight(base_size = 25) + + scale_fill_manual( + values = palette_lgbtq("rainbow"), + guide = guide_legend(reverse = TRUE) # reverse the legend order + ) + + labs( + title = "Campus Pride Index Scores", + subtitle = "Higher scores mean increased LGBTQ-inclusive policies/programs", + ) + + theme( + # use Lato font from Google for all text + text = element_text(color = "white", family = "Lato"), + + # use Galada font from Google just for the title + plot.title = element_text(family = "Galada", size = 43), + + plot.background = element_rect(fill = "black"), + panel.background = element_rect(fill = "black"), + panel.grid.major.y = element_blank(), + legend.background = element_rect(fill = "black"), + legend.title = element_blank(), + legend.direction = "vertical", + legend.position = c(0.9, 0.5), + + # shift y axis text closer to the margin + axis.text.y = element_text(margin = margin(r = -20)) + ) +``` + +# Conclusion + +Done! With some data wrangling and some nice themes, we have arrived at a +graph that we can be *proud* of (get it?). I hope this helps you in your own +data viz journey, but if you have further questions, feel free to join my +[Discord server](https://discord.gg/vF6W2bdKFH) and ask me personally! And if +you are feeling grateful for my work (and are financially able to), you +can give me a special thanks by +[buying me a coffee](https://buymeacoffee.com/mitchellharrison). + +As always, thanks for reading, and see you next week! \ No newline at end of file