Commit

fix: write rename columns post and fix quarto chunks arguments
Layalchristine24 committed Oct 8, 2023
1 parent 50dc9cd commit c143654
Showing 7 changed files with 137 additions and 57 deletions.
@@ -1,7 +1,7 @@
{
"hash": "6caedeee311271dccef76778ccccac07",
"hash": "0336b99906c453f098aa7254d0eb64d1",
"result": {
"markdown": "---\ntitle: \"Detect date and time variables with openxlsx\"\nauthor: \"Layal C. Lettry\"\ndate: \"2023-10-08\"\ncategories: [code, openxlsx, date, datetime]\nimage: \"image.jpg\"\n---\n\n\n# Detect date variables\n\nWhen you try to read an excel file, the dates don't always look the way you would expect. You may see a vector of integers (or doubles) rather than a vector of dates. If you are using [openxlsx](https://github.com/ycphs/openxlsx), you can set `detectDates = TRUE` in the function `read.xlsx()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(openxlsx)\nlibrary(tidyverse)\nlibrary(readxl)\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nxlsxfile_path <- system.file(\"extdata\", \"readTest.xlsx\", package = \"openxlsx\")\n\n# Vector of doubles instead of dates\nxlsxfile_with_problems <- read.xlsx(xlsxfile_path, sheet = 3) |> \n as_tibble()\nxlsxfile_with_problems\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 2,083 × 5\n Date value word bool wordZ2 \n <dbl> <dbl> <chr> <lgl> <chr> \n 1 41757 0.839 N-U-B-R-A FALSE FALSE-Z\n 2 41756 0.886 N-Z-P-S-Y TRUE TRUE-Z \n 3 41755 0.574 C-G-D-X-H TRUE TRUE-Z \n 4 41754 0.137 <NA> FALSE FALSE-Z\n 5 41753 0.369 B-K-A-O-W TRUE TRUE-Z \n 6 41752 NA H-P-G-O-K TRUE TRUE-Z \n 7 41751 0.842 F-P-C-L-T TRUE TRUE-Z \n 8 41750 0.227 A-N-Q-P-V TRUE TRUE-Z \n 9 41749 0.276 Y-E-B-K-O TRUE TRUE-Z \n10 41748 0.419 V-S-N-T-R TRUE TRUE-Z \n# ℹ 2,073 more rows\n```\n:::\n\n```{.r .cell-code}\nglimpse(xlsxfile_with_problems)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nRows: 2,083\nColumns: 5\n$ Date <dbl> 41757, 41756, 41755, 41754, 41753, 41752, 41751, 41750, 41749, …\n$ value <dbl> 0.839076400, 0.886380000, 0.574131400, 0.136606500, 0.369258200…\n$ word <chr> \"N-U-B-R-A\", \"N-Z-P-S-Y\", \"C-G-D-X-H\", NA, \"B-K-A-O-W\", \"H-P-G-…\n$ bool <lgl> FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T…\n$ wordZ2 <chr> \"FALSE-Z\", \"TRUE-Z\", \"TRUE-Z\", \"FALSE-Z\", \"TRUE-Z\", \"TRUE-Z\", 
\"…\n```\n:::\n\n```{.r .cell-code}\n# Vector of dates\nxlsxfile <- read.xlsx(xlsxfile_path, sheet = 3, detectDates = TRUE) |> \n as_tibble()\nxlsxfile\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 2,083 × 5\n Date value word bool wordZ2 \n <date> <dbl> <chr> <lgl> <chr> \n 1 2014-04-28 0.839 N-U-B-R-A FALSE FALSE-Z\n 2 2014-04-27 0.886 N-Z-P-S-Y TRUE TRUE-Z \n 3 2014-04-26 0.574 C-G-D-X-H TRUE TRUE-Z \n 4 2014-04-25 0.137 <NA> FALSE FALSE-Z\n 5 2014-04-24 0.369 B-K-A-O-W TRUE TRUE-Z \n 6 2014-04-23 NA H-P-G-O-K TRUE TRUE-Z \n 7 2014-04-22 0.842 F-P-C-L-T TRUE TRUE-Z \n 8 2014-04-21 0.227 A-N-Q-P-V TRUE TRUE-Z \n 9 2014-04-20 0.276 Y-E-B-K-O TRUE TRUE-Z \n10 2014-04-19 0.419 V-S-N-T-R TRUE TRUE-Z \n# ℹ 2,073 more rows\n```\n:::\n\n```{.r .cell-code}\nglimpse(xlsxfile)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nRows: 2,083\nColumns: 5\n$ Date <date> 2014-04-28, 2014-04-27, 2014-04-26, 2014-04-25, 2014-04-24, 20…\n$ value <dbl> 0.839076400, 0.886380000, 0.574131400, 0.136606500, 0.369258200…\n$ word <chr> \"N-U-B-R-A\", \"N-Z-P-S-Y\", \"C-G-D-X-H\", NA, \"B-K-A-O-W\", \"H-P-G-…\n$ bool <lgl> FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T…\n$ wordZ2 <chr> \"FALSE-Z\", \"TRUE-Z\", \"TRUE-Z\", \"FALSE-Z\", \"TRUE-Z\", \"TRUE-Z\", \"…\n```\n:::\n:::\n\n\n# Convert double variables to date and time variables\n\nAnother way to convert a vector of integers is to use the function `convertToDate()` or `convertToDateTime()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nother_file <- readxl_example(path = \"type-me.xlsx\")\nxlsxfile_datetime <- read.xlsx(other_file, sheet = 3) |> \n as_tibble() |> \n slice(2:3) |> \n select(`maybe.a.datetime?`) |> \n pull()\nxlsxfile_datetime\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"41051\" \"41026.479166666664\"\n```\n:::\n\n```{.r .cell-code}\nconvertToDate(xlsxfile_datetime[1])\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"2012-05-22\"\n```\n:::\n\n```{.r 
.cell-code}\nconvertToDateTime(xlsxfile_datetime[2])\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"2012-04-27 11:30:00 CEST\"\n```\n:::\n:::\n\n\n\n# Links\nThese examples are inspired by:\n- [https://rdrr.io/cran/openxlsxhttps://rdrr.io/cran/openxlsx](https://rdrr.io/cran/openxlsx/man/read.xlsx.html)\n\n- [https://readxl.tidyverse.org](https://readxl.tidyverse.org)\n",
"markdown": "---\ntitle: \"Detect date and time variables with openxlsx\"\nauthor: \"Layal C. Lettry\"\ndate: \"2023-10-08\"\ncategories: [openxlsx, date, datetime]\nimage: \"image.jpg\"\n---\n\n\n# Detect date variables\n\nWhen you try to read an excel file, the dates don't always look the way you would expect. You may see a vector of integers (or doubles) rather than a vector of dates. If you are using [openxlsx](https://github.com/ycphs/openxlsx), you can set `detectDates = TRUE` in the function `read.xlsx()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(openxlsx)\nlibrary(tidyverse)\nlibrary(readxl)\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nxlsxfile_path <- system.file(\"extdata\", \"readTest.xlsx\", package = \"openxlsx\")\n\n# Vector of doubles instead of dates\nxlsxfile_with_problems <- read.xlsx(xlsxfile_path, sheet = 3) |> \n as_tibble()\nxlsxfile_with_problems\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 2,083 × 5\n Date value word bool wordZ2 \n <dbl> <dbl> <chr> <lgl> <chr> \n 1 41757 0.839 N-U-B-R-A FALSE FALSE-Z\n 2 41756 0.886 N-Z-P-S-Y TRUE TRUE-Z \n 3 41755 0.574 C-G-D-X-H TRUE TRUE-Z \n 4 41754 0.137 <NA> FALSE FALSE-Z\n 5 41753 0.369 B-K-A-O-W TRUE TRUE-Z \n 6 41752 NA H-P-G-O-K TRUE TRUE-Z \n 7 41751 0.842 F-P-C-L-T TRUE TRUE-Z \n 8 41750 0.227 A-N-Q-P-V TRUE TRUE-Z \n 9 41749 0.276 Y-E-B-K-O TRUE TRUE-Z \n10 41748 0.419 V-S-N-T-R TRUE TRUE-Z \n# ℹ 2,073 more rows\n```\n:::\n\n```{.r .cell-code}\nglimpse(xlsxfile_with_problems)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nRows: 2,083\nColumns: 5\n$ Date <dbl> 41757, 41756, 41755, 41754, 41753, 41752, 41751, 41750, 41749, …\n$ value <dbl> 0.839076400, 0.886380000, 0.574131400, 0.136606500, 0.369258200…\n$ word <chr> \"N-U-B-R-A\", \"N-Z-P-S-Y\", \"C-G-D-X-H\", NA, \"B-K-A-O-W\", \"H-P-G-…\n$ bool <lgl> FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T…\n$ wordZ2 <chr> \"FALSE-Z\", \"TRUE-Z\", \"TRUE-Z\", \"FALSE-Z\", \"TRUE-Z\", \"TRUE-Z\", 
\"…\n```\n:::\n\n```{.r .cell-code}\n# Vector of dates\nxlsxfile <- read.xlsx(xlsxfile_path, sheet = 3, detectDates = TRUE) |> \n as_tibble()\nxlsxfile\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 2,083 × 5\n Date value word bool wordZ2 \n <date> <dbl> <chr> <lgl> <chr> \n 1 2014-04-28 0.839 N-U-B-R-A FALSE FALSE-Z\n 2 2014-04-27 0.886 N-Z-P-S-Y TRUE TRUE-Z \n 3 2014-04-26 0.574 C-G-D-X-H TRUE TRUE-Z \n 4 2014-04-25 0.137 <NA> FALSE FALSE-Z\n 5 2014-04-24 0.369 B-K-A-O-W TRUE TRUE-Z \n 6 2014-04-23 NA H-P-G-O-K TRUE TRUE-Z \n 7 2014-04-22 0.842 F-P-C-L-T TRUE TRUE-Z \n 8 2014-04-21 0.227 A-N-Q-P-V TRUE TRUE-Z \n 9 2014-04-20 0.276 Y-E-B-K-O TRUE TRUE-Z \n10 2014-04-19 0.419 V-S-N-T-R TRUE TRUE-Z \n# ℹ 2,073 more rows\n```\n:::\n\n```{.r .cell-code}\nglimpse(xlsxfile)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nRows: 2,083\nColumns: 5\n$ Date <date> 2014-04-28, 2014-04-27, 2014-04-26, 2014-04-25, 2014-04-24, 20…\n$ value <dbl> 0.839076400, 0.886380000, 0.574131400, 0.136606500, 0.369258200…\n$ word <chr> \"N-U-B-R-A\", \"N-Z-P-S-Y\", \"C-G-D-X-H\", NA, \"B-K-A-O-W\", \"H-P-G-…\n$ bool <lgl> FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T…\n$ wordZ2 <chr> \"FALSE-Z\", \"TRUE-Z\", \"TRUE-Z\", \"FALSE-Z\", \"TRUE-Z\", \"TRUE-Z\", \"…\n```\n:::\n:::\n\n\n# Convert double variables to date and time variables\n\nAnother way to convert a vector of integers is to use the function `convertToDate()` or `convertToDateTime()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nother_file <- readxl_example(path = \"type-me.xlsx\")\nxlsxfile_datetime <- read.xlsx(other_file, sheet = 3) |> \n as_tibble() |> \n slice(2:3) |> \n select(`maybe.a.datetime?`) |> \n pull()\nxlsxfile_datetime\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"41051\" \"41026.479166666664\"\n```\n:::\n\n```{.r .cell-code}\nconvertToDate(xlsxfile_datetime[1])\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"2012-05-22\"\n```\n:::\n\n```{.r 
.cell-code}\nconvertToDateTime(xlsxfile_datetime[2])\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"2012-04-27 11:30:00 CEST\"\n```\n:::\n:::\n\n\n\n# Links\nThese examples are inspired by:\n\n- [https://rdrr.io/cran/openxlsxhttps://rdrr.io/cran/openxlsx](https://rdrr.io/cran/openxlsx/man/read.xlsx.html)\n\n- [https://readxl.tidyverse.org](https://readxl.tidyverse.org)\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
@@ -0,0 +1,14 @@
{
"hash": "2c1b7cd21ed9922aade253ecf8ae1982",
"result": {
"markdown": "---\ntitle: \"Rename variables in a data frame using an external lookup table\"\nauthor: \"Layal C. Lettry\"\ndate: \"2023-10-08\"\ncategories: [unquote-splice, tidy evaluation, rename, any_of]\nimage: \"image.jpg\"\n---\n\n\n# Rename variables in a data frame using an external lookup table\n\nSuppose that a data frame is present with certain columns that possess the appropriate names, however, the remaining columns require renaming. An existing lookup table is ready to be used for setting new names to these specific columns. \n\nI found the solution by using tidy evaluation tools, namely the unquote-splice `!!!`, and by reading the [article written by Tim Tiefenbach](https://tim-tiefenbach.de/post/2022-rename-columns/#dplyr-tidyverse). \n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\n```\n:::\n\n\nHere is the data frame with 3 variables, namely `var1`, `var2` and `var4`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntest_tib <- tribble(\n ~var1, ~var2, ~var4,\n \"x\", \"a\", 1L,\n \"y\", \"b\", 2L,\n \"z\", \"c\", 3L\n)\n```\n:::\n\n\nDefine the lookup table with the new names. Transform this lookup table into a named vector using `deframe()`. Do not forget that the first argument of `deframe()` should be the new names of the variable and the second one should have the actual names.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnew_names <- tribble(\n ~names_var, ~new_names_var,\n \"var1\", \"Variable 1\",\n \"var2\", \"Variable 2\",\n \"var3\", \"Variable 3\",\n \"var4\", \"Variable 4\"\n)\n\nnew_names_vec <- deframe(select(new_names, new_names_var, names_var))\n```\n:::\n\n\n# Solution using tidy evaluation and base R\n\nOur goal is to unpack the vector of column name pairs that are actually in our data frame. We could achieve this by using unquote-splice `!!!` which will splice the list of names into the dynamic dots `...` of `rename()`.\n\nHowever, the column `var3` is not found. 
An error appears.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntest_tib |>\n rename(!!!new_names_vec)\n```\n\n::: {.cell-output .cell-output-error}\n```\nError in `rename()`:\n! Can't rename columns that don't exist.\n✖ Column `var3` doesn't exist.\n```\n:::\n:::\n\n\nSelect only the variables which are in the named vector `new_names_vec`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntest_tib |>\n rename(!!!new_names_vec[new_names_vec %in% names(test_tib)])\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 3 × 3\n `Variable 1` `Variable 2` `Variable 4`\n <chr> <chr> <int>\n1 x a 1\n2 y b 2\n3 z c 3\n```\n:::\n:::\n\n\n# Solution using dplyr\n\nInstead of selecting the common variables, you can use `any_of()` which does this selection automatically.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntest_tib |>\n rename(any_of(new_names_vec))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 3 × 3\n `Variable 1` `Variable 2` `Variable 4`\n <chr> <chr> <int>\n1 x a 1\n2 y b 2\n3 z c 3\n```\n:::\n:::\n\n\n\n# Sources\n\nThese examples are inspired by:\n\n- [Article written by Tim Tiefenbach](https://tim-tiefenbach.de/post/2022-rename-columns/#dplyr-tidyverse)\n\n- [https://dcl-prog.stanford.edu/tidy-eval-detailed.html](https://dcl-prog.stanford.edu/tidy-eval-detailed.html)\n\n- [https://adv-r.hadley.nz/quasiquotation.html#unquoting-many-arguments](https://adv-r.hadley.nz/quasiquotation.html#unquoting-many-arguments)\n\n- [https://rlang.r-lib.org/reference/topic-inject.html#splicing-with--1](https://rlang.r-lib.org/reference/topic-inject.html#splicing-with--1)\n\n- [https://rlang.r-lib.org/reference/dyn-dots.html](https://rlang.r-lib.org/reference/dyn-dots.html)\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}
20 changes: 15 additions & 5 deletions posts/2023-10-08_datetimes-openxlsx/index.qmd
@@ -2,21 +2,27 @@
title: "Detect date and time variables with openxlsx"
author: "Layal C. Lettry"
date: "2023-10-08"
categories: [code, openxlsx, date, datetime]
categories: [openxlsx, date, datetime]
image: "image.jpg"
---

# Detect date variables

When you read an Excel file, the dates don't always look the way you would expect. You may see a vector of integers (or doubles) rather than a vector of dates. If you are using [openxlsx](https://github.com/ycphs/openxlsx), you can set `detectDates = TRUE` in the function `read.xlsx()`.

```{r load_libraries, eval=TRUE, message=FALSE, warning=FALSE}
```{r}
#| label: load_libraries
#| message: false
#| warning: false
library(openxlsx)
library(tidyverse)
library(readxl)
```

```{r detectdates, eval=TRUE, message=FALSE, warning=FALSE}
```{r}
#| label: detectdates
#| message: false
#| warning: false
xlsxfile_path <- system.file("extdata", "readTest.xlsx", package = "openxlsx")
# Vector of doubles instead of dates
@@ -36,7 +42,10 @@ glimpse(xlsxfile)

Another way to convert a vector of integers (or doubles) into dates or date-times is to use the function `convertToDate()` or `convertToDateTime()`.

```{r convertodate, eval=TRUE, message=FALSE, warning=FALSE}
```{r}
#| label: convertodate
#| message: false
#| warning: false
other_file <- readxl_example(path = "type-me.xlsx")
xlsxfile_datetime <- read.xlsx(other_file, sheet = 3) |>
as_tibble() |>
@@ -50,7 +59,8 @@ convertToDateTime(xlsxfile_datetime[2])
```
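
As a rough cross-check of what `convertToDate()` and `convertToDateTime()` return, the same two values can be converted with base R alone. This is a minimal sketch and not how openxlsx implements the conversion: it assumes the workbook uses Excel's 1900 date system, and it uses `tz = "UTC"` for reproducibility, whereas the output above is shown in the local time zone (CEST). The variable names are made up for this illustration.

```r
# Values as read in the post: Excel serial numbers stored as strings
serial_date <- as.numeric("41051")
serial_datetime <- as.numeric("41026.479166666664")

# In the 1900 date system, serial numbers count days from 1899-12-30
date_value <- as.Date(serial_date, origin = "1899-12-30")

# 25569 is the serial number of 1970-01-01. Convert days to seconds and
# round to whole seconds to absorb floating-point noise.
# tz = "UTC" is an assumption made for this sketch.
seconds <- round((serial_datetime - 25569) * 86400)
datetime_value <- as.POSIXct(seconds, origin = "1970-01-01", tz = "UTC")

format(date_value)
format(datetime_value, "%Y-%m-%d %H:%M:%S")
```

The fractional part of the serial number is the time of day, so `0.479166...` corresponds to 11:30.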


# Links
# Sources

These examples are inspired by:

- [https://rdrr.io/cran/openxlsx/man/read.xlsx.html](https://rdrr.io/cran/openxlsx/man/read.xlsx.html)
Binary file added posts/2023-10-08_rename-columns-lookup/image.jpg
106 changes: 106 additions & 0 deletions posts/2023-10-08_rename-columns-lookup/index.qmd
@@ -0,0 +1,106 @@
---
title: "Rename variables in a data frame using an external lookup table"
author: "Layal C. Lettry"
date: "2023-10-08"
categories: [unquote-splice, tidy evaluation, rename, any_of]
image: "image.jpg"
---

# Rename variables in a data frame using an external lookup table

Suppose you have a data frame in which some columns already have appropriate names while the remaining columns need to be renamed. A lookup table with the new names for these columns already exists.

I found the solution by using tidy evaluation tools, namely the unquote-splice `!!!`, and by reading the [article written by Tim Tiefenbach](https://tim-tiefenbach.de/post/2022-rename-columns/#dplyr-tidyverse).

```{r}
#| label: load_libraries
#| message: false
#| warning: false
library(tidyverse)
```

Here is the data frame with 3 variables, namely `var1`, `var2` and `var4`.

```{r}
#| label: data
#| message: false
#| warning: false
test_tib <- tribble(
~var1, ~var2, ~var4,
"x", "a", 1L,
"y", "b", 2L,
"z", "c", 3L
)
```

Define the lookup table with the new names. Transform this lookup table into a named vector using `deframe()`. Do not forget that `deframe()` uses the first column of its input as the names and the second column as the values, so the first column should contain the new names and the second one the current names.

```{r}
#| label: lookup
#| message: false
#| warning: false
new_names <- tribble(
~names_var, ~new_names_var,
"var1", "Variable 1",
"var2", "Variable 2",
"var3", "Variable 3",
"var4", "Variable 4"
)
new_names_vec <- deframe(select(new_names, new_names_var, names_var))
```
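
To make the column order concrete, here is a small sketch of what `deframe()` returns. The `lookup` table and its column names `old` and `new` are hypothetical and only mirror the structure of `new_names` above.

```r
library(tibble)

# Hypothetical two-column lookup, mirroring the post's `new_names` table
lookup <- tribble(
  ~old,   ~new,
  "var1", "Variable 1",
  "var2", "Variable 2"
)

# deframe() takes the first column as names and the second as values,
# so the new names go first to get a c(new = "old") vector for rename()
lookup_vec <- deframe(lookup[, c("new", "old")])
lookup_vec
```

The result is a named character vector whose names are the new column names and whose values are the current ones, which is the `new = old` form that `rename()` expects.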

# Solution using tidy evaluation and base R

Our goal is to unpack the pairs of column names that are actually in our data frame. We can achieve this with the unquote-splice operator `!!!`, which splices the elements of the named vector into the dynamic dots `...` of `rename()`.

However, the lookup table also contains `var3`, which does not exist in `test_tib`, so `rename()` throws an error.

```{r}
#| label: error
#| error: true
#| message: false
#| warning: false
test_tib |>
rename(!!!new_names_vec)
```

Keep only the elements of the named vector `new_names_vec` whose values are column names of `test_tib`.

```{r}
#| label: base_r_solution
#| message: false
#| warning: false
test_tib |>
rename(!!!new_names_vec[new_names_vec %in% names(test_tib)])
```

# Solution using dplyr

Instead of selecting the common variables yourself, you can use `any_of()`, which does this selection automatically.

```{r}
#| label: dplyr_solution
#| message: false
#| warning: false
test_tib |>
rename(any_of(new_names_vec))
```
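
The difference between `any_of()` and its strict counterpart `all_of()` can be sketched as follows. The data frame `df` and the vector `lookup_vec` are made up for this illustration; `var3` is deliberately absent from `df`.

```r
library(dplyr)

# Hypothetical data: `var3` appears in the lookup but not in the data frame
df <- tibble(var1 = 1:2, var2 = c("a", "b"))
lookup_vec <- c("Variable 1" = "var1", "Variable 3" = "var3")

# any_of() silently skips lookup entries with no matching column
renamed <- rename(df, any_of(lookup_vec))
names(renamed)

# all_of() errors when a column is missing; catch the error to show it
strict <- tryCatch(
  rename(df, all_of(lookup_vec)),
  error = function(e) "error"
)
strict
```

`all_of()` is the safer choice when every entry of the lookup table is expected to be present, since a missing column then fails loudly instead of being ignored.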


# Sources

These examples are inspired by:

- [Article written by Tim Tiefenbach](https://tim-tiefenbach.de/post/2022-rename-columns/#dplyr-tidyverse)

- [https://dcl-prog.stanford.edu/tidy-eval-detailed.html](https://dcl-prog.stanford.edu/tidy-eval-detailed.html)

- [https://adv-r.hadley.nz/quasiquotation.html#unquoting-many-arguments](https://adv-r.hadley.nz/quasiquotation.html#unquoting-many-arguments)

- [https://rlang.r-lib.org/reference/topic-inject.html#splicing-with--1](https://rlang.r-lib.org/reference/topic-inject.html#splicing-with--1)

- [https://rlang.r-lib.org/reference/dyn-dots.html](https://rlang.r-lib.org/reference/dyn-dots.html)
Binary file removed posts/test/image.jpg
Binary file not shown.
50 changes: 0 additions & 50 deletions posts/test/index.qmd

This file was deleted.
