Merge pull request #5 from Layalchristine24/f-nest-pack
Write post about nest pack
Layalchristine24 authored May 30, 2024
2 parents b876969 + 41828eb commit 8eefc21
Showing 3 changed files with 144 additions and 0 deletions.
15 changes: 15 additions & 0 deletions _freeze/posts/2024-05-30_pack-nest/index/execute-results/html.json
@@ -0,0 +1,15 @@
{
"hash": "6d6be9bd60dc0bed592932a4063b2569",
"result": {
"engine": "knitr",
"markdown": "---\ntitle: What is the difference between (un)packing and (un)nesting a tibble?\nauthor:\n - name:\n given: Layal Christine\n family: Lettry\n orcid: 0009-0008-6396-0523\n affiliations:\n - id: cynkra\n - name: cynkra GmbH\n city: Zurich\n state: CH\n - id: unifr\n - name: University of Fribourg, Dept. of Informatics, ASAM Group\n city: Fribourg\n state: CH\ndate: 2024-05-30\ncategories: [nest, unnest, pack, unpack, tidyr, constructive]\nimage: image.jpg\ncitation: \n url: https://rdiscovery.netlify.app/posts/2024-05-30_pack-nest/\nformat:\n html:\n toc: true\n toc-depth: 6\n toc-title: Contents\n toc-location: right\n number-sections: false\neditor_options: \n chunk_output_type: console\n---\n\n\n*Does a nested tibble have the same structure as a packed tibble?*\n\n# Initial object\n\nLet's assume that we have the object `my_tib` which is a nested tibble containing a list, namely `my_values`, with another tibble where the variables are `my_ints` and `my_chars`. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_tib <-\n tibble::tibble(\n my_values = list(tibble::tibble(\n my_ints = 1L:5L,\n my_chars = LETTERS[my_ints]\n ))\n )\nconstructive::construct(my_tib)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\ntibble::tibble(\n my_values = list(tibble::tibble(my_ints = 1:5, my_chars = c(\"A\", \"B\", \"C\", \"D\", \"E\"))),\n)\n```\n\n\n:::\n:::\n\n\nWe could also use `tidyr::nest()` to create `my_tib` (please refer to [this article](https://tidyr.tidyverse.org/articles/nest.html) for more info).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_nested_tib <-\n tibble::tribble(\n ~my_ints, ~my_chars,\n 1L, \"A\",\n 2L, \"B\",\n 3L, \"C\",\n 4L, \"D\",\n 5L, \"E\"\n ) |>\n tidyr::nest(my_values = c(my_ints, my_chars))\n\nconstructive::construct(my_nested_tib)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\ntibble::tibble(\n my_values = list(tibble::tibble(my_ints = 1:5, my_chars = c(\"A\", \"B\", \"C\", \"D\", \"E\"))),\n)\n```\n\n\n:::\n:::\n\n\nAs you 
can see, there is no difference between `my_tib` and `my_nested_tib`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwaldo::compare(my_tib, my_nested_tib)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n✔ No differences\n```\n\n\n:::\n:::\n\n\n# What is the difference between a nested and a packed tibble?\nTo obtain a packed tibble, we should pack the variables `my_ints` and `my_chars` together so that we have a tibble in another tibble instead of a list with an element that is a tibble.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_packed_tib <-\n tibble::tribble(\n ~my_ints, ~my_chars,\n 1L, \"A\",\n 2L, \"B\",\n 3L, \"C\",\n 4L, \"D\",\n 5L, \"E\"\n ) |>\n tidyr::pack(my_values = c(my_ints, my_chars))\nconstructive::construct(my_packed_tib)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\ntibble::tibble(\n my_values = tibble::tibble(my_ints = 1:5, my_chars = c(\"A\", \"B\", \"C\", \"D\", \"E\")),\n)\n```\n\n\n:::\n:::\n\n\nWe can assess the difference between `my_nested_tib` and `my_packed_tib` with `waldo::compare()`. 
\n\n::: {.cell}\n\n```{.r .cell-code}\nwaldo::compare(my_nested_tib, my_packed_tib)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n`attr(old, 'row.names')`: 1 \n`attr(new, 'row.names')`: 1 2 3 4 5\n\n`old$my_values` is a list\n`new$my_values` is an S3 object of class <tbl_df/tbl/data.frame>, a list\n```\n\n\n:::\n:::\n\n\nThis tells us that `my_nested_tib` has only one row and contains the variable `my_values` that is a list, whereas `my_packed_tib` has 5 rows and is constituted by the variable `my_values` that has, in this case, the class `data.frame`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nclass(my_packed_tib$my_values)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"tbl_df\" \"tbl\" \"data.frame\"\n```\n\n\n:::\n:::\n\n\nFor the record, a data frame is a special list where every element has the same length.\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(my_packed_tib$my_values)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"list\"\n```\n\n\n:::\n:::\n\n\n\n# How to unnest or unpack a tibble?\n\nTo get a tibble without any variable that is a list or a tibble, we should unnest and, respectively, unpack our nested/packed tibble.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_unnested_tib <-\n my_nested_tib |>\n tidyr::unnest(my_values)\n\nconstructive::construct(my_unnested_tib)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\ntibble::tibble(my_ints = 1:5, my_chars = c(\"A\", \"B\", \"C\", \"D\", \"E\"))\n```\n\n\n:::\n:::\n\n\nNow, we have a simple tibble with two variables instead of one single variable that is a list.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_unpacked_tib <-\n my_packed_tib |>\n tidyr::unpack(my_values)\n\nconstructive::construct(my_unpacked_tib)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\ntibble::tibble(my_ints = 1:5, my_chars = c(\"A\", \"B\", \"C\", \"D\", \"E\"))\n```\n\n\n:::\n:::\n\n\nHere again, we obtain a simple tibble with two variables instead of one single variable that has the class 
`data.frame`.\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}
Binary file added posts/2024-05-30_pack-nest/image.jpg
129 changes: 129 additions & 0 deletions posts/2024-05-30_pack-nest/index.qmd
@@ -0,0 +1,129 @@
---
title: What is the difference between (un)packing and (un)nesting a tibble?
author:
  - name:
      given: Layal Christine
      family: Lettry
    orcid: 0009-0008-6396-0523
    affiliations:
      - id: cynkra
      - name: cynkra GmbH
        city: Zurich
        state: CH
      - id: unifr
      - name: University of Fribourg, Dept. of Informatics, ASAM Group
        city: Fribourg
        state: CH
date: 2024-05-30
categories: [nest, unnest, pack, unpack, tidyr, constructive]
image: image.jpg
citation:
  url: https://rdiscovery.netlify.app/posts/2024-05-30_pack-nest/
format:
  html:
    toc: true
    toc-depth: 6
    toc-title: Contents
    toc-location: right
    number-sections: false
editor_options:
  chunk_output_type: console
---

*Does a nested tibble have the same structure as a packed tibble?*

# Initial object

Let's assume that we have the object `my_tib`, a nested tibble with a single list-column, `my_values`, whose only element is another tibble with the variables `my_ints` and `my_chars`.

```{r}
my_tib <-
  tibble::tibble(
    my_values = list(tibble::tibble(
      my_ints = 1L:5L,
      my_chars = LETTERS[my_ints]
    ))
  )
constructive::construct(my_tib)
```

We could also use `tidyr::nest()` to create `my_tib` (please refer to [this article](https://tidyr.tidyverse.org/articles/nest.html) for more info).

```{r}
my_nested_tib <-
  tibble::tribble(
    ~my_ints, ~my_chars,
    1L, "A",
    2L, "B",
    3L, "C",
    4L, "D",
    5L, "E"
  ) |>
  tidyr::nest(my_values = c(my_ints, my_chars))
constructive::construct(my_nested_tib)
```

As you can see, there is no difference between `my_tib` and `my_nested_tib`.

```{r}
waldo::compare(my_tib, my_nested_tib)
```

# What is the difference between a nested and a packed tibble?

To obtain a packed tibble, we pack the variables `my_ints` and `my_chars` together, so that `my_values` is itself a tibble stored as a column of the outer tibble, instead of a list with an element that is a tibble.

```{r}
my_packed_tib <-
  tibble::tribble(
    ~my_ints, ~my_chars,
    1L, "A",
    2L, "B",
    3L, "C",
    4L, "D",
    5L, "E"
  ) |>
  tidyr::pack(my_values = c(my_ints, my_chars))
constructive::construct(my_packed_tib)
```
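
As with `my_tib` in the first section, a packed tibble can presumably also be written by hand, without `tidyr::pack()`. The following is only a minimal sketch: it assumes that `tibble::tibble()` keeps a tibble passed as a column as a data frame column, and the name `my_packed_by_hand` is made up for illustration.

```{r}
# Hypothetical object name, not part of the original example
my_packed_by_hand <-
  tibble::tibble(
    my_values = tibble::tibble(
      my_ints = 1L:5L,
      my_chars = LETTERS[my_ints]
    )
  )

# The outer tibble has five rows and a single (data frame) column
dim(my_packed_by_hand)

# The packed column is itself a tibble, hence also a data frame
inherits(my_packed_by_hand$my_values, "data.frame")
```

Note that the only change compared with `my_tib` is the missing `list()` around the inner tibble.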

We can assess the difference between `my_nested_tib` and `my_packed_tib` with `waldo::compare()`.

```{r}
waldo::compare(my_nested_tib, my_packed_tib)
```

This tells us that `my_nested_tib` has only one row and that its variable `my_values` is a list, whereas `my_packed_tib` has 5 rows and its variable `my_values` is a tibble, which in this case inherits from the class `data.frame`.

```{r}
class(my_packed_tib$my_values)
```

For the record, a data frame is a special list where every element has the same length, which is why `typeof()` still reports a list.

```{r}
typeof(my_packed_tib$my_values)
```
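
The structural difference matters most when reaching into `my_values`. The sketch below rebuilds both tibbles so that it runs on its own; the names `inner`, `nested` and `packed` are only illustrative and do not appear elsewhere in this post.

```{r}
# Flat starting point (illustrative name)
inner <- tibble::tibble(my_ints = 1L:5L, my_chars = LETTERS[1:5])

nested <- tidyr::nest(inner, my_values = c(my_ints, my_chars))
packed <- tidyr::pack(inner, my_values = c(my_ints, my_chars))

# Nested: one row; the inner tibble is the first element of a list-column,
# so it has to be extracted with [[1]] first
nrow(nested)
nested$my_values[[1]]$my_ints

# Packed: five rows; the inner tibble is the column itself,
# so its variables can be reached directly with $
nrow(packed)
packed$my_values$my_ints
```

In both cases the last expression returns the integers 1 to 5; only the path to get there differs.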


# How to unnest or unpack a tibble?

To get a tibble without any variable that is a list or a tibble, we unnest the nested tibble and unpack the packed tibble, respectively.

```{r}
my_unnested_tib <-
  my_nested_tib |>
  tidyr::unnest(my_values)
constructive::construct(my_unnested_tib)
```

Now, we have a simple tibble with two variables instead of one single variable that is a list.

```{r}
my_unpacked_tib <-
  my_packed_tib |>
  tidyr::unpack(my_values)
constructive::construct(my_unpacked_tib)
```

Here again, we obtain a simple tibble with two variables instead of one single variable that has the class `data.frame`.
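
Since both `constructive::construct()` outputs above are the same, the two routes should lead to the same flat tibble for this example. A minimal, self-contained sketch to check this with `waldo::compare()`; the names `flat`, `round_trip_nest` and `round_trip_pack` are hypothetical.

```{r}
# Flat starting point (illustrative name)
flat <- tibble::tibble(my_ints = 1L:5L, my_chars = LETTERS[1:5])

# Nest, then unnest
round_trip_nest <-
  flat |>
  tidyr::nest(my_values = c(my_ints, my_chars)) |>
  tidyr::unnest(my_values)

# Pack, then unpack
round_trip_pack <-
  flat |>
  tidyr::pack(my_values = c(my_ints, my_chars)) |>
  tidyr::unpack(my_values)

# Report any difference between the two results
waldo::compare(round_trip_nest, round_trip_pack)
```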
