From 18708747843f38f76def665421679b8d8fef3c52 Mon Sep 17 00:00:00 2001
From: "G. Vituri" <56522687+vituri@users.noreply.github.com>
Date: Sun, 13 Oct 2024 14:11:24 -0300
Subject: [PATCH] add eval to interpolate in tidier

---
 Project.toml | 3 +
 .../execute-results/html.json | 4 +-
 .../dataframes-rows/execute-results/html.json | 4 +-
 _freeze/dataframes/execute-results/html.json | 6 +-
 dataframes-columns.qmd | 8 +-
 dataframes-rows.qmd | 52 +-
 dataframes-rows.quarto_ipynb | 772 ++++++
 dataframes.qmd | 18 +-
 docs/dataframes-columns.html | 40 +-
 docs/dataframes-rows.html | 2378 +++++++++++------
 docs/dataframes.html | 64 +-
 docs/index.html | 14 +-
 docs/search.json | 60 +-
 ...p-5341e04cf592f47685c7d2c736cc69d2.min.css | 12 +
 ...hting-018089954d508eae8a473f0b7f0491f0.css | 205 ++
 docs/site_libs/quarto-html/quarto.js | 5 +-
 index.qmd | 8 +-
 17 files changed, 2822 insertions(+), 831 deletions(-)
 create mode 100644 dataframes-rows.quarto_ipynb
 create mode 100644 docs/site_libs/bootstrap/bootstrap-5341e04cf592f47685c7d2c736cc69d2.min.css
 create mode 100644 docs/site_libs/quarto-html/quarto-syntax-highlighting-018089954d508eae8a473f0b7f0491f0.css

diff --git a/Project.toml b/Project.toml
index e361e97..5b11b9f 100644
--- a/Project.toml
+++ b/Project.toml
@@ -2,7 +2,10 @@
 Chain = "8be319e6-bccf-4806-a6f7-6fae938471bc"
 DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
 DataFramesMeta = "1313f7d8-7da2-5740-9ea0-a2ca25f37964"
+IJulia = "7073ff75-c697-5162-941a-fcdaad2a7d2a"
 PalmerPenguins = "8b842266-38fa-440a-9b57-31493939ab85"
+QuartoNotebookRunner = "4c0109c6-14e9-4c88-93f0-2b974d3468f4"
+REPL = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb"
 Tidier = "f0413319-3358-4bb0-8e7c-0c83523a93bd"
 TidierData = "fe2206b3-d496-4ee9-a338-6a095c4ece80"
 TidierFiles = "8ae5e7a9-bdd3-4c93-9cc3-9df4d5d947db"
diff --git a/_freeze/dataframes-columns/execute-results/html.json b/_freeze/dataframes-columns/execute-results/html.json
index dafe7f9..93e9cb9 100644
--- a/_freeze/dataframes-columns/execute-results/html.json
+++ b/_freeze/dataframes-columns/execute-results/html.json
@@ -1,8 +1,8 @@
 {
-  "hash": "ae0b4023461919e968a39a2482c05d35",
+  "hash": "9c3c8cca26e9275beace976acb20bd20",
   "result": {
     "engine": "julia",
-    "markdown": "---\n# jupyter: julia-1.10\nengine: julia\n---\n\n\n\n\n# Operations on columns\n\n\n\n\n::: {#2 .cell execution_count=1}\n``` {.julia .cell-code}\nusing DataFrames, PalmerPenguins\nusing Tidier\nimport DataFramesMeta as DFM\n\npenguins = PalmerPenguins.load() |> DataFrame;\n@slice_head(penguins, n = 10)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
10×7 DataFrame
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
\n```\n:::\n:::\n\n\n\n\n\n\n## Selecting (or: throwing columns away)\n\n### Selecting `n` columns\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n::: {#4 .cell execution_count=1}\n``` {.julia .cell-code}\n@select penguins species body_mass_g\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×2 DataFrame
319 rows omitted
Rowspeciesbody_mass_g
String15Int64?
1Adelie3750
2Adelie3800
3Adelie3250
4Adeliemissing
5Adelie3450
6Adelie3650
7Adelie3625
8Adelie4675
9Adelie3475
10Adelie4250
11Adelie3300
12Adelie3700
13Adelie3200
333Chinstrap3250
334Chinstrap4050
335Chinstrap3800
336Chinstrap3525
337Chinstrap3950
338Chinstrap3650
339Chinstrap3650
340Chinstrap4000
341Chinstrap3400
342Chinstrap3775
343Chinstrap4100
344Chinstrap3775
\n```\n:::\n:::\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n::: {#6 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@select penguins :species :body_mass_g\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×2 DataFrame
319 rows omitted
Rowspeciesbody_mass_g
String15Int64?
1Adelie3750
2Adelie3800
3Adelie3250
4Adeliemissing
5Adelie3450
6Adelie3650
7Adelie3625
8Adelie4675
9Adelie3475
10Adelie4250
11Adelie3300
12Adelie3700
13Adelie3200
333Chinstrap3250
334Chinstrap4050
335Chinstrap3800
336Chinstrap3525
337Chinstrap3950
338Chinstrap3650
339Chinstrap3650
340Chinstrap4000
341Chinstrap3400
342Chinstrap3775
343Chinstrap4100
344Chinstrap3775
\n```\n:::\n:::\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n::: {#8 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.select(penguins, [:species, :body_mass_g])\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×2 DataFrame
319 rows omitted
Rowspeciesbody_mass_g
String15Int64?
1Adelie3750
2Adelie3800
3Adelie3250
4Adeliemissing
5Adelie3450
6Adelie3650
7Adelie3625
8Adelie4675
9Adelie3475
10Adelie4250
11Adelie3300
12Adelie3700
13Adelie3200
333Chinstrap3250
334Chinstrap4050
335Chinstrap3800
336Chinstrap3525
337Chinstrap3950
338Chinstrap3650
339Chinstrap3650
340Chinstrap4000
341Chinstrap3400
342Chinstrap3775
343Chinstrap4100
344Chinstrap3775
\n```\n:::\n:::\n\n\n\n\n\n\n:::\n\n### Selecting columns from a variable\n\n::: {.panel-tabset}\n\n\n\n\n::: {#10 .cell execution_count=1}\n``` {.julia .cell-code}\nmy_columns = [:species, :body_mass_g];\n```\n:::\n\n\n\n\n\n\n## Tidier\n\n\n\n\n::: {#12 .cell execution_count=1}\n``` {.julia .cell-code}\n@select penguins !!my_columns\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×2 DataFrame
319 rows omitted
Rowspeciesbody_mass_g
String15Int64?
1Adelie3750
2Adelie3800
3Adelie3250
4Adeliemissing
5Adelie3450
6Adelie3650
7Adelie3625
8Adelie4675
9Adelie3475
10Adelie4250
11Adelie3300
12Adelie3700
13Adelie3200
333Chinstrap3250
334Chinstrap4050
335Chinstrap3800
336Chinstrap3525
337Chinstrap3950
338Chinstrap3650
339Chinstrap3650
340Chinstrap4000
341Chinstrap3400
342Chinstrap3775
343Chinstrap4100
344Chinstrap3775
\n```\n:::\n:::\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n::: {#14 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@select penguins $my_columns\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×2 DataFrame
319 rows omitted
Rowspeciesbody_mass_g
String15Int64?
1Adelie3750
2Adelie3800
3Adelie3250
4Adeliemissing
5Adelie3450
6Adelie3650
7Adelie3625
8Adelie4675
9Adelie3475
10Adelie4250
11Adelie3300
12Adelie3700
13Adelie3200
333Chinstrap3250
334Chinstrap4050
335Chinstrap3800
336Chinstrap3525
337Chinstrap3950
338Chinstrap3650
339Chinstrap3650
340Chinstrap4000
341Chinstrap3400
342Chinstrap3775
343Chinstrap4100
344Chinstrap3775
\n```\n:::\n:::\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n::: {#16 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.select(penguins, my_columns)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×2 DataFrame
319 rows omitted
Rowspeciesbody_mass_g
String15Int64?
1Adelie3750
2Adelie3800
3Adelie3250
4Adeliemissing
5Adelie3450
6Adelie3650
7Adelie3625
8Adelie4675
9Adelie3475
10Adelie4250
11Adelie3300
12Adelie3700
13Adelie3200
333Chinstrap3250
334Chinstrap4050
335Chinstrap3800
336Chinstrap3525
337Chinstrap3950
338Chinstrap3650
339Chinstrap3650
340Chinstrap4000
341Chinstrap3400
342Chinstrap3775
343Chinstrap4100
344Chinstrap3775
\n```\n:::\n:::\n\n\n\n\n\n\n:::\n\n## Mutating (or: creating columns)\n\n### Creating one column based on another one\n\nCreate the column `body_mass_kg` by dividing `body_mass_g` by 1000.\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n::: {#18 .cell execution_count=1}\n``` {.julia .cell-code}\n@mutate penguins body_mass_kg = body_mass_g / 1000\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×8 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsexbody_mass_kg
String15String15Float64?Float64?Int64?Int64?String7Float64?
1AdelieTorgersen39.118.71813750male3.75
2AdelieTorgersen39.517.41863800female3.8
3AdelieTorgersen40.318.01953250female3.25
4AdelieTorgersenmissingmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female3.45
6AdelieTorgersen39.320.61903650male3.65
7AdelieTorgersen38.917.81813625female3.625
8AdelieTorgersen39.219.61954675male4.675
9AdelieTorgersen34.118.11933475missing3.475
10AdelieTorgersen42.020.21904250missing4.25
11AdelieTorgersen37.817.11863300missing3.3
12AdelieTorgersen37.817.31803700missing3.7
13AdelieTorgersen41.117.61823200female3.2
333ChinstrapDream45.216.61913250female3.25
334ChinstrapDream49.319.92034050male4.05
335ChinstrapDream50.218.82023800male3.8
336ChinstrapDream45.619.41943525female3.525
337ChinstrapDream51.919.52063950male3.95
338ChinstrapDream46.816.51893650female3.65
339ChinstrapDream45.717.01953650female3.65
340ChinstrapDream55.819.82074000male4.0
341ChinstrapDream43.518.12023400female3.4
342ChinstrapDream49.618.21933775male3.775
343ChinstrapDream50.819.02104100male4.1
344ChinstrapDream50.218.71983775female3.775
\n```\n:::\n:::\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n::: {#20 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@rtransform penguins :body_mass_kg = :body_mass_g / 1000\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×8 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsexbody_mass_kg
String15String15Float64?Float64?Int64?Int64?String7Float64?
1AdelieTorgersen39.118.71813750male3.75
2AdelieTorgersen39.517.41863800female3.8
3AdelieTorgersen40.318.01953250female3.25
4AdelieTorgersenmissingmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female3.45
6AdelieTorgersen39.320.61903650male3.65
7AdelieTorgersen38.917.81813625female3.625
8AdelieTorgersen39.219.61954675male4.675
9AdelieTorgersen34.118.11933475missing3.475
10AdelieTorgersen42.020.21904250missing4.25
11AdelieTorgersen37.817.11863300missing3.3
12AdelieTorgersen37.817.31803700missing3.7
13AdelieTorgersen41.117.61823200female3.2
333ChinstrapDream45.216.61913250female3.25
334ChinstrapDream49.319.92034050male4.05
335ChinstrapDream50.218.82023800male3.8
336ChinstrapDream45.619.41943525female3.525
337ChinstrapDream51.919.52063950male3.95
338ChinstrapDream46.816.51893650female3.65
339ChinstrapDream45.717.01953650female3.65
340ChinstrapDream55.819.82074000male4.0
341ChinstrapDream43.518.12023400female3.4
342ChinstrapDream49.618.21933775male3.775
343ChinstrapDream50.819.02104100male4.1
344ChinstrapDream50.218.71983775female3.775
\n```\n:::\n:::\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n::: {#22 .cell execution_count=1}\n``` {.julia .cell-code}\npenguins2 = copy(penguins);\npenguins.body_mass_kg = penguins.body_mass_g ./ 1000;\npenguins2\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
333ChinstrapDream45.216.61913250female
334ChinstrapDream49.319.92034050male
335ChinstrapDream50.218.82023800male
336ChinstrapDream45.619.41943525female
337ChinstrapDream51.919.52063950male
338ChinstrapDream46.816.51893650female
339ChinstrapDream45.717.01953650female
340ChinstrapDream55.819.82074000male
341ChinstrapDream43.518.12023400female
342ChinstrapDream49.618.21933775male
343ChinstrapDream50.819.02104100male
344ChinstrapDream50.218.71983775female
\n```\n:::\n:::\n\n\n\n\n\n\n:::\n\n## Conditionally mutating columns\n\n",
+    "markdown": "---\n# jupyter: julia-1.10\nengine: julia\n---\n\n\n\n\n\n# Operations on columns\n\n\n\n\n\n::: {#2 .cell execution_count=1}\n``` {.julia .cell-code}\nusing DataFrames, PalmerPenguins\nusing Tidier\nimport DataFramesMeta as DFM\n\npenguins = PalmerPenguins.load() |> DataFrame;\n@slice_head(penguins, n = 10)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
10×7 DataFrame
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
\n```\n:::\n:::\n\n\n\n\n\n\n\n## Selecting (or: throwing columns away)\n\n### Selecting `n` columns\n\n**Problem:** Select only some columns.\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#4 .cell execution_count=1}\n``` {.julia .cell-code}\n@select penguins species body_mass_g\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×2 DataFrame
319 rows omitted
Rowspeciesbody_mass_g
String15Int64?
1Adelie3750
2Adelie3800
3Adelie3250
4Adeliemissing
5Adelie3450
6Adelie3650
7Adelie3625
8Adelie4675
9Adelie3475
10Adelie4250
11Adelie3300
12Adelie3700
13Adelie3200
333Chinstrap3250
334Chinstrap4050
335Chinstrap3800
336Chinstrap3525
337Chinstrap3950
338Chinstrap3650
339Chinstrap3650
340Chinstrap4000
341Chinstrap3400
342Chinstrap3775
343Chinstrap4100
344Chinstrap3775
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#6 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@select penguins :species :body_mass_g\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×2 DataFrame
319 rows omitted
Rowspeciesbody_mass_g
String15Int64?
1Adelie3750
2Adelie3800
3Adelie3250
4Adeliemissing
5Adelie3450
6Adelie3650
7Adelie3625
8Adelie4675
9Adelie3475
10Adelie4250
11Adelie3300
12Adelie3700
13Adelie3200
333Chinstrap3250
334Chinstrap4050
335Chinstrap3800
336Chinstrap3525
337Chinstrap3950
338Chinstrap3650
339Chinstrap3650
340Chinstrap4000
341Chinstrap3400
342Chinstrap3775
343Chinstrap4100
344Chinstrap3775
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#8 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.select(penguins, [:species, :body_mass_g])\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×2 DataFrame
319 rows omitted
Rowspeciesbody_mass_g
String15Int64?
1Adelie3750
2Adelie3800
3Adelie3250
4Adeliemissing
5Adelie3450
6Adelie3650
7Adelie3625
8Adelie4675
9Adelie3475
10Adelie4250
11Adelie3300
12Adelie3700
13Adelie3200
333Chinstrap3250
334Chinstrap4050
335Chinstrap3800
336Chinstrap3525
337Chinstrap3950
338Chinstrap3650
339Chinstrap3650
340Chinstrap4000
341Chinstrap3400
342Chinstrap3775
343Chinstrap4100
344Chinstrap3775
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n### Selecting columns from a variable\n\n**Problem:** Select only some columns whose names are stored in a variable.\n\n::: {.panel-tabset}\n\n\n\n\n\n::: {#10 .cell execution_count=1}\n``` {.julia .cell-code}\nmy_columns = [:species, :body_mass_g];\n```\n:::\n\n\n\n\n\n\n\n## Tidier\n\n\n\n\n\n::: {#12 .cell execution_count=1}\n``` {.julia .cell-code}\n@eval @select penguins $my_columns...\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×2 DataFrame
319 rows omitted
Rowspeciesbody_mass_g
String15Int64?
1Adelie3750
2Adelie3800
3Adelie3250
4Adeliemissing
5Adelie3450
6Adelie3650
7Adelie3625
8Adelie4675
9Adelie3475
10Adelie4250
11Adelie3300
12Adelie3700
13Adelie3200
333Chinstrap3250
334Chinstrap4050
335Chinstrap3800
336Chinstrap3525
337Chinstrap3950
338Chinstrap3650
339Chinstrap3650
340Chinstrap4000
341Chinstrap3400
342Chinstrap3775
343Chinstrap4100
344Chinstrap3775
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#14 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@select penguins $my_columns\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×2 DataFrame
319 rows omitted
Rowspeciesbody_mass_g
String15Int64?
1Adelie3750
2Adelie3800
3Adelie3250
4Adeliemissing
5Adelie3450
6Adelie3650
7Adelie3625
8Adelie4675
9Adelie3475
10Adelie4250
11Adelie3300
12Adelie3700
13Adelie3200
333Chinstrap3250
334Chinstrap4050
335Chinstrap3800
336Chinstrap3525
337Chinstrap3950
338Chinstrap3650
339Chinstrap3650
340Chinstrap4000
341Chinstrap3400
342Chinstrap3775
343Chinstrap4100
344Chinstrap3775
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#16 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.select(penguins, my_columns)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×2 DataFrame
319 rows omitted
Rowspeciesbody_mass_g
String15Int64?
1Adelie3750
2Adelie3800
3Adelie3250
4Adeliemissing
5Adelie3450
6Adelie3650
7Adelie3625
8Adelie4675
9Adelie3475
10Adelie4250
11Adelie3300
12Adelie3700
13Adelie3200
333Chinstrap3250
334Chinstrap4050
335Chinstrap3800
336Chinstrap3525
337Chinstrap3950
338Chinstrap3650
339Chinstrap3650
340Chinstrap4000
341Chinstrap3400
342Chinstrap3775
343Chinstrap4100
344Chinstrap3775
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n## Mutating (or: creating columns)\n\n### Creating one column based on another one\n\n**Problem:** Create the column `body_mass_kg` by dividing `body_mass_g` by 1000.\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#18 .cell execution_count=1}\n``` {.julia .cell-code}\n@mutate penguins body_mass_kg = body_mass_g / 1000\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×8 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsexbody_mass_kg
String15String15Float64?Float64?Int64?Int64?String7?Float64?
1AdelieTorgersen39.118.71813750male3.75
2AdelieTorgersen39.517.41863800female3.8
3AdelieTorgersen40.318.01953250female3.25
4AdelieTorgersenmissingmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female3.45
6AdelieTorgersen39.320.61903650male3.65
7AdelieTorgersen38.917.81813625female3.625
8AdelieTorgersen39.219.61954675male4.675
9AdelieTorgersen34.118.11933475missing3.475
10AdelieTorgersen42.020.21904250missing4.25
11AdelieTorgersen37.817.11863300missing3.3
12AdelieTorgersen37.817.31803700missing3.7
13AdelieTorgersen41.117.61823200female3.2
333ChinstrapDream45.216.61913250female3.25
334ChinstrapDream49.319.92034050male4.05
335ChinstrapDream50.218.82023800male3.8
336ChinstrapDream45.619.41943525female3.525
337ChinstrapDream51.919.52063950male3.95
338ChinstrapDream46.816.51893650female3.65
339ChinstrapDream45.717.01953650female3.65
340ChinstrapDream55.819.82074000male4.0
341ChinstrapDream43.518.12023400female3.4
342ChinstrapDream49.618.21933775male3.775
343ChinstrapDream50.819.02104100male4.1
344ChinstrapDream50.218.71983775female3.775
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#20 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@rtransform penguins :body_mass_kg = :body_mass_g / 1000\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×8 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsexbody_mass_kg
String15String15Float64?Float64?Int64?Int64?String7?Float64?
1AdelieTorgersen39.118.71813750male3.75
2AdelieTorgersen39.517.41863800female3.8
3AdelieTorgersen40.318.01953250female3.25
4AdelieTorgersenmissingmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female3.45
6AdelieTorgersen39.320.61903650male3.65
7AdelieTorgersen38.917.81813625female3.625
8AdelieTorgersen39.219.61954675male4.675
9AdelieTorgersen34.118.11933475missing3.475
10AdelieTorgersen42.020.21904250missing4.25
11AdelieTorgersen37.817.11863300missing3.3
12AdelieTorgersen37.817.31803700missing3.7
13AdelieTorgersen41.117.61823200female3.2
333ChinstrapDream45.216.61913250female3.25
334ChinstrapDream49.319.92034050male4.05
335ChinstrapDream50.218.82023800male3.8
336ChinstrapDream45.619.41943525female3.525
337ChinstrapDream51.919.52063950male3.95
338ChinstrapDream46.816.51893650female3.65
339ChinstrapDream45.717.01953650female3.65
340ChinstrapDream55.819.82074000male4.0
341ChinstrapDream43.518.12023400female3.4
342ChinstrapDream49.618.21933775male3.775
343ChinstrapDream50.819.02104100male4.1
344ChinstrapDream50.218.71983775female3.775
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#22 .cell execution_count=1}\n``` {.julia .cell-code}\npenguins2 = copy(penguins);\npenguins.body_mass_kg = penguins.body_mass_g ./ 1000;\npenguins2\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
333ChinstrapDream45.216.61913250female
334ChinstrapDream49.319.92034050male
335ChinstrapDream50.218.82023800male
336ChinstrapDream45.619.41943525female
337ChinstrapDream51.919.52063950male
338ChinstrapDream46.816.51893650female
339ChinstrapDream45.717.01953650female
340ChinstrapDream55.819.82074000male
341ChinstrapDream43.518.12023400female
342ChinstrapDream49.618.21933775male
343ChinstrapDream50.819.02104100male
344ChinstrapDream50.218.71983775female
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n## Conditionally mutating columns\n\n",
     "supporting": [
       "dataframes-columns_files"
     ],
diff --git a/_freeze/dataframes-rows/execute-results/html.json b/_freeze/dataframes-rows/execute-results/html.json
index 4df163b..008b144 100644
--- a/_freeze/dataframes-rows/execute-results/html.json
+++ b/_freeze/dataframes-rows/execute-results/html.json
@@ -1,8 +1,8 @@
 {
-  "hash": "85982cb4c438a90c8fed3228cd22b0c3",
+  "hash": "179b20a288d6219cf6c2118f9422b20d",
   "result": {
     "engine": "julia",
-    "markdown": "---\n# jupyter: julia-1.10\nengine: julia\n---\n\n\n\n\n\n# Operations on rows\n\n\n\n\n\n::: {#2 .cell execution_count=1}\n``` {.julia .cell-code}\nusing DataFrames, PalmerPenguins\nusing Tidier\nimport DataFramesMeta as DFM\n\npenguins = PalmerPenguins.load() |> DataFrame;\n@slice_head(penguins, n = 10)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
10×7 DataFrame
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
\n```\n:::\n:::\n\n\n\n\n\n\n\n## Filtering (or: throwing lines away)\n\nTo filter a dataframe means keeping only the rows that satisfy a certain criterion (i.e. a boolean condition).\n\nTo filter a dataframe in Tidier, we use the macro `@filter`. You can use it in the form\n\n\n\n\n\n::: {#4 .cell execution_count=1}\n``` {.julia .cell-code}\n@filter(penguins, species == \"Adelie\")\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\nor without parentheses, as in \n\n\n\n\n\n::: {#6 .cell execution_count=1}\n``` {.julia .cell-code}\n@filter penguins species == \"Adelie\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\nNotice that the columns are typed as if they were variables in the Julia environment. This is inspired by the `tidyverse` behaviour of data-masking: inside a tidyverse verb, the columns are taken as \"statistical variables\" that exist inside the dataframe as columns.\n\nIn DataFramesMeta, we have two macros for filtering: `@subset` and `@rsubset`. Use the first when your criterion uses a whole column, for example:\n\n\n\n\n\n::: {#8 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@subset penguins :body_mass_g .>= mean(skipmissing(:body_mass_g))\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
149×7 DataFrame
124 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.219.61954675male
2AdelieTorgersen42.020.21904250missing
3AdelieTorgersen34.621.11984400male
4AdelieTorgersen42.520.71974500male
5AdelieDream39.819.11844650male
6AdelieDream44.119.71964400male
7AdelieDream39.618.81904600male
8AdelieBiscoe40.118.91884300male
9AdelieBiscoe41.321.11954400male
10AdelieTorgersen41.819.41984450male
11AdelieTorgersen42.818.51954250male
12AdelieTorgersen42.917.61964700male
13AdelieDream41.118.12054300male
138GentooBiscoe47.213.72144925female
139GentooBiscoe46.814.32154850female
140GentooBiscoe50.415.72225750male
141GentooBiscoe45.214.82125200female
142GentooBiscoe49.916.12135400male
143ChinstrapDream49.218.21954400male
144ChinstrapDream52.820.02054550male
145ChinstrapDream54.220.82014300male
146ChinstrapDream52.020.72104800male
147ChinstrapDream53.519.92054500male
148ChinstrapDream50.818.52014450male
149ChinstrapDream49.019.62124300male
\n```\n:::\n:::\n\n\n\n\n\n\n\nNotice the broadcast on `>=`. We need it because *each variable is interpreted as a vector (the whole column)*. Also, notice that we refer to columns as _symbols_ (i.e. we prepend `:` to the column name).\n\nIn the above example, we needed the whole column `body_mass_g` to take the mean and then filter the rows based on that. If, however, your filtering criterion only uses information about each row (without needing to see it in the context of the whole column), then `@rsubset` (row subset) is easier to use: it interprets each column as a value (not an array), so no broadcasting is needed:\n\n\n\n\n\n::: {#10 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@rsubset penguins :species == \"Adelie\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\nIn both Tidier and DataFramesMeta, only the rows for which the criterion is `true` are returned. This means that rows where it is `false` or `missing` are thrown away.\n\nIn pure DataFrames, we use the `subset` function, and the criterion is passed with the notation\n\n\n\n\n\n::: {#12 .cell execution_count=0}\n``` {.julia .cell-code}\nsubset(penguins, :column => boolean_function)\n\n```\n:::\n\n\n\n\n\n\n\nwhere `boolean_function` is a function of one variable that returns booleans (possibly with `missing` values). Add the kwarg `skipmissing=true` if you want to get rid of missing values.\n\n### Filtering with one criterion\n\nFilter all the rows where `species == \"Adelie\"`.\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#14 .cell execution_count=1}\n``` {.julia .cell-code}\n@filter penguins species == \"Adelie\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#16 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@rsubset penguins :species == \"Adelie\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#18 .cell execution_count=1}\n``` {.julia .cell-code}\nsubset(penguins, :species => x -> x .== \"Adelie\", skipmissing=true)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n### Filtering with several criteria\n\nFiltering all the rows with `species` == \"Adelie\", `sex` == \"male\" and `body_mass_g` > 4000.\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#20 .cell execution_count=1}\n``` {.julia .cell-code}\n@filter penguins species == \"Adelie\" sex == \"male\" body_mass_g > 4000\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
34×7 DataFrame
9 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.219.61954675male
2AdelieTorgersen34.621.11984400male
3AdelieTorgersen42.520.71974500male
4AdelieTorgersen46.021.51944200male
5AdelieDream39.221.11964150male
6AdelieDream39.819.11844650male
7AdelieDream44.119.71964400male
8AdelieDream39.618.81904600male
9AdelieDream42.321.21914150male
10AdelieBiscoe40.118.91884300male
11AdelieBiscoe42.019.52004050male
12AdelieBiscoe41.321.11954400male
13AdelieBiscoe41.118.21924050male
23AdelieDream40.318.51964350male
24AdelieDream43.218.51924100male
25AdelieBiscoe41.020.02034725male
26AdelieBiscoe37.820.01904250male
27AdelieBiscoe43.219.01974775male
28AdelieBiscoe45.620.31914600male
29AdelieBiscoe42.219.51974275male
30AdelieBiscoe42.718.31964075male
31AdelieTorgersen41.518.31954300male
32AdelieDream37.518.51994475male
33AdelieDream39.717.91934250male
34AdelieDream39.218.61904250male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#22 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@rsubset penguins :species == \"Adelie\" :sex == \"male\" :body_mass_g > 4000\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
34×7 DataFrame
9 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.219.61954675male
2AdelieTorgersen34.621.11984400male
3AdelieTorgersen42.520.71974500male
4AdelieTorgersen46.021.51944200male
5AdelieDream39.221.11964150male
6AdelieDream39.819.11844650male
7AdelieDream44.119.71964400male
8AdelieDream39.618.81904600male
9AdelieDream42.321.21914150male
10AdelieBiscoe40.118.91884300male
11AdelieBiscoe42.019.52004050male
12AdelieBiscoe41.321.11954400male
13AdelieBiscoe41.118.21924050male
23AdelieDream40.318.51964350male
24AdelieDream43.218.51924100male
25AdelieBiscoe41.020.02034725male
26AdelieBiscoe37.820.01904250male
27AdelieBiscoe43.219.01974775male
28AdelieBiscoe45.620.31914600male
29AdelieBiscoe42.219.51974275male
30AdelieBiscoe42.718.31964075male
31AdelieTorgersen41.518.31954300male
32AdelieDream37.518.51994475male
33AdelieDream39.717.91934250male
34AdelieDream39.218.61904250male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#24 .cell execution_count=1}\n``` {.julia .cell-code}\nsubset(\n penguins\n , [:species, :sex, :body_mass_g] => \n (x, y, z) -> (x .== \"Adelie\") .& (y .== \"male\") .& (z .> 4000)\n ,skipmissing=true\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
34×7 DataFrame
9 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.219.61954675male
2AdelieTorgersen34.621.11984400male
3AdelieTorgersen42.520.71974500male
4AdelieTorgersen46.021.51944200male
5AdelieDream39.221.11964150male
6AdelieDream39.819.11844650male
7AdelieDream44.119.71964400male
8AdelieDream39.618.81904600male
9AdelieDream42.321.21914150male
10AdelieBiscoe40.118.91884300male
11AdelieBiscoe42.019.52004050male
12AdelieBiscoe41.321.11954400male
13AdelieBiscoe41.118.21924050male
23AdelieDream40.318.51964350male
24AdelieDream43.218.51924100male
25AdelieBiscoe41.020.02034725male
26AdelieBiscoe37.820.01904250male
27AdelieBiscoe43.219.01974775male
28AdelieBiscoe45.620.31914600male
29AdelieBiscoe42.219.51974275male
30AdelieBiscoe42.718.31964075male
31AdelieTorgersen41.518.31954300male
32AdelieDream37.518.51994475male
33AdelieDream39.717.91934250male
34AdelieDream39.218.61904250male
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n\nFiltering all the rows with `species` == \"Adelie\" OR `sex` == \"male\".\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#26 .cell execution_count=1}\n``` {.julia .cell-code}\n@filter penguins (species == \"Adelie\") | (sex == \"male\")\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
247×7 DataFrame
222 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
236ChinstrapDream50.818.52014450male
237ChinstrapDream49.019.62124300male
238ChinstrapDream51.518.71873250male
239ChinstrapDream51.419.02013950male
240ChinstrapDream50.719.72034050male
241ChinstrapDream52.218.81973450male
242ChinstrapDream49.319.92034050male
243ChinstrapDream50.218.82023800male
244ChinstrapDream51.919.52063950male
245ChinstrapDream55.819.82074000male
246ChinstrapDream49.618.21933775male
247ChinstrapDream50.819.02104100male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#28 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@rsubset penguins (:species == \"Adelie\") | (:sex == \"male\")\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
247×7 DataFrame
222 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
236ChinstrapDream50.818.52014450male
237ChinstrapDream49.019.62124300male
238ChinstrapDream51.518.71873250male
239ChinstrapDream51.419.02013950male
240ChinstrapDream50.719.72034050male
241ChinstrapDream52.218.81973450male
242ChinstrapDream49.319.92034050male
243ChinstrapDream50.218.82023800male
244ChinstrapDream51.919.52063950male
245ChinstrapDream55.819.82074000male
246ChinstrapDream49.618.21933775male
247ChinstrapDream50.819.02104100male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#30 .cell execution_count=1}\n``` {.julia .cell-code}\nsubset(penguins, [:species, :sex] => (x, y) -> (x .== \"Adelie\") .| (y .== \"male\"), skipmissing=true)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
247×7 DataFrame
222 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
236ChinstrapDream50.818.52014450male
237ChinstrapDream49.019.62124300male
238ChinstrapDream51.518.71873250male
239ChinstrapDream51.419.02013950male
240ChinstrapDream50.719.72034050male
241ChinstrapDream52.218.81973450male
242ChinstrapDream49.319.92034050male
243ChinstrapDream50.218.82023800male
244ChinstrapDream51.919.52063950male
245ChinstrapDream55.819.82074000male
246ChinstrapDream49.618.21933775male
247ChinstrapDream50.819.02104100male
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n\nFiltering all the rows where the `flipper_length_mm` is greater than the mean.\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#32 .cell execution_count=1}\n``` {.julia .cell-code}\n@filter penguins flipper_length_mm > mean(skipmissing(flipper_length_mm))\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
148×7 DataFrame
123 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieDream35.718.02023550female
2AdelieDream41.118.12054300male
3AdelieDream40.818.92084300male
4AdelieBiscoe41.020.02034725male
5AdelieTorgersen41.418.52023875male
6AdelieTorgersen44.118.02104000male
7AdelieDream41.518.52014000male
8GentooBiscoe46.113.22114500female
9GentooBiscoe50.016.32305700male
10GentooBiscoe48.714.12104450female
11GentooBiscoe50.015.22185700male
12GentooBiscoe47.614.52155400male
13GentooBiscoe46.513.52104550female
137ChinstrapDream53.519.92054500male
138ChinstrapDream49.019.52103950male
139ChinstrapDream50.818.52014450male
140ChinstrapDream49.019.62124300male
141ChinstrapDream51.419.02013950male
142ChinstrapDream50.719.72034050male
143ChinstrapDream49.319.92034050male
144ChinstrapDream50.218.82023800male
145ChinstrapDream51.919.52063950male
146ChinstrapDream55.819.82074000male
147ChinstrapDream43.518.12023400female
148ChinstrapDream50.819.02104100male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#34 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@subset penguins :flipper_length_mm .> mean(skipmissing(:flipper_length_mm))\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
148×7 DataFrame
123 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieDream35.718.02023550female
2AdelieDream41.118.12054300male
3AdelieDream40.818.92084300male
4AdelieBiscoe41.020.02034725male
5AdelieTorgersen41.418.52023875male
6AdelieTorgersen44.118.02104000male
7AdelieDream41.518.52014000male
8GentooBiscoe46.113.22114500female
9GentooBiscoe50.016.32305700male
10GentooBiscoe48.714.12104450female
11GentooBiscoe50.015.22185700male
12GentooBiscoe47.614.52155400male
13GentooBiscoe46.513.52104550female
137ChinstrapDream53.519.92054500male
138ChinstrapDream49.019.52103950male
139ChinstrapDream50.818.52014450male
140ChinstrapDream49.019.62124300male
141ChinstrapDream51.419.02013950male
142ChinstrapDream50.719.72034050male
143ChinstrapDream49.319.92034050male
144ChinstrapDream50.218.82023800male
145ChinstrapDream51.919.52063950male
146ChinstrapDream55.819.82074000male
147ChinstrapDream43.518.12023400female
148ChinstrapDream50.819.02104100male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#36 .cell execution_count=1}\n``` {.julia .cell-code}\nsubset(penguins, :flipper_length_mm => x -> x .> mean(skipmissing(x)), skipmissing=true)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
148×7 DataFrame
123 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieDream35.718.02023550female
2AdelieDream41.118.12054300male
3AdelieDream40.818.92084300male
4AdelieBiscoe41.020.02034725male
5AdelieTorgersen41.418.52023875male
6AdelieTorgersen44.118.02104000male
7AdelieDream41.518.52014000male
8GentooBiscoe46.113.22114500female
9GentooBiscoe50.016.32305700male
10GentooBiscoe48.714.12104450female
11GentooBiscoe50.015.22185700male
12GentooBiscoe47.614.52155400male
13GentooBiscoe46.513.52104550female
137ChinstrapDream53.519.92054500male
138ChinstrapDream49.019.52103950male
139ChinstrapDream50.818.52014450male
140ChinstrapDream49.019.62124300male
141ChinstrapDream51.419.02013950male
142ChinstrapDream50.719.72034050male
143ChinstrapDream49.319.92034050male
144ChinstrapDream50.218.82023800male
145ChinstrapDream51.919.52063950male
146ChinstrapDream55.819.82074000male
147ChinstrapDream43.518.12023400female
148ChinstrapDream50.819.02104100male
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n### Filtering with a variable column name\n\nSuppose the name of the column you want to filter on is stored in a variable, say\n\n\n\n\n\n::: {#38 .cell execution_count=1}\n``` {.julia .cell-code}\nmy_column = :species;\n```\n:::\n\n\n\n\n\n\n\n::: {.panel-tabset}\n\n## DataFramesMeta\n\n\n\n\n\n::: {#40 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@rsubset penguins $my_column == \"Adelie\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#42 .cell execution_count=1}\n``` {.julia .cell-code}\nsubset(penguins, my_column => x -> x .== \"Adelie\")\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\nIf the column name is a string\n\n\n\n\n\n::: {#44 .cell execution_count=1}\n``` {.julia .cell-code}\nmy_column_string = \"species\";\n```\n:::\n\n\n\n\n\n\n\ninstead of a symbol, the code is written the same way\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#46 .cell execution_count=1}\n``` {.julia .cell-code}\n# @filter(penguins, !!my_column == \"Adelie\")\n```\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#48 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@rsubset penguins $(my_column_string) == \"Adelie\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#50 .cell execution_count=1}\n``` {.julia .cell-code}\nsubset(penguins, my_column_string => x -> x .== \"Adelie\")\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n## Arranging\n\nTo *arrange* a dataframe means reordering its rows according to the values of one or more columns. The rows are sorted by the first column, with ties broken by the second (if any), and so on. In Tidier, to reverse the ordering, wrap the column name in `desc()`.\n\n### Arranging by one column\n\nArrange by `body_mass_g`.\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#52 .cell execution_count=1}\n``` {.julia .cell-code}\n@arrange penguins body_mass_g\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1ChinstrapDream46.916.61922700female
2AdelieBiscoe36.516.61812850female
3AdelieBiscoe36.417.11842850female
4AdelieBiscoe34.518.11872900female
5AdelieDream33.116.11782900female
6AdelieTorgersen38.617.01882900female
7ChinstrapDream43.216.61872900female
8AdelieBiscoe37.918.61932925female
9AdelieDream37.518.91792975missing
10AdelieDream37.016.91853000female
11AdelieDream37.316.81923000female
12AdelieTorgersen35.916.61903050female
13AdelieTorgersen35.215.91863050female
333GentooBiscoe48.616.02305800male
334GentooBiscoe48.414.62135850male
335GentooBiscoe49.315.72175850male
336GentooBiscoe55.116.02305850male
337GentooBiscoe45.216.42235950male
338GentooBiscoe49.815.92295950male
339GentooBiscoe51.116.32206000male
340GentooBiscoe48.816.22226000male
341GentooBiscoe59.617.02306050male
342GentooBiscoe49.215.22216300male
343AdelieTorgersenmissingmissingmissingmissingmissing
344GentooBiscoemissingmissingmissingmissingmissing
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#54 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@orderby penguins :body_mass_g\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1ChinstrapDream46.916.61922700female
2AdelieBiscoe36.516.61812850female
3AdelieBiscoe36.417.11842850female
4AdelieBiscoe34.518.11872900female
5AdelieDream33.116.11782900female
6AdelieTorgersen38.617.01882900female
7ChinstrapDream43.216.61872900female
8AdelieBiscoe37.918.61932925female
9AdelieDream37.518.91792975missing
10AdelieDream37.016.91853000female
11AdelieDream37.316.81923000female
12AdelieTorgersen35.916.61903050female
13AdelieTorgersen35.215.91863050female
333GentooBiscoe48.616.02305800male
334GentooBiscoe48.414.62135850male
335GentooBiscoe49.315.72175850male
336GentooBiscoe55.116.02305850male
337GentooBiscoe45.216.42235950male
338GentooBiscoe49.815.92295950male
339GentooBiscoe51.116.32206000male
340GentooBiscoe48.816.22226000male
341GentooBiscoe59.617.02306050male
342GentooBiscoe49.215.22216300male
343AdelieTorgersenmissingmissingmissingmissingmissing
344GentooBiscoemissingmissingmissingmissingmissing
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#56 .cell execution_count=1}\n``` {.julia .cell-code}\nsort(penguins, :body_mass_g)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1ChinstrapDream46.916.61922700female
2AdelieBiscoe36.516.61812850female
3AdelieBiscoe36.417.11842850female
4AdelieBiscoe34.518.11872900female
5AdelieDream33.116.11782900female
6AdelieTorgersen38.617.01882900female
7ChinstrapDream43.216.61872900female
8AdelieBiscoe37.918.61932925female
9AdelieDream37.518.91792975missing
10AdelieDream37.016.91853000female
11AdelieDream37.316.81923000female
12AdelieTorgersen35.916.61903050female
13AdelieTorgersen35.215.91863050female
333GentooBiscoe48.616.02305800male
334GentooBiscoe48.414.62135850male
335GentooBiscoe49.315.72175850male
336GentooBiscoe55.116.02305850male
337GentooBiscoe45.216.42235950male
338GentooBiscoe49.815.92295950male
339GentooBiscoe51.116.32206000male
340GentooBiscoe48.816.22226000male
341GentooBiscoe59.617.02306050male
342GentooBiscoe49.215.22216300male
343AdelieTorgersenmissingmissingmissingmissingmissing
344GentooBiscoemissingmissingmissingmissingmissing
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n### Arranging by two columns, with one reversed\n\nFirst arrange by `island`, then by reversed `body_mass_g`.\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#58 .cell execution_count=1}\n``` {.julia .cell-code}\n@arrange penguins island desc(body_mass_g)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1GentooBiscoemissingmissingmissingmissingmissing
2GentooBiscoe49.215.22216300male
3GentooBiscoe59.617.02306050male
4GentooBiscoe51.116.32206000male
5GentooBiscoe48.816.22226000male
6GentooBiscoe45.216.42235950male
7GentooBiscoe49.815.92295950male
8GentooBiscoe48.414.62135850male
9GentooBiscoe49.315.72175850male
10GentooBiscoe55.116.02305850male
11GentooBiscoe49.516.22295800male
12GentooBiscoe48.616.02305800male
13GentooBiscoe50.415.72225750male
333AdelieTorgersen41.118.61893325male
334AdelieTorgersen38.517.91903325female
335AdelieTorgersen37.817.11863300missing
336AdelieTorgersen38.817.61913275female
337AdelieTorgersen40.318.01953250female
338AdelieTorgersen41.117.61823200female
339AdelieTorgersen34.617.21893200female
340AdelieTorgersen36.217.21873150female
341AdelieTorgersen35.916.61903050female
342AdelieTorgersen35.215.91863050female
343AdelieTorgersen39.017.11913050female
344AdelieTorgersen38.617.01882900female
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#60 .cell execution_count=1}\n``` {.julia .cell-code}\n# works only when the reversed column is numeric?\n\nDFM.@orderby penguins :island :body_mass_g .* -1\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1GentooBiscoe49.215.22216300male
2GentooBiscoe59.617.02306050male
3GentooBiscoe51.116.32206000male
4GentooBiscoe48.816.22226000male
5GentooBiscoe45.216.42235950male
6GentooBiscoe49.815.92295950male
7GentooBiscoe48.414.62135850male
8GentooBiscoe49.315.72175850male
9GentooBiscoe55.116.02305850male
10GentooBiscoe49.516.22295800male
11GentooBiscoe48.616.02305800male
12GentooBiscoe50.415.72225750male
13GentooBiscoe50.016.32305700male
333AdelieTorgersen38.517.91903325female
334AdelieTorgersen37.817.11863300missing
335AdelieTorgersen38.817.61913275female
336AdelieTorgersen40.318.01953250female
337AdelieTorgersen41.117.61823200female
338AdelieTorgersen34.617.21893200female
339AdelieTorgersen36.217.21873150female
340AdelieTorgersen35.916.61903050female
341AdelieTorgersen35.215.91863050female
342AdelieTorgersen39.017.11913050female
343AdelieTorgersen38.617.01882900female
344AdelieTorgersenmissingmissingmissingmissingmissing
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#62 .cell execution_count=1}\n``` {.julia .cell-code}\nsort(penguins, [order(:island), order(:body_mass_g, rev=true)])\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1GentooBiscoemissingmissingmissingmissingmissing
2GentooBiscoe49.215.22216300male
3GentooBiscoe59.617.02306050male
4GentooBiscoe51.116.32206000male
5GentooBiscoe48.816.22226000male
6GentooBiscoe45.216.42235950male
7GentooBiscoe49.815.92295950male
8GentooBiscoe48.414.62135850male
9GentooBiscoe49.315.72175850male
10GentooBiscoe55.116.02305850male
11GentooBiscoe49.516.22295800male
12GentooBiscoe48.616.02305800male
13GentooBiscoe50.415.72225750male
333AdelieTorgersen41.118.61893325male
334AdelieTorgersen38.517.91903325female
335AdelieTorgersen37.817.11863300missing
336AdelieTorgersen38.817.61913275female
337AdelieTorgersen40.318.01953250female
338AdelieTorgersen41.117.61823200female
339AdelieTorgersen34.617.21893200female
340AdelieTorgersen36.217.21873150female
341AdelieTorgersen35.916.61903050female
342AdelieTorgersen35.215.91863050female
343AdelieTorgersen39.017.11913050female
344AdelieTorgersen38.617.01882900female
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n### Arranging by one variable column\n\nLet's arrange the data by the following column:\n\n\n\n\n\n::: {#64 .cell execution_count=1}\n``` {.julia .cell-code}\nmy_arrange_column = :body_mass_g;\n```\n:::\n\n\n\n\n\n\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#66 .cell execution_count=1}\n``` {.julia .cell-code}\n#?? how to do it?\n# @arrange penguins !!my_arrange_column\n```\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#68 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@orderby penguins $my_arrange_column\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1ChinstrapDream46.916.61922700female
2AdelieBiscoe36.516.61812850female
3AdelieBiscoe36.417.11842850female
4AdelieBiscoe34.518.11872900female
5AdelieDream33.116.11782900female
6AdelieTorgersen38.617.01882900female
7ChinstrapDream43.216.61872900female
8AdelieBiscoe37.918.61932925female
9AdelieDream37.518.91792975missing
10AdelieDream37.016.91853000female
11AdelieDream37.316.81923000female
12AdelieTorgersen35.916.61903050female
13AdelieTorgersen35.215.91863050female
333GentooBiscoe48.616.02305800male
334GentooBiscoe48.414.62135850male
335GentooBiscoe49.315.72175850male
336GentooBiscoe55.116.02305850male
337GentooBiscoe45.216.42235950male
338GentooBiscoe49.815.92295950male
339GentooBiscoe51.116.32206000male
340GentooBiscoe48.816.22226000male
341GentooBiscoe59.617.02306050male
342GentooBiscoe49.215.22216300male
343AdelieTorgersenmissingmissingmissingmissingmissing
344GentooBiscoemissingmissingmissingmissingmissing
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#70 .cell execution_count=1}\n``` {.julia .cell-code}\nsort(penguins, my_arrange_column)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1ChinstrapDream46.916.61922700female
2AdelieBiscoe36.516.61812850female
3AdelieBiscoe36.417.11842850female
4AdelieBiscoe34.518.11872900female
5AdelieDream33.116.11782900female
6AdelieTorgersen38.617.01882900female
7ChinstrapDream43.216.61872900female
8AdelieBiscoe37.918.61932925female
9AdelieDream37.518.91792975missing
10AdelieDream37.016.91853000female
11AdelieDream37.316.81923000female
12AdelieTorgersen35.916.61903050female
13AdelieTorgersen35.215.91863050female
333GentooBiscoe48.616.02305800male
334GentooBiscoe48.414.62135850male
335GentooBiscoe49.315.72175850male
336GentooBiscoe55.116.02305850male
337GentooBiscoe45.216.42235950male
338GentooBiscoe49.815.92295950male
339GentooBiscoe51.116.32206000male
340GentooBiscoe48.816.22226000male
341GentooBiscoe59.617.02306050male
342GentooBiscoe49.215.22216300male
343AdelieTorgersenmissingmissingmissingmissingmissing
344GentooBiscoemissingmissingmissingmissingmissing
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n", + "markdown": "---\n# jupyter: julia-1.10\nengine: julia\n---\n\n\n\n\n\n# Operations on rows\n\nIn this chapter we will see operations that deal with rows, be it ordering or throwing some rows away.\n\nThe following is necessary to run all examples:\n\n\n\n\n\n::: {#2 .cell execution_count=1}\n``` {.julia .cell-code}\nusing DataFrames, PalmerPenguins\nusing Tidier\nimport DataFramesMeta as DFM\n\npenguins = PalmerPenguins.load() |> DataFrame;\n@slice_head(penguins, n = 10)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
10×7 DataFrame
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
\n```\n:::\n:::\n\n\n\n\n\n\n\n## Filtering (or: throwing rows away)\n\nTo *filter* a dataframe means keeping only the rows that satisfy a certain criterion (i.e. a boolean condition).\n\nTo filter in Tidier, we use the macro `@filter`. You can call it with parentheses, in the form\n\n\n\n\n\n::: {#4 .cell execution_count=1}\n``` {.julia .cell-code}\n@filter(penguins, species == \"Adelie\")\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\nor without parentheses, as in\n\n\n\n\n\n::: {#6 .cell execution_count=1}\n``` {.julia .cell-code}\n@filter penguins species == \"Adelie\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\nNotice that the columns are typed as if they were variables in the Julia environment. This is inspired by the `tidyverse` behaviour of data-masking: inside a tidyverse verb, the columns are treated as \"statistical variables\" that exist inside the dataframe.\n\nIn DataFramesMeta, we have two macros for filtering: `@subset` and `@rsubset`. Use the first when your criterion needs a whole column, for example:\n\n\n\n\n\n::: {#8 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@subset penguins :body_mass_g .>= mean(skipmissing(:body_mass_g))\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
149×7 DataFrame
124 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.219.61954675male
2AdelieTorgersen42.020.21904250missing
3AdelieTorgersen34.621.11984400male
4AdelieTorgersen42.520.71974500male
5AdelieDream39.819.11844650male
6AdelieDream44.119.71964400male
7AdelieDream39.618.81904600male
8AdelieBiscoe40.118.91884300male
9AdelieBiscoe41.321.11954400male
10AdelieTorgersen41.819.41984450male
11AdelieTorgersen42.818.51954250male
12AdelieTorgersen42.917.61964700male
13AdelieDream41.118.12054300male
138GentooBiscoe47.213.72144925female
139GentooBiscoe46.814.32154850female
140GentooBiscoe50.415.72225750male
141GentooBiscoe45.214.82125200female
142GentooBiscoe49.916.12135400male
143ChinstrapDream49.218.21954400male
144ChinstrapDream52.820.02054550male
145ChinstrapDream54.220.82014300male
146ChinstrapDream52.020.72104800male
147ChinstrapDream53.519.92054500male
148ChinstrapDream50.818.52014450male
149ChinstrapDream49.019.62124300male
\n```\n:::\n:::\n\n\n\n\n\n\n\nNotice the broadcast on `>=` (written `.>=`). We need it because *each variable is interpreted as a vector (the whole column)*. Also, notice that we refer to columns as _symbols_ (i.e. we prepend `:` to the column name).\n\nIn the above example, we needed the whole column `body_mass_g` to take the mean and then filter the rows based on that. If, however, your filtering criterion only uses information from each row (without needing the context of the whole column), then `@rsubset` (**r**ow subset) is easier to use: it interprets each column as a single value (not an array), so no broadcasting is needed:\n\n\n\n\n\n::: {#10 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@rsubset penguins :species == \"Adelie\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\nIn both Tidier and DataFramesMeta, only the rows for which the criterion is `true` are returned. This means that rows where it evaluates to `false` or `missing` are dropped.\n\nIn pure DataFrames, we use the `subset` function, and the criterion is passed with the notation\n\n\n\n\n\n::: {#12 .cell execution_count=0}\n``` {.julia .cell-code}\nsubset(penguins, :column => boolean_function)\n\n```\n:::\n\n\n\n\n\n\n\nwhere `boolean_function` takes one argument (the `:column` you passed, as a whole vector) and returns a vector of booleans, possibly with `missing` values. Add the keyword argument `skipmissing=true` to drop the rows where the result is `missing`.\n\n### Filtering with one criteria\n\n**Problem:** Filtering all the rows with `species` == \"Adelie\".\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#14 .cell execution_count=1}\n``` {.julia .cell-code}\n@filter penguins species == \"Adelie\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#16 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@rsubset penguins :species == \"Adelie\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#18 .cell execution_count=1}\n``` {.julia .cell-code}\nsubset(penguins, :species => x -> x .== \"Adelie\", skipmissing=true)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n### Filtering with several criteria\n\n**Problem:** Filtering all the rows with `species` == \"Adelie\", `sex` == \"male\" and `body_mass_g` > 4000.\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#20 .cell execution_count=1}\n``` {.julia .cell-code}\n@filter penguins species == \"Adelie\" sex == \"male\" body_mass_g > 4000\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
34×7 DataFrame
9 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.219.61954675male
2AdelieTorgersen34.621.11984400male
3AdelieTorgersen42.520.71974500male
4AdelieTorgersen46.021.51944200male
5AdelieDream39.221.11964150male
6AdelieDream39.819.11844650male
7AdelieDream44.119.71964400male
8AdelieDream39.618.81904600male
9AdelieDream42.321.21914150male
10AdelieBiscoe40.118.91884300male
11AdelieBiscoe42.019.52004050male
12AdelieBiscoe41.321.11954400male
13AdelieBiscoe41.118.21924050male
23AdelieDream40.318.51964350male
24AdelieDream43.218.51924100male
25AdelieBiscoe41.020.02034725male
26AdelieBiscoe37.820.01904250male
27AdelieBiscoe43.219.01974775male
28AdelieBiscoe45.620.31914600male
29AdelieBiscoe42.219.51974275male
30AdelieBiscoe42.718.31964075male
31AdelieTorgersen41.518.31954300male
32AdelieDream37.518.51994475male
33AdelieDream39.717.91934250male
34AdelieDream39.218.61904250male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#22 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@rsubset penguins :species == \"Adelie\" :sex == \"male\" :body_mass_g > 4000\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
34×7 DataFrame
9 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.219.61954675male
2AdelieTorgersen34.621.11984400male
3AdelieTorgersen42.520.71974500male
4AdelieTorgersen46.021.51944200male
5AdelieDream39.221.11964150male
6AdelieDream39.819.11844650male
7AdelieDream44.119.71964400male
8AdelieDream39.618.81904600male
9AdelieDream42.321.21914150male
10AdelieBiscoe40.118.91884300male
11AdelieBiscoe42.019.52004050male
12AdelieBiscoe41.321.11954400male
13AdelieBiscoe41.118.21924050male
23AdelieDream40.318.51964350male
24AdelieDream43.218.51924100male
25AdelieBiscoe41.020.02034725male
26AdelieBiscoe37.820.01904250male
27AdelieBiscoe43.219.01974775male
28AdelieBiscoe45.620.31914600male
29AdelieBiscoe42.219.51974275male
30AdelieBiscoe42.718.31964075male
31AdelieTorgersen41.518.31954300male
32AdelieDream37.518.51994475male
33AdelieDream39.717.91934250male
34AdelieDream39.218.61904250male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#24 .cell execution_count=1}\n``` {.julia .cell-code}\nsubset(\n penguins\n , [:species, :sex, :body_mass_g] => \n (x, y, z) -> (x .== \"Adelie\") .& (y .== \"male\") .& (z .> 4000)\n ,skipmissing=true\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
34×7 DataFrame
9 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.219.61954675male
2AdelieTorgersen34.621.11984400male
3AdelieTorgersen42.520.71974500male
4AdelieTorgersen46.021.51944200male
5AdelieDream39.221.11964150male
6AdelieDream39.819.11844650male
7AdelieDream44.119.71964400male
8AdelieDream39.618.81904600male
9AdelieDream42.321.21914150male
10AdelieBiscoe40.118.91884300male
11AdelieBiscoe42.019.52004050male
12AdelieBiscoe41.321.11954400male
13AdelieBiscoe41.118.21924050male
23AdelieDream40.318.51964350male
24AdelieDream43.218.51924100male
25AdelieBiscoe41.020.02034725male
26AdelieBiscoe37.820.01904250male
27AdelieBiscoe43.219.01974775male
28AdelieBiscoe45.620.31914600male
29AdelieBiscoe42.219.51974275male
30AdelieBiscoe42.718.31964075male
31AdelieTorgersen41.518.31954300male
32AdelieDream37.518.51994475male
33AdelieDream39.717.91934250male
34AdelieDream39.218.61904250male
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n**Problem:** Filtering all the rows with `species` == \"Adelie\" OR `sex` == \"male\".\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#26 .cell execution_count=1}\n``` {.julia .cell-code}\n@filter penguins (species == \"Adelie\") | (sex == \"male\")\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
247×7 DataFrame
222 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
236ChinstrapDream50.818.52014450male
237ChinstrapDream49.019.62124300male
238ChinstrapDream51.518.71873250male
239ChinstrapDream51.419.02013950male
240ChinstrapDream50.719.72034050male
241ChinstrapDream52.218.81973450male
242ChinstrapDream49.319.92034050male
243ChinstrapDream50.218.82023800male
244ChinstrapDream51.919.52063950male
245ChinstrapDream55.819.82074000male
246ChinstrapDream49.618.21933775male
247ChinstrapDream50.819.02104100male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#28 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@rsubset penguins (:species == \"Adelie\") | (:sex == \"male\")\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
247×7 DataFrame
222 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
236ChinstrapDream50.818.52014450male
237ChinstrapDream49.019.62124300male
238ChinstrapDream51.518.71873250male
239ChinstrapDream51.419.02013950male
240ChinstrapDream50.719.72034050male
241ChinstrapDream52.218.81973450male
242ChinstrapDream49.319.92034050male
243ChinstrapDream50.218.82023800male
244ChinstrapDream51.919.52063950male
245ChinstrapDream55.819.82074000male
246ChinstrapDream49.618.21933775male
247ChinstrapDream50.819.02104100male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#30 .cell execution_count=1}\n``` {.julia .cell-code}\nsubset(penguins, [:species, :sex] => (x, y) -> (x .== \"Adelie\") .| (y .== \"male\"), skipmissing=true)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
247×7 DataFrame
222 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
236ChinstrapDream50.818.52014450male
237ChinstrapDream49.019.62124300male
238ChinstrapDream51.518.71873250male
239ChinstrapDream51.419.02013950male
240ChinstrapDream50.719.72034050male
241ChinstrapDream52.218.81973450male
242ChinstrapDream49.319.92034050male
243ChinstrapDream50.218.82023800male
244ChinstrapDream51.919.52063950male
245ChinstrapDream55.819.82074000male
246ChinstrapDream49.618.21933775male
247ChinstrapDream50.819.02104100male
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n### Filtering with metadata\n\nBy metadata here we mean values computed from the dataframe itself, such as the mean/max/min of a column.\n\n**Problem:** Filtering all the rows where the `flipper_length_mm` is greater than the mean.\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#32 .cell execution_count=1}\n``` {.julia .cell-code}\n@filter penguins flipper_length_mm > mean(skipmissing(flipper_length_mm))\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
148×7 DataFrame
123 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieDream35.718.02023550female
2AdelieDream41.118.12054300male
3AdelieDream40.818.92084300male
4AdelieBiscoe41.020.02034725male
5AdelieTorgersen41.418.52023875male
6AdelieTorgersen44.118.02104000male
7AdelieDream41.518.52014000male
8GentooBiscoe46.113.22114500female
9GentooBiscoe50.016.32305700male
10GentooBiscoe48.714.12104450female
11GentooBiscoe50.015.22185700male
12GentooBiscoe47.614.52155400male
13GentooBiscoe46.513.52104550female
137ChinstrapDream53.519.92054500male
138ChinstrapDream49.019.52103950male
139ChinstrapDream50.818.52014450male
140ChinstrapDream49.019.62124300male
141ChinstrapDream51.419.02013950male
142ChinstrapDream50.719.72034050male
143ChinstrapDream49.319.92034050male
144ChinstrapDream50.218.82023800male
145ChinstrapDream51.919.52063950male
146ChinstrapDream55.819.82074000male
147ChinstrapDream43.518.12023400female
148ChinstrapDream50.819.02104100male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#34 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@subset penguins :flipper_length_mm .> mean(skipmissing(:flipper_length_mm))\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
148×7 DataFrame
123 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieDream35.718.02023550female
2AdelieDream41.118.12054300male
3AdelieDream40.818.92084300male
4AdelieBiscoe41.020.02034725male
5AdelieTorgersen41.418.52023875male
6AdelieTorgersen44.118.02104000male
7AdelieDream41.518.52014000male
8GentooBiscoe46.113.22114500female
9GentooBiscoe50.016.32305700male
10GentooBiscoe48.714.12104450female
11GentooBiscoe50.015.22185700male
12GentooBiscoe47.614.52155400male
13GentooBiscoe46.513.52104550female
137ChinstrapDream53.519.92054500male
138ChinstrapDream49.019.52103950male
139ChinstrapDream50.818.52014450male
140ChinstrapDream49.019.62124300male
141ChinstrapDream51.419.02013950male
142ChinstrapDream50.719.72034050male
143ChinstrapDream49.319.92034050male
144ChinstrapDream50.218.82023800male
145ChinstrapDream51.919.52063950male
146ChinstrapDream55.819.82074000male
147ChinstrapDream43.518.12023400female
148ChinstrapDream50.819.02104100male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#36 .cell execution_count=1}\n``` {.julia .cell-code}\nsubset(penguins, :flipper_length_mm => x -> x .> mean(skipmissing(x)), skipmissing=true)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
148×7 DataFrame
123 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieDream35.718.02023550female
2AdelieDream41.118.12054300male
3AdelieDream40.818.92084300male
4AdelieBiscoe41.020.02034725male
5AdelieTorgersen41.418.52023875male
6AdelieTorgersen44.118.02104000male
7AdelieDream41.518.52014000male
8GentooBiscoe46.113.22114500female
9GentooBiscoe50.016.32305700male
10GentooBiscoe48.714.12104450female
11GentooBiscoe50.015.22185700male
12GentooBiscoe47.614.52155400male
13GentooBiscoe46.513.52104550female
137ChinstrapDream53.519.92054500male
138ChinstrapDream49.019.52103950male
139ChinstrapDream50.818.52014450male
140ChinstrapDream49.019.62124300male
141ChinstrapDream51.419.02013950male
142ChinstrapDream50.719.72034050male
143ChinstrapDream49.319.92034050male
144ChinstrapDream50.218.82023800male
145ChinstrapDream51.919.52063950male
146ChinstrapDream55.819.82074000male
147ChinstrapDream43.518.12023400female
148ChinstrapDream50.819.02104100male
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n### Filtering with a variable column name\n\nSuppose the name of the column you want to filter on is stored in a variable, say as a symbol:\n\n\n\n\n\n::: {#38 .cell execution_count=1}\n``` {.julia .cell-code}\nmy_column = :species;\n```\n:::\n\n\n\n\n\n\n\n**Problem:** Filtering all the rows where the column stored in `my_column` is \"Adelie\".\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#40 .cell execution_count=1}\n``` {.julia .cell-code}\n@eval @filter penguins $my_column == \"Adelie\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#42 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@rsubset penguins $my_column == \"Adelie\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#44 .cell execution_count=1}\n``` {.julia .cell-code}\nsubset(penguins, my_column => x -> x .== \"Adelie\")\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\nIf the column name is stored as a string\n\n\n\n\n\n::: {#46 .cell execution_count=1}\n``` {.julia .cell-code}\nmy_column_string = \"species\";\n```\n:::\n\n\n\n\n\n\n\ninstead of a symbol, we can write the filter in the same way; in Tidier, just take care to convert the string to a symbol first:\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#48 .cell execution_count=1}\n``` {.julia .cell-code}\n@eval @filter penguins $(Symbol(my_column_string)) == \"Adelie\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#50 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@rsubset penguins $(my_column_string) == \"Adelie\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#52 .cell execution_count=1}\n``` {.julia .cell-code}\nsubset(penguins, my_column_string => x -> x .== \"Adelie\")\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
152×7 DataFrame
127 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
141AdelieDream40.217.11933400female
142AdelieDream40.617.21873475male
143AdelieDream32.115.51883050female
144AdelieDream40.717.01903725male
145AdelieDream37.316.81923000female
146AdelieDream39.018.71853650male
147AdelieDream39.218.61904250male
148AdelieDream36.618.41843475female
149AdelieDream36.017.81953450female
150AdelieDream37.818.11933750male
151AdelieDream36.017.11873700female
152AdelieDream41.518.52014000male
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n## Arranging\n\nTo *arrange* a dataframe means to reorder the rows according to the order of some columns. The rows are first arranged by the first column, then by the second (if any), and so on. In Tidier, when we want to invert the ordering, just put the column name inside a `desc()` call.\n\n### Arranging by one column\n\n**Problem:** Arrange by `body_mass_g`.\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#54 .cell execution_count=1}\n``` {.julia .cell-code}\n@arrange penguins body_mass_g\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1ChinstrapDream46.916.61922700female
2AdelieBiscoe36.516.61812850female
3AdelieBiscoe36.417.11842850female
4AdelieBiscoe34.518.11872900female
5AdelieDream33.116.11782900female
6AdelieTorgersen38.617.01882900female
7ChinstrapDream43.216.61872900female
8AdelieBiscoe37.918.61932925female
9AdelieDream37.518.91792975missing
10AdelieDream37.016.91853000female
11AdelieDream37.316.81923000female
12AdelieTorgersen35.916.61903050female
13AdelieTorgersen35.215.91863050female
333GentooBiscoe48.616.02305800male
334GentooBiscoe48.414.62135850male
335GentooBiscoe49.315.72175850male
336GentooBiscoe55.116.02305850male
337GentooBiscoe45.216.42235950male
338GentooBiscoe49.815.92295950male
339GentooBiscoe51.116.32206000male
340GentooBiscoe48.816.22226000male
341GentooBiscoe59.617.02306050male
342GentooBiscoe49.215.22216300male
343AdelieTorgersenmissingmissingmissingmissingmissing
344GentooBiscoemissingmissingmissingmissingmissing
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#56 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@orderby penguins :body_mass_g\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1ChinstrapDream46.916.61922700female
2AdelieBiscoe36.516.61812850female
3AdelieBiscoe36.417.11842850female
4AdelieBiscoe34.518.11872900female
5AdelieDream33.116.11782900female
6AdelieTorgersen38.617.01882900female
7ChinstrapDream43.216.61872900female
8AdelieBiscoe37.918.61932925female
9AdelieDream37.518.91792975missing
10AdelieDream37.016.91853000female
11AdelieDream37.316.81923000female
12AdelieTorgersen35.916.61903050female
13AdelieTorgersen35.215.91863050female
333GentooBiscoe48.616.02305800male
334GentooBiscoe48.414.62135850male
335GentooBiscoe49.315.72175850male
336GentooBiscoe55.116.02305850male
337GentooBiscoe45.216.42235950male
338GentooBiscoe49.815.92295950male
339GentooBiscoe51.116.32206000male
340GentooBiscoe48.816.22226000male
341GentooBiscoe59.617.02306050male
342GentooBiscoe49.215.22216300male
343AdelieTorgersenmissingmissingmissingmissingmissing
344GentooBiscoemissingmissingmissingmissingmissing
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#58 .cell execution_count=1}\n``` {.julia .cell-code}\nsort(penguins, :body_mass_g)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1ChinstrapDream46.916.61922700female
2AdelieBiscoe36.516.61812850female
3AdelieBiscoe36.417.11842850female
4AdelieBiscoe34.518.11872900female
5AdelieDream33.116.11782900female
6AdelieTorgersen38.617.01882900female
7ChinstrapDream43.216.61872900female
8AdelieBiscoe37.918.61932925female
9AdelieDream37.518.91792975missing
10AdelieDream37.016.91853000female
11AdelieDream37.316.81923000female
12AdelieTorgersen35.916.61903050female
13AdelieTorgersen35.215.91863050female
333GentooBiscoe48.616.02305800male
334GentooBiscoe48.414.62135850male
335GentooBiscoe49.315.72175850male
336GentooBiscoe55.116.02305850male
337GentooBiscoe45.216.42235950male
338GentooBiscoe49.815.92295950male
339GentooBiscoe51.116.32206000male
340GentooBiscoe48.816.22226000male
341GentooBiscoe59.617.02306050male
342GentooBiscoe49.215.22216300male
343AdelieTorgersenmissingmissingmissingmissingmissing
344GentooBiscoemissingmissingmissingmissingmissing
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n### Arranging by two columns, with one reversed\n\n**Problem:** First arrange by `island`, then by reversed `body_mass_g`.\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#60 .cell execution_count=1}\n``` {.julia .cell-code}\n@arrange penguins island desc(body_mass_g)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1GentooBiscoemissingmissingmissingmissingmissing
2GentooBiscoe49.215.22216300male
3GentooBiscoe59.617.02306050male
4GentooBiscoe51.116.32206000male
5GentooBiscoe48.816.22226000male
6GentooBiscoe45.216.42235950male
7GentooBiscoe49.815.92295950male
8GentooBiscoe48.414.62135850male
9GentooBiscoe49.315.72175850male
10GentooBiscoe55.116.02305850male
11GentooBiscoe49.516.22295800male
12GentooBiscoe48.616.02305800male
13GentooBiscoe50.415.72225750male
333AdelieTorgersen41.118.61893325male
334AdelieTorgersen38.517.91903325female
335AdelieTorgersen37.817.11863300missing
336AdelieTorgersen38.817.61913275female
337AdelieTorgersen40.318.01953250female
338AdelieTorgersen41.117.61823200female
339AdelieTorgersen34.617.21893200female
340AdelieTorgersen36.217.21873150female
341AdelieTorgersen35.916.61903050female
342AdelieTorgersen35.215.91863050female
343AdelieTorgersen39.017.11913050female
344AdelieTorgersen38.617.01882900female
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#62 .cell execution_count=1}\n``` {.julia .cell-code}\n# works only when the reversed column is numeric?\n\nDFM.@orderby penguins :island :body_mass_g .* -1\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1GentooBiscoe49.215.22216300male
2GentooBiscoe59.617.02306050male
3GentooBiscoe51.116.32206000male
4GentooBiscoe48.816.22226000male
5GentooBiscoe45.216.42235950male
6GentooBiscoe49.815.92295950male
7GentooBiscoe48.414.62135850male
8GentooBiscoe49.315.72175850male
9GentooBiscoe55.116.02305850male
10GentooBiscoe49.516.22295800male
11GentooBiscoe48.616.02305800male
12GentooBiscoe50.415.72225750male
13GentooBiscoe50.016.32305700male
333AdelieTorgersen38.517.91903325female
334AdelieTorgersen37.817.11863300missing
335AdelieTorgersen38.817.61913275female
336AdelieTorgersen40.318.01953250female
337AdelieTorgersen41.117.61823200female
338AdelieTorgersen34.617.21893200female
339AdelieTorgersen36.217.21873150female
340AdelieTorgersen35.916.61903050female
341AdelieTorgersen35.215.91863050female
342AdelieTorgersen39.017.11913050female
343AdelieTorgersen38.617.01882900female
344AdelieTorgersenmissingmissingmissingmissingmissing
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#64 .cell execution_count=1}\n``` {.julia .cell-code}\nsort(penguins, [order(:island), order(:body_mass_g, rev=true)])\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1GentooBiscoemissingmissingmissingmissingmissing
2GentooBiscoe49.215.22216300male
3GentooBiscoe59.617.02306050male
4GentooBiscoe51.116.32206000male
5GentooBiscoe48.816.22226000male
6GentooBiscoe45.216.42235950male
7GentooBiscoe49.815.92295950male
8GentooBiscoe48.414.62135850male
9GentooBiscoe49.315.72175850male
10GentooBiscoe55.116.02305850male
11GentooBiscoe49.516.22295800male
12GentooBiscoe48.616.02305800male
13GentooBiscoe50.415.72225750male
333AdelieTorgersen41.118.61893325male
334AdelieTorgersen38.517.91903325female
335AdelieTorgersen37.817.11863300missing
336AdelieTorgersen38.817.61913275female
337AdelieTorgersen40.318.01953250female
338AdelieTorgersen41.117.61823200female
339AdelieTorgersen34.617.21893200female
340AdelieTorgersen36.217.21873150female
341AdelieTorgersen35.916.61903050female
342AdelieTorgersen35.215.91863050female
343AdelieTorgersen39.017.11913050female
344AdelieTorgersen38.617.01882900female
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n### Arranging by one variable column\n\n**Problem:** Arrange by a column stored in a variable `my_arrange_column`.\n\n\n\n\n\n::: {#66 .cell execution_count=1}\n``` {.julia .cell-code}\nmy_arrange_column = :body_mass_g;\n```\n:::\n\n\n\n\n\n\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#68 .cell execution_count=1}\n``` {.julia .cell-code}\n@eval @arrange penguins $my_arrange_column\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1ChinstrapDream46.916.61922700female
2AdelieBiscoe36.516.61812850female
3AdelieBiscoe36.417.11842850female
4AdelieBiscoe34.518.11872900female
5AdelieDream33.116.11782900female
6AdelieTorgersen38.617.01882900female
7ChinstrapDream43.216.61872900female
8AdelieBiscoe37.918.61932925female
9AdelieDream37.518.91792975missing
10AdelieDream37.016.91853000female
11AdelieDream37.316.81923000female
12AdelieTorgersen35.916.61903050female
13AdelieTorgersen35.215.91863050female
333GentooBiscoe48.616.02305800male
334GentooBiscoe48.414.62135850male
335GentooBiscoe49.315.72175850male
336GentooBiscoe55.116.02305850male
337GentooBiscoe45.216.42235950male
338GentooBiscoe49.815.92295950male
339GentooBiscoe51.116.32206000male
340GentooBiscoe48.816.22226000male
341GentooBiscoe59.617.02306050male
342GentooBiscoe49.215.22216300male
343AdelieTorgersenmissingmissingmissingmissingmissing
344GentooBiscoemissingmissingmissingmissingmissing
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#70 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@orderby penguins $my_arrange_column\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1ChinstrapDream46.916.61922700female
2AdelieBiscoe36.516.61812850female
3AdelieBiscoe36.417.11842850female
4AdelieBiscoe34.518.11872900female
5AdelieDream33.116.11782900female
6AdelieTorgersen38.617.01882900female
7ChinstrapDream43.216.61872900female
8AdelieBiscoe37.918.61932925female
9AdelieDream37.518.91792975missing
10AdelieDream37.016.91853000female
11AdelieDream37.316.81923000female
12AdelieTorgersen35.916.61903050female
13AdelieTorgersen35.215.91863050female
333GentooBiscoe48.616.02305800male
334GentooBiscoe48.414.62135850male
335GentooBiscoe49.315.72175850male
336GentooBiscoe55.116.02305850male
337GentooBiscoe45.216.42235950male
338GentooBiscoe49.815.92295950male
339GentooBiscoe51.116.32206000male
340GentooBiscoe48.816.22226000male
341GentooBiscoe59.617.02306050male
342GentooBiscoe49.215.22216300male
343AdelieTorgersenmissingmissingmissingmissingmissing
344GentooBiscoemissingmissingmissingmissingmissing
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#72 .cell execution_count=1}\n``` {.julia .cell-code}\nsort(penguins, my_arrange_column)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1ChinstrapDream46.916.61922700female
2AdelieBiscoe36.516.61812850female
3AdelieBiscoe36.417.11842850female
4AdelieBiscoe34.518.11872900female
5AdelieDream33.116.11782900female
6AdelieTorgersen38.617.01882900female
7ChinstrapDream43.216.61872900female
8AdelieBiscoe37.918.61932925female
9AdelieDream37.518.91792975missing
10AdelieDream37.016.91853000female
11AdelieDream37.316.81923000female
12AdelieTorgersen35.916.61903050female
13AdelieTorgersen35.215.91863050female
333GentooBiscoe48.616.02305800male
334GentooBiscoe48.414.62135850male
335GentooBiscoe49.315.72175850male
336GentooBiscoe55.116.02305850male
337GentooBiscoe45.216.42235950male
338GentooBiscoe49.815.92295950male
339GentooBiscoe51.116.32206000male
340GentooBiscoe48.816.22226000male
341GentooBiscoe59.617.02306050male
342GentooBiscoe49.215.22216300male
343AdelieTorgersenmissingmissingmissingmissingmissing
344GentooBiscoemissingmissingmissingmissingmissing
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n", "supporting": [ "dataframes-rows_files" ], diff --git a/_freeze/dataframes/execute-results/html.json b/_freeze/dataframes/execute-results/html.json index 10da5c1..891469a 100644 --- a/_freeze/dataframes/execute-results/html.json +++ b/_freeze/dataframes/execute-results/html.json @@ -1,10 +1,10 @@ { - "hash": "bf3bf9fbea01582cf3465d388dc8f6aa", + "hash": "1c99b94b83399a24e7a5d3f101a0a8b5", "result": { "engine": "julia", - "markdown": "---\n# jupyter: julia-1.10\nengine: julia\n---\n\n\n\n\n\n\n\n\n\n# Part 2: Dataframes\n\nDataframes are one of the most important objects in data science. \n\nA dataframe is a table where each row is an observation and each column is a variable.\n\n::: {.callout}\nA dataframe is a list of vectors all with the same length. \n:::\n\nWe will use the Palmer Penguin dataset as a toy example for the remaining of the chapter.\n\n\n\n\n\n\n\n\n\n::: {#2 .cell execution_count=1}\n``` {.julia .cell-code}\nusing DataFrames, PalmerPenguins\nusing Tidier, Chain\nimport DataFramesMeta as DFM\n\npenguins = PalmerPenguins.load() |> DataFrame\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
333ChinstrapDream45.216.61913250female
334ChinstrapDream49.319.92034050male
335ChinstrapDream50.218.82023800male
336ChinstrapDream45.619.41943525female
337ChinstrapDream51.919.52063950male
338ChinstrapDream46.816.51893650female
339ChinstrapDream45.717.01953650female
340ChinstrapDream55.819.82074000male
341ChinstrapDream43.518.12023400female
342ChinstrapDream49.618.21933775male
343ChinstrapDream50.819.02104100male
344ChinstrapDream50.218.71983775female
\n```\n:::\n:::\n\n\n\n\n\n\n\n\n\n\n\n::: {.callout-note}\n\n`Dataframes.jl` is the main package for dealing with dataframes in Julia. You can use it directly to manipulate tables, but we also have 2 alternatives: DataFramesMeta and Tidier. \n\nDataFramesMeta is a collection of macros based on DataFrames.\n\nTidier is inspired by the `tidyverse` ecosystem in R. Tidier use macros to rewrite your code into DataFrames.jl code. Because of this \"tidy\" heritance, we will often talk about the R packages that inspired the Julia ones (like `dplyr`, `tidyr` and many others).\n\nIn this book, whenever possible, we will show the different approaches in a tabset so you can compare them, giving more emphasis on Tidier.\n:::\n\n## Operations\n\nLet's start with some unary operations, ie. operations that take only one dataframe as input and return one dataframe as output.^[Join operations will be dealt later.]. We can divide these operations in some categories:\n\n### Rows operations\n\nThese are operations that only affect rows, leaving all columns untouched.\n\n- *Filtering* or *subsetting* is when we select a subset of rows based on some criteria. Example: all male penguins of species Adelie. The output is a dataframe with the exact same columns, but possibly fewer rows.\n\n- *Arranging* or *ordering* is when we reorder the rows of a dataframe using some criteria.\n\n### Column operations\n\nThese are operations that only affect columns, leaving all rows untouched.\n\n- *Selecting* is when we select some columns of a dataframe, while keeping all the rows. Example: select the `species` and `sex` columns.\n\n- *Mutating* or *transforming* is when we create new columns. 
Example: a new column `body_mass_kg` can be obtained dividing the column `body_mass_g` by 1000.\n\n### Reshaping operations\n\nThese operations change the shape of a dataframe, making it wider or longer.\n\n- `Widening`\n\n- `Longering`?\n\n### Grouping operations\n\n- *Grouping* is when we split the dataframe into a collection (array) of dataframes using some criteria. Example: grouping by `species` gives us 3 dataframes, each with only one species.\n\n### Mixed operations\n\nThese operations can possibly change rows and columns at the same time.\n\n- Distinct;\n- Counting;\n- *Summarising* or *combining* is when we apply some function to some columns in order to reduce the amount of rows with some kind of summary (like a mean, median, max, and so on). Example: for each `species`, apply the `mean` function to the columns `body_mass_g`. This will yield a dataframe with 3 rows, one for each species. Summarising is usually done after a grouping, so the summary is calculated with relation to each of the groups.\n\n??? deixar grupo e sumário juntos?\n\nSince all these functions return a dataframe (or an array of dataframes, in the case of grouping), we can chain these operations together, with the convention that on grouped dataframes we apply the function in each one of the groups.\n\nNow for binary operations (ie. 
operations that take two dataframes), we have all the joins:\n\n- Left join;\n- Right join;\n- Inner join;\n- Outer join;\n- Full join.\n\n## Comparing Tidier with DataFramesMeta\n\nThe following table list the operations on each package:\n\n| dplyr | Tidier | DataFramesMeta | DataFrames |\n|-------------|--------------|------------------------------|--------------|\n| `filter` | `@filter` | `@subset` / `@rsubset` | `subset` |\n| `arrange` | `@arrange` | `@orderby` / `@rorderby` | `sort!` |\n| `select` | `@select` | `@select` | array sintax |\n| `mutate` | `@mutate` | `@transform` / `@rtransform` | array sintax |\n| `group_by` | `@group_by` | `@groupby` | `groupby` |\n| `summarise` | `@summarise` | `@combine` | `combine` |\n\nIt is clear that for those coming from `R`, Tidier will look like the most natural approach.\n\nNotice that we have a name clash with `@select`: that is why we `import DataFramesMeta as DFM` at the beginning.\n\nWe will see each operation with more details in the following chapters.\n\n## Chaining operations\n\nWe can chain (or pipe) dataframe operations as follows with the `@chain` macro:\n\n\n\n\n\n\n\n\n\n::: {#4 .cell execution_count=0}\n``` {.julia .cell-code}\n@chain penguins begin\n @filter !ismissing(sex)\n @group_by sex\n @summarise mean = mean(bill_length_mm)\n @arrange mean\nend\n```\n:::\n\n\n", + "markdown": "---\n# jupyter: julia-1.10\nengine: julia\n---\n\n\n\n\n\n# Part 2: Dataframes\n\nDataframes are one of the most important objects in data science. 
\n\nA dataframe is a table where each row is an observation and each column is a variable.\n\n::: {.callout}\nA dataframe `df` is a list of vectors, all with the same length.\n\nA column of `df` is just one of its vectors.\n\nThe `i-th` row of `df` is the vector formed by the `i-th` coordinate of each of its columns.\n:::\n\nWe will use the Palmer Penguins dataset as a toy example for the remainder of the chapter.\n\n\n\n\n\n::: {#2 .cell execution_count=1}\n``` {.julia .cell-code}\nusing DataFrames, PalmerPenguins\nusing Tidier, Chain\nimport DataFramesMeta as DFM\n\npenguins = PalmerPenguins.load() |> DataFrame\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
344×7 DataFrame
319 rows omitted
Rowspeciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
String15String15Float64?Float64?Int64?Int64?String7?
1AdelieTorgersen39.118.71813750male
2AdelieTorgersen39.517.41863800female
3AdelieTorgersen40.318.01953250female
4AdelieTorgersenmissingmissingmissingmissingmissing
5AdelieTorgersen36.719.31933450female
6AdelieTorgersen39.320.61903650male
7AdelieTorgersen38.917.81813625female
8AdelieTorgersen39.219.61954675male
9AdelieTorgersen34.118.11933475missing
10AdelieTorgersen42.020.21904250missing
11AdelieTorgersen37.817.11863300missing
12AdelieTorgersen37.817.31803700missing
13AdelieTorgersen41.117.61823200female
333ChinstrapDream45.216.61913250female
334ChinstrapDream49.319.92034050male
335ChinstrapDream50.218.82023800male
336ChinstrapDream45.619.41943525female
337ChinstrapDream51.919.52063950male
338ChinstrapDream46.816.51893650female
339ChinstrapDream45.717.01953650female
340ChinstrapDream55.819.82074000male
341ChinstrapDream43.518.12023400female
342ChinstrapDream49.618.21933775male
343ChinstrapDream50.819.02104100male
344ChinstrapDream50.218.71983775female
\n```\n:::\n:::\n\n\n\n\n\n\n\n## Libraries\n\n### DataFrames\n\n`DataFrames.jl` is the main package for dealing with dataframes in Julia. You can use it directly to manipulate tables, but we also have 2 alternatives: DataFramesMeta and Tidier. \n\n### DataFramesMeta\n\nDataFramesMeta is a collection of macros based on DataFrames. It provides many syntactic helpers to slice rows, create columns and summarise data.\n\n### Tidier\n\nTidier is inspired by the `tidyverse` ecosystem in R. Tidier uses macros to rewrite your code into DataFrames.jl code. Because of this \"tidy\" heritage, we will often talk about the R packages that inspired the Julia ones (like `dplyr`, `tidyr` and many others).\n\nIn this book, whenever possible, we will show the different approaches in a tabset so you can compare them, with more emphasis on Tidier.\n\n## Operations\n\nLet's start with some unary operations, i.e. operations that take only one dataframe as input and return one dataframe as output.^[Join operations will be dealt with later.] We can divide these operations into some categories:\n\n### Row operations\n\nThese are operations that only affect rows, leaving all columns untouched.\n\n- *Filtering* or *subsetting* is when we select a subset of rows based on some criteria. Example: all male penguins of species Adelie. The output is a dataframe with the exact same columns, but possibly fewer rows.\n\n- *Arranging* or *ordering* is when we reorder the rows of a dataframe using some criteria.\n\n### Column operations\n\nThese are operations that only affect columns, leaving all rows untouched.\n\n- *Selecting* is when we select some columns of a dataframe, while keeping all the rows. Example: select the `species` and `sex` columns.\n\n- *Mutating* or *transforming* is when we create new columns. 
Example: a new column `body_mass_kg` can be obtained by dividing the column `body_mass_g` by 1000.\n\n### Reshaping operations\n\nThese operations change the shape of a dataframe, making it wider or longer.\n\n- `Widening`\n\n- `Longering`?\n\n### Grouping operations\n\n- *Grouping* is when we split the dataframe into a collection (array) of dataframes using some criteria. Example: grouping by `species` gives us 3 dataframes, each with only one species.\n\n### Mixed operations\n\nThese operations can possibly change rows and columns at the same time.\n\n- Distinct;\n- Counting;\n- *Summarising* or *combining* is when we apply some function to some columns in order to reduce the number of rows with some kind of summary (like a mean, median, max, and so on). Example: for each `species`, apply the `mean` function to the column `body_mass_g`. This will yield a dataframe with 3 rows, one for each species. Summarising is usually done after a grouping, so the summary is calculated for each of the groups.\n\n??? keep grouping and summarising together?\n\nSince all these functions return a dataframe (or an array of dataframes, in the case of grouping), we can chain these operations together, with the convention that on grouped dataframes we apply the function to each one of the groups.\n\nNow for binary operations (i.e. 
operations that take two dataframes), we have all the joins:\n\n- Left join;\n- Right join;\n- Inner join;\n- Outer join;\n- Full join.\n\n## Comparing Tidier with DataFramesMeta\n\nThe following table lists the operations in each package:\n\n| dplyr | Tidier | DataFramesMeta | DataFrames |\n|-------------|--------------|------------------------------|--------------|\n| `filter` | `@filter` | `@subset` / `@rsubset` | `subset` |\n| `arrange` | `@arrange` | `@orderby` / `@rorderby` | `sort!` |\n| `select` | `@select` | `@select` | array syntax |\n| `mutate` | `@mutate` | `@transform` / `@rtransform` | array syntax |\n| `group_by` | `@group_by` | `@groupby` | `groupby` |\n| `summarise` | `@summarise` | `@combine` | `combine` |\n\nIt is clear that for those coming from `R`, Tidier will look like the most natural approach.\n\nNotice that we have a name clash with `@select`: that is why we `import DataFramesMeta as DFM` at the beginning.\n\nWe will see each operation in more detail in the following chapters.\n\n## Chaining operations\n\nWe can chain (or pipe) dataframe operations as follows with the `@chain` macro:\n\n\n\n\n\n::: {#4 .cell execution_count=0}\n``` {.julia .cell-code}\n@chain penguins begin\n @filter !ismissing(sex)\n @group_by sex\n @summarise mean = mean(bill_length_mm)\n @arrange mean\nend\n```\n:::\n\n\n\n\n\n\n\n## Using variables as column names\n\nIn Tidier, column names are written as if they were variables in the environment, which complicates things when we want to use other variables that are not column names.\n\nFor example, suppose you want to arrange penguins by a column that is stored in a variable.\n\nWhen this happens, we add `@eval` before the Tidier code and add a `$` to force evaluation of the variable, as in the following example:\n\n\n\n\n\n::: {#6 .cell execution_count=0}\n``` {.julia .cell-code}\nmy_arrange_column = :body_mass_g;\n\n@eval @arrange penguins $my_arrange_column\n```\n:::\n\n\n\n\n\n\n\n\n## 
Documentation\n\nhttps://dataframes.juliadata.org/stable/man/working_with_dataframes/\n\nhttps://juliadata.org/DataFramesMeta.jl/stable\n\nhttps://tidierorg.github.io/TidierData.jl/latest/reference/\n\n", "supporting": [ - "dataframes_files/figure-html" + "dataframes_files" ], "filters": [], "includes": { diff --git a/dataframes-columns.qmd b/dataframes-columns.qmd index d24633d..9111d99 100644 --- a/dataframes-columns.qmd +++ b/dataframes-columns.qmd @@ -18,6 +18,8 @@ penguins = PalmerPenguins.load() |> DataFrame; ### Selecting `n` columns +**Problem:** Select only some columns. + ::: {.panel-tabset} ## Tidier @@ -42,6 +44,8 @@ DFM.select(penguins, [:species, :body_mass_g]) ### Selecting columns from a variable +**Problem:** Select only some columns whose names are stored in a variable. + ::: {.panel-tabset} ```{julia} @@ -51,7 +55,7 @@ my_columns = [:species, :body_mass_g]; ## Tidier ```{julia} -@select penguins !!my_columns +@eval @select penguins $my_columns... ``` ## DataFramesMeta @@ -72,7 +76,7 @@ DFM.select(penguins, my_columns) ### Creating one column based on another one -Create the column `body_mass_kg` by dividing `body_mass_g` by 1000. +**Problem:** Create the column `body_mass_kg` by dividing `body_mass_g` by 1000. ::: {.panel-tabset} diff --git a/dataframes-rows.qmd b/dataframes-rows.qmd index 2040b44..6048bb3 100644 --- a/dataframes-rows.qmd +++ b/dataframes-rows.qmd @@ -5,6 +5,10 @@ engine: julia # Operations on rows +In this chapter we will see operations that deal with rows, be it ordering or throwing some rows away. + +The following is necessary to run all examples: + ```{julia} using DataFrames, PalmerPenguins using Tidier @@ -14,11 +18,11 @@ penguins = PalmerPenguins.load() |> DataFrame; @slice_head(penguins, n = 10) ``` -## Filtering (or: throwing lines away) +## Filtering (or: throwing rows away) -To filter a dataframe means keeping only the rows that satisfy a certain criteria (ie. a boolean condition). 
+To *filter* a dataframe means keeping only the rows that satisfy a certain criterion (i.e. a boolean condition). -To filter a dataframe in Tidier, we use the macro `@filter`. You can use it in the form +To filter in Tidier, we use the macro `@filter`. You can use it in the form ```{julia} @filter(penguins, species == "Adelie") @@ -40,7 +44,7 @@ DFM.@subset penguins :body_mass_g .>= mean(skipmissing(:body_mass_g)) Notice the broadcast on >=. We need it because *each variable is interpreted as a vector (the whole column)*. Also, notice that we refer to columns as _symbols_ (i.e. we append `:` to it). -In the above example, we needed the whole column `body_mass_g` to take the mean and then filter the rows based on that. If, however, your filtering criteria only uses information about each row (without needing to see it in context of the whole column), then `@rsubset` (row subset) is easier to use: it interprets each columns as a value (not an array), so no broadcasting is needed: +In the above example, we needed the whole column `body_mass_g` to take the mean and then filter the rows based on that. If, however, your filtering criterion only uses information about each row (without needing to see it in the context of the whole column), then `@rsubset` (**r**ow subset) is easier to use: it interprets each column as a value (not an array), so no broadcasting is needed: ```{julia} DFM.@rsubset penguins :species == "Adelie" @@ -57,11 +61,11 @@ subset(penguins, :column => boolean_function) ``` -where `boolean_function` is a boolean (with possibly `missing` values) function on 1 variable. Add the kwarg `skipmissing=true` if you want to get rid of missing values. +where `boolean_function` is a boolean (with possibly `missing` values) function on 1 variable (the `:column` you passed). Add the kwarg `skipmissing=true` if you want to get rid of missing values. ### Filtering with one criteria -Filtering all the rows with `species` == "Adelie". 
+**Problem:** Filtering all the rows with `species` == "Adelie". ::: {.panel-tabset} @@ -87,7 +91,7 @@ subset(penguins, :species => x -> x .== "Adelie", skipmissing=true) ### Filtering with several criteria -Filtering all the rows with `species` == "Adelie", `sex` == "male" and `body_mass_g` > 4000. +**Problem:** Filtering all the rows with `species` == "Adelie", `sex` == "male" and `body_mass_g` > 4000. ::: {.panel-tabset} @@ -116,8 +120,7 @@ subset( ::: - -Filtering all the rows with `species` == "Adelie" OR `sex` == "male". +**Problem:** Filtering all the rows with `species` == "Adelie" OR `sex` == "male". ::: {.panel-tabset} @@ -141,8 +144,11 @@ subset(penguins, [:species, :sex] => (x, y) -> (x .== "Adelie") .| (y .== "male" ::: +### Filtering with metadata -Filtering all the rows where the `flipper_length_mm` is greater than the mean. +By metadata here we mean data that is inside the dataframe, as the mean/max/min of a column. + +**Problem:** Filtering all the rows where the `flipper_length_mm` is greater than the mean. ::: {.panel-tabset} @@ -168,14 +174,22 @@ subset(penguins, :flipper_length_mm => x -> x .> mean(skipmissing(x)), skipmissi ### Filtering with a variable column name -Suppose the column you want to filter is a variable, let's say +Suppose the column you want to filter is a variable, let's say a symbol ```{julia} my_column = :species; ``` +**Problem:** Filtering all the rows where the column stored in `my_column` is "Adelie". 
+ ::: {.panel-tabset} +## Tidier + +```{julia} +@eval @filter penguins $my_column == "Adelie" +``` + ## DataFramesMeta ```{julia} @@ -196,16 +210,17 @@ In case the column is a string my_column_string = "species"; ``` -instead of a symbol, we can write in the same way +instead of a symbol, we can write in the same way, just taking care in Tidier to convert it to a symbol ::: {.panel-tabset} ## Tidier ```{julia} -# @filter(penguins, !!my_column == "Adelie") +@eval @filter penguins $(Symbol(my_column_string)) == "Adelie" ``` + ## DataFramesMeta ```{julia} @@ -222,11 +237,11 @@ subset(penguins, my_column_string => x -> x .== "Adelie") ## Arranging -Arranging is when we reorder the rows of a dataframe according to some columns. The rows are first arranged by the first column, then by the second (if any), and so on. In Tidier, when we want to invert the ordering, just put the column name inside a `desc()` call. +To *arrange* a dataframe means to reorder the rows according to the order of some columns. The rows are first arranged by the first column, then by the second (if any), and so on. In Tidier, when we want to invert the ordering, just put the column name inside a `desc()` call. ### Arranging by one column -Arrange by `body_mass_g`. +**Problem:** Arrange by `body_mass_g`. ::: {.panel-tabset} @@ -252,7 +267,7 @@ sort(penguins, :body_mass_g) ### Arranging by two columns, with one reversed -First arrange by `island`, then by reversed `body_mass_g`. +**Problem:** First arrange by `island`, then by reversed `body_mass_g`. ::: {.panel-tabset} @@ -280,7 +295,7 @@ sort(penguins, [order(:island), order(:body_mass_g, rev=true)]) ### Arranging by one variable column -Let's arrange the data by the following column: +**Problem:** Arrange by a column stored in a variable `my_arrange_column`. ```{julia} my_arrange_column = :body_mass_g; @@ -291,8 +306,7 @@ my_arrange_column = :body_mass_g; ## Tidier ```{julia} -#?? how to do it? 
-# @arrange penguins !!my_arrange_column +@eval @arrange penguins $my_arrange_column ``` ## DataFramesMeta diff --git a/dataframes-rows.quarto_ipynb b/dataframes-rows.quarto_ipynb new file mode 100644 index 0000000..488b518 --- /dev/null +++ b/dataframes-rows.quarto_ipynb @@ -0,0 +1,772 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "jupyter: julia-1.10\n", + "# engine: julia\n", + "---\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "# Operations on rows\n" + ], + "id": "ea5be365" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "using DataFrames, PalmerPenguins\n", + "using Tidier\n", + "import DataFramesMeta as DFM\n", + "\n", + "penguins = PalmerPenguins.load() |> DataFrame;\n", + "@slice_head(penguins, n = 10)" + ], + "id": "4e698924", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Filtering (or: throwing lines away)\n", + "\n", + "To filter a dataframe means keeping only the rows that satisfy a certain criteria (ie. a boolean condition).\n", + "\n", + "To filter a dataframe in Tidier, we use the macro `@filter`. You can use it in the form\n" + ], + "id": "8abe75b9" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "@filter(penguins, species == \"Adelie\")" + ], + "id": "861ae2cd", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "or without parentesis as in \n" + ], + "id": "9c978b03" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "@filter penguins species == \"Adelie\"" + ], + "id": "5fc51708", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice that the columns are typed as if they were variables on the Julia environment. 
This is inspired by the `tidyverse` behaviour of data-masking: inside a tidyverse verb, the columns are taken as \"statistical variables\" that exist inside the dataframe as columns.\n", + "\n", + "In DataFramesMeta, we have two macros for filtering: `@subset` and `@rsubset`. Use the first when you have some criteria that uses a whole column, for example:\n" + ], + "id": "c749e8e7" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "DFM.@subset penguins :body_mass_g .>= mean(skipmissing(:body_mass_g))" + ], + "id": "5674e7ca", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice the broadcast on >=. We need it because *each variable is interpreted as a vector (the whole column)*. Also, notice that we refer to columns as _symbols_ (i.e. we append `:` to it).\n", + "\n", + "In the above example, we needed the whole column `body_mass_g` to take the mean and then filter the rows based on that. If, however, your filtering criteria only uses information about each row (without needing to see it in context of the whole column), then `@rsubset` (row subset) is easier to use: it interprets each columns as a value (not an array), so no broadcasting is needed:\n" + ], + "id": "650f2341" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "DFM.@rsubset penguins :species == \"Adelie\"" + ], + "id": "165e3e30", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In both Tidier and DataFramesMeta, only the rows to which the criteria is `true` are returned. 
This means that `false` and `missing` are thrown away.\n", + "\n", + "In pure DataFrames, we use the `subset` function, and the criteria is passed with the notation\n" + ], + "id": "c19e21d8" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "#| eval: false\n", + "\n", + "subset(penguins, :column => boolean_function)" + ], + "id": "e52816cb", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "where `boolean_function` is a boolean (with possibly `missing` values) function on 1 variable (the `:column` you passed). Add the kwarg `skipmissing=true` if you want to get rid of missing values.\n", + "\n", + "### Filtering with one criteria\n", + "\n", + "Filtering all the rows with `species` == \"Adelie\".\n", + "\n", + "::: {.panel-tabset}\n", + "\n", + "## Tidier\n" + ], + "id": "ebcd6346" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "@filter penguins species == \"Adelie\"" + ], + "id": "7fb1666d", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DataFramesMeta\n" + ], + "id": "e8f686ea" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "DFM.@rsubset penguins :species == \"Adelie\"" + ], + "id": "95c17061", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DataFrames\n" + ], + "id": "fa2c5547" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "subset(penguins, :species => x -> x .== \"Adelie\", skipmissing=true)" + ], + "id": "6fb2812e", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + ":::\n", + "\n", + "### Filtering with several criteria\n", + "\n", + "Filtering all the rows with `species` == \"Adelie\", `sex` == \"male\" and `body_mass_g` > 4000.\n", + "\n", + "::: {.panel-tabset}\n", + "\n", + "## Tidier\n" + ], + "id": "09049eb9" + }, + { + 
"cell_type": "code", + "metadata": {}, + "source": [ + "@filter penguins species == \"Adelie\" sex == \"male\" body_mass_g > 4000" + ], + "id": "11d29a51", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DataFramesMeta\n" + ], + "id": "0ce455c1" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "DFM.@rsubset penguins :species == \"Adelie\" :sex == \"male\" :body_mass_g > 4000" + ], + "id": "cb5749ba", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DataFrames\n" + ], + "id": "df8f2354" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "subset(\n", + " penguins\n", + " , [:species, :sex, :body_mass_g] => \n", + " (x, y, z) -> (x .== \"Adelie\") .& (y .== \"male\") .& (z .> 4000)\n", + " ,skipmissing=true\n", + ")" + ], + "id": "7599d3f0", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + ":::\n", + "\n", + "\n", + "Filtering all the rows with `species` == \"Adelie\" OR `sex` == \"male\".\n", + "\n", + "::: {.panel-tabset}\n", + "\n", + "## Tidier\n" + ], + "id": "db002280" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "@filter penguins (species == \"Adelie\") | (sex == \"male\")" + ], + "id": "d28e9318", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DataFramesMeta\n" + ], + "id": "b3d63fe1" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "DFM.@rsubset penguins (:species == \"Adelie\") | (:sex == \"male\")" + ], + "id": "9276b145", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DataFrames\n" + ], + "id": "e7096279" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "subset(penguins, [:species, :sex] => (x, y) -> (x .== \"Adelie\") .| (y .== 
\"male\"), skipmissing=true)" + ], + "id": "a0668fa9", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + ":::\n", + "\n", + "\n", + "Filtering all the rows where the `flipper_length_mm` is greater than the mean.\n", + "\n", + "::: {.panel-tabset}\n", + "\n", + "## Tidier\n" + ], + "id": "2a22c3ed" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "@filter penguins flipper_length_mm > mean(skipmissing(flipper_length_mm))" + ], + "id": "a5ddbae0", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DataFramesMeta\n" + ], + "id": "be93d74e" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "DFM.@subset penguins :flipper_length_mm .>= mean(skipmissing(:flipper_length_mm))" + ], + "id": "8d8d6b77", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DataFrames\n" + ], + "id": "57b2b239" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "subset(penguins, :flipper_length_mm => x -> x .> mean(skipmissing(x)), skipmissing=true)" + ], + "id": "9ed74597", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + ":::\n", + "\n", + "### Filtering with a variable column name\n", + "\n", + "Suppose the column you want to filter is a variable, let's say a symbol\n" + ], + "id": "33256162" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "my_column = :species;" + ], + "id": "493c5c7a", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "::: {.panel-tabset}\n", + "\n", + "## Tidier\n" + ], + "id": "15661579" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "@eval @filter penguins $my_column == \"Adelie\"" + ], + "id": "b3965259", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "## DataFramesMeta\n" + ], + "id": "c07f6b56" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "DFM.@rsubset penguins $my_column == \"Adelie\"" + ], + "id": "4624b99a", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DataFrames\n" + ], + "id": "fcbfbc4b" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "subset(penguins, my_column => x -> x .== \"Adelie\")" + ], + "id": "7066efde", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + ":::\n", + "\n", + "In case the column is a string\n" + ], + "id": "c83c792f" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "my_column_string = \"species\";" + ], + "id": "756fd48f", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "instead of a symbol, we can write in the same way, just taking care in Tidier to convert it to a symbol\n", + "\n", + "::: {.panel-tabset}\n", + "\n", + "## Tidier\n" + ], + "id": "53362155" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "@eval @filter penguins $(Symbol(my_column_string)) == \"Adelie\"" + ], + "id": "0df46c80", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DataFramesMeta\n" + ], + "id": "38820cc4" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "DFM.@rsubset penguins $(my_column_string) == \"Adelie\"" + ], + "id": "642a18a8", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DataFrames\n" + ], + "id": "ed35fc5d" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "subset(penguins, my_column_string => x -> x .== \"Adelie\")" + ], + "id": "38af65d1", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + ":::\n", + "\n", + "## Arranging\n", + "\n", + "Arranging is when we reorder the rows of a dataframe according to some columns. The rows are first arranged by the first column, then by the second (if any), and so on. In Tidier, when we want to invert the ordering, just put the column name inside a `desc()` call.\n", + "\n", + "### Arranging by one column\n", + "\n", + "Arrange by `body_mass_g`.\n", + "\n", + "::: {.panel-tabset}\n", + "\n", + "## Tidier\n" + ], + "id": "791c4586" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "@arrange penguins body_mass_g" + ], + "id": "39a99cf6", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DataFramesMeta\n" + ], + "id": "a5fe4174" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "DFM.@orderby penguins :body_mass_g" + ], + "id": "548845b3", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DataFrames\n" + ], + "id": "f866a4c9" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "sort(penguins, :body_mass_g)" + ], + "id": "0153f423", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + ":::\n", + "\n", + "### Arranging by two columns, with one reversed\n", + "\n", + "First arrange by `island`, then by reversed `body_mass_g`.\n", + "\n", + "::: {.panel-tabset}\n", + "\n", + "## Tidier\n" + ], + "id": "0cc1eafa" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "@arrange penguins island desc(body_mass_g)" + ], + "id": "0316cc75", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DataFramesMeta\n" + ], + "id": "c4d5ad8a" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# works only when the reversed column is numeric?\n", + "\n", + "DFM.@orderby 
penguins :island :body_mass_g .* -1" + ], + "id": "337343f4", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DataFrames\n" + ], + "id": "e77d71e7" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "sort(penguins, [order(:island), order(:body_mass_g, rev=true)])" + ], + "id": "3a7cc7c7", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + ":::\n", + "\n", + "### Arranging by one variable column\n", + "\n", + "Let's arrange the data by the following column:\n" + ], + "id": "0bcd783c" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "my_arrange_column = :body_mass_g;" + ], + "id": "a8236e40", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "::: {.panel-tabset}\n", + "\n", + "## Tidier\n" + ], + "id": "6c69980b" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "@eval @arrange penguins $my_arrange_column" + ], + "id": "900bfcb2", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DataFramesMeta\n" + ], + "id": "a05a1200" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "DFM.@orderby penguins $my_arrange_column" + ], + "id": "874ac5cd", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DataFrames\n" + ], + "id": "1889a601" + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "sort(penguins, my_arrange_column)" + ], + "id": "5e0515a5", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + ":::" + ], + "id": "fac2f7e2" + } + ], + "metadata": { + "kernelspec": { + "name": "julia-1.10", + "language": "julia", + "display_name": "Julia 1.10.4", + "path": "/home/vituri/.local/share/jupyter/kernels/julia-1.10" + } + }, 
+ "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/dataframes.qmd b/dataframes.qmd index 80eee14..0af9cfe 100644 --- a/dataframes.qmd +++ b/dataframes.qmd @@ -129,10 +129,26 @@ We can chain (or pipe) dataframe operations as follows with the `@chain` macro: end ``` +## Using variables as column names + +In Tidier, using the column names as if they were variables in the environment leads to some complication when we want to use other variables that are not column names. + +For example, suppose you want to arrange penguins by a column that is stored in a variable. + +When this happens, we add `@eval` before the Tidier code and add a `$` to force evaluation of the variable, as in the following example: + +```{julia} +#| eval: false +my_arrange_column = :body_mass_g; + +@eval @arrange penguins $my_arrange_column +``` + + ## Documentation https://dataframes.juliadata.org/stable/man/working_with_dataframes/ -https://juliadata.org/DataFramesMeta.jl/stable/#@orderby +https://juliadata.org/DataFramesMeta.jl/stable https://tidierorg.github.io/TidierData.jl/latest/reference/ \ No newline at end of file diff --git a/docs/dataframes-columns.html b/docs/dataframes-columns.html index a2d7e03..4a5e46e 100644 --- a/docs/dataframes-columns.html +++ b/docs/dataframes-columns.html @@ -2,7 +2,7 @@ - + @@ -71,10 +71,10 @@ - + - + - + - + - + - +