\n```\n:::\n:::\n\n\n\n\n\n\n\n## Filtering\n\nTo filter is to keep only the rows that satisfy a given criterion (i.e. a boolean condition).\n\nTo filter a dataframe in Tidier, we use the macro `@filter`. You can use it in the form\n\n\n\n\n\n::: {#4 .cell execution_count=1}\n``` {.julia .cell-code}\n@filter(penguins, species == \"Adelie\")\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
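<table>
<caption>152×7 DataFrame, 127 rows omitted</caption>
<thead>
<tr><th>Row</th><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th></tr>
<tr><th></th><th>String15</th><th>String15</th><th>Float64?</th><th>Float64?</th><th>Int64?</th><th>Int64?</th><th>String7</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181</td><td>3750</td><td>male</td></tr>
<tr><td>2</td><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186</td><td>3800</td><td>female</td></tr>
<tr><td>3</td><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>female</td></tr>
<tr><td>4</td><td>Adelie</td><td>Torgersen</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td></tr>
<tr><td>5</td><td>Adelie</td><td>Torgersen</td><td>36.7</td><td>19.3</td><td>193</td><td>3450</td><td>female</td></tr>
<tr><td>6</td><td>Adelie</td><td>Torgersen</td><td>39.3</td><td>20.6</td><td>190</td><td>3650</td><td>male</td></tr>
<tr><td>7</td><td>Adelie</td><td>Torgersen</td><td>38.9</td><td>17.8</td><td>181</td><td>3625</td><td>female</td></tr>
<tr><td>8</td><td>Adelie</td><td>Torgersen</td><td>39.2</td><td>19.6</td><td>195</td><td>4675</td><td>male</td></tr>
<tr><td>9</td><td>Adelie</td><td>Torgersen</td><td>34.1</td><td>18.1</td><td>193</td><td>3475</td><td>missing</td></tr>
<tr><td>10</td><td>Adelie</td><td>Torgersen</td><td>42.0</td><td>20.2</td><td>190</td><td>4250</td><td>missing</td></tr>
<tr><td>11</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.1</td><td>186</td><td>3300</td><td>missing</td></tr>
<tr><td>12</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.3</td><td>180</td><td>3700</td><td>missing</td></tr>
<tr><td>13</td><td>Adelie</td><td>Torgersen</td><td>41.1</td><td>17.6</td><td>182</td><td>3200</td><td>female</td></tr>
<tr><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td></tr>
<tr><td>141</td><td>Adelie</td><td>Dream</td><td>40.2</td><td>17.1</td><td>193</td><td>3400</td><td>female</td></tr>
<tr><td>142</td><td>Adelie</td><td>Dream</td><td>40.6</td><td>17.2</td><td>187</td><td>3475</td><td>male</td></tr>
<tr><td>143</td><td>Adelie</td><td>Dream</td><td>32.1</td><td>15.5</td><td>188</td><td>3050</td><td>female</td></tr>
<tr><td>144</td><td>Adelie</td><td>Dream</td><td>40.7</td><td>17.0</td><td>190</td><td>3725</td><td>male</td></tr>
<tr><td>145</td><td>Adelie</td><td>Dream</td><td>37.3</td><td>16.8</td><td>192</td><td>3000</td><td>female</td></tr>
<tr><td>146</td><td>Adelie</td><td>Dream</td><td>39.0</td><td>18.7</td><td>185</td><td>3650</td><td>male</td></tr>
<tr><td>147</td><td>Adelie</td><td>Dream</td><td>39.2</td><td>18.6</td><td>190</td><td>4250</td><td>male</td></tr>
<tr><td>148</td><td>Adelie</td><td>Dream</td><td>36.6</td><td>18.4</td><td>184</td><td>3475</td><td>female</td></tr>
<tr><td>149</td><td>Adelie</td><td>Dream</td><td>36.0</td><td>17.8</td><td>195</td><td>3450</td><td>female</td></tr>
<tr><td>150</td><td>Adelie</td><td>Dream</td><td>37.8</td><td>18.1</td><td>193</td><td>3750</td><td>male</td></tr>
<tr><td>151</td><td>Adelie</td><td>Dream</td><td>36.0</td><td>17.1</td><td>187</td><td>3700</td><td>female</td></tr>
<tr><td>152</td><td>Adelie</td><td>Dream</td><td>41.5</td><td>18.5</td><td>201</td><td>4000</td><td>male</td></tr>
</tbody>
</table>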
\n```\n:::\n:::\n\n\n\n\n\n\n\nor without parentheses, as in\n\n\n\n\n\n::: {#6 .cell execution_count=1}\n``` {.julia .cell-code}\n@filter penguins species == \"Adelie\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
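<table>
<caption>152×7 DataFrame, 127 rows omitted</caption>
<thead>
<tr><th>Row</th><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th></tr>
<tr><th></th><th>String15</th><th>String15</th><th>Float64?</th><th>Float64?</th><th>Int64?</th><th>Int64?</th><th>String7</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181</td><td>3750</td><td>male</td></tr>
<tr><td>2</td><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186</td><td>3800</td><td>female</td></tr>
<tr><td>3</td><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>female</td></tr>
<tr><td>4</td><td>Adelie</td><td>Torgersen</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td></tr>
<tr><td>5</td><td>Adelie</td><td>Torgersen</td><td>36.7</td><td>19.3</td><td>193</td><td>3450</td><td>female</td></tr>
<tr><td>6</td><td>Adelie</td><td>Torgersen</td><td>39.3</td><td>20.6</td><td>190</td><td>3650</td><td>male</td></tr>
<tr><td>7</td><td>Adelie</td><td>Torgersen</td><td>38.9</td><td>17.8</td><td>181</td><td>3625</td><td>female</td></tr>
<tr><td>8</td><td>Adelie</td><td>Torgersen</td><td>39.2</td><td>19.6</td><td>195</td><td>4675</td><td>male</td></tr>
<tr><td>9</td><td>Adelie</td><td>Torgersen</td><td>34.1</td><td>18.1</td><td>193</td><td>3475</td><td>missing</td></tr>
<tr><td>10</td><td>Adelie</td><td>Torgersen</td><td>42.0</td><td>20.2</td><td>190</td><td>4250</td><td>missing</td></tr>
<tr><td>11</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.1</td><td>186</td><td>3300</td><td>missing</td></tr>
<tr><td>12</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.3</td><td>180</td><td>3700</td><td>missing</td></tr>
<tr><td>13</td><td>Adelie</td><td>Torgersen</td><td>41.1</td><td>17.6</td><td>182</td><td>3200</td><td>female</td></tr>
<tr><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td></tr>
<tr><td>141</td><td>Adelie</td><td>Dream</td><td>40.2</td><td>17.1</td><td>193</td><td>3400</td><td>female</td></tr>
<tr><td>142</td><td>Adelie</td><td>Dream</td><td>40.6</td><td>17.2</td><td>187</td><td>3475</td><td>male</td></tr>
<tr><td>143</td><td>Adelie</td><td>Dream</td><td>32.1</td><td>15.5</td><td>188</td><td>3050</td><td>female</td></tr>
<tr><td>144</td><td>Adelie</td><td>Dream</td><td>40.7</td><td>17.0</td><td>190</td><td>3725</td><td>male</td></tr>
<tr><td>145</td><td>Adelie</td><td>Dream</td><td>37.3</td><td>16.8</td><td>192</td><td>3000</td><td>female</td></tr>
<tr><td>146</td><td>Adelie</td><td>Dream</td><td>39.0</td><td>18.7</td><td>185</td><td>3650</td><td>male</td></tr>
<tr><td>147</td><td>Adelie</td><td>Dream</td><td>39.2</td><td>18.6</td><td>190</td><td>4250</td><td>male</td></tr>
<tr><td>148</td><td>Adelie</td><td>Dream</td><td>36.6</td><td>18.4</td><td>184</td><td>3475</td><td>female</td></tr>
<tr><td>149</td><td>Adelie</td><td>Dream</td><td>36.0</td><td>17.8</td><td>195</td><td>3450</td><td>female</td></tr>
<tr><td>150</td><td>Adelie</td><td>Dream</td><td>37.8</td><td>18.1</td><td>193</td><td>3750</td><td>male</td></tr>
<tr><td>151</td><td>Adelie</td><td>Dream</td><td>36.0</td><td>17.1</td><td>187</td><td>3700</td><td>female</td></tr>
<tr><td>152</td><td>Adelie</td><td>Dream</td><td>41.5</td><td>18.5</td><td>201</td><td>4000</td><td>male</td></tr>
</tbody>
</table>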
\n```\n:::\n:::\n\n\n\n\n\n\n\nNotice that the columns are typed as if they were variables in the Julia environment. This is inspired by the `tidyverse` behaviour of data-masking: inside a tidyverse verb, the columns are taken as \"statistical variables\" that exist inside the dataframe as columns.\n\nIn DataFramesMeta, we have two macros for filtering: `@subset` and `@rsubset`. Use the first when you have a criterion that uses a whole column, for example:\n\n\n\n\n\n::: {#8 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@subset penguins :body_mass_g .>= mean(skipmissing(:body_mass_g))\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
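<table>
<caption>149×7 DataFrame, 124 rows omitted</caption>
<thead>
<tr><th>Row</th><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th></tr>
<tr><th></th><th>String15</th><th>String15</th><th>Float64?</th><th>Float64?</th><th>Int64?</th><th>Int64?</th><th>String7</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>Adelie</td><td>Torgersen</td><td>39.2</td><td>19.6</td><td>195</td><td>4675</td><td>male</td></tr>
<tr><td>2</td><td>Adelie</td><td>Torgersen</td><td>42.0</td><td>20.2</td><td>190</td><td>4250</td><td>missing</td></tr>
<tr><td>3</td><td>Adelie</td><td>Torgersen</td><td>34.6</td><td>21.1</td><td>198</td><td>4400</td><td>male</td></tr>
<tr><td>4</td><td>Adelie</td><td>Torgersen</td><td>42.5</td><td>20.7</td><td>197</td><td>4500</td><td>male</td></tr>
<tr><td>5</td><td>Adelie</td><td>Dream</td><td>39.8</td><td>19.1</td><td>184</td><td>4650</td><td>male</td></tr>
<tr><td>6</td><td>Adelie</td><td>Dream</td><td>44.1</td><td>19.7</td><td>196</td><td>4400</td><td>male</td></tr>
<tr><td>7</td><td>Adelie</td><td>Dream</td><td>39.6</td><td>18.8</td><td>190</td><td>4600</td><td>male</td></tr>
<tr><td>8</td><td>Adelie</td><td>Biscoe</td><td>40.1</td><td>18.9</td><td>188</td><td>4300</td><td>male</td></tr>
<tr><td>9</td><td>Adelie</td><td>Biscoe</td><td>41.3</td><td>21.1</td><td>195</td><td>4400</td><td>male</td></tr>
<tr><td>10</td><td>Adelie</td><td>Torgersen</td><td>41.8</td><td>19.4</td><td>198</td><td>4450</td><td>male</td></tr>
<tr><td>11</td><td>Adelie</td><td>Torgersen</td><td>42.8</td><td>18.5</td><td>195</td><td>4250</td><td>male</td></tr>
<tr><td>12</td><td>Adelie</td><td>Torgersen</td><td>42.9</td><td>17.6</td><td>196</td><td>4700</td><td>male</td></tr>
<tr><td>13</td><td>Adelie</td><td>Dream</td><td>41.1</td><td>18.1</td><td>205</td><td>4300</td><td>male</td></tr>
<tr><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td></tr>
<tr><td>138</td><td>Gentoo</td><td>Biscoe</td><td>47.2</td><td>13.7</td><td>214</td><td>4925</td><td>female</td></tr>
<tr><td>139</td><td>Gentoo</td><td>Biscoe</td><td>46.8</td><td>14.3</td><td>215</td><td>4850</td><td>female</td></tr>
<tr><td>140</td><td>Gentoo</td><td>Biscoe</td><td>50.4</td><td>15.7</td><td>222</td><td>5750</td><td>male</td></tr>
<tr><td>141</td><td>Gentoo</td><td>Biscoe</td><td>45.2</td><td>14.8</td><td>212</td><td>5200</td><td>female</td></tr>
<tr><td>142</td><td>Gentoo</td><td>Biscoe</td><td>49.9</td><td>16.1</td><td>213</td><td>5400</td><td>male</td></tr>
<tr><td>143</td><td>Chinstrap</td><td>Dream</td><td>49.2</td><td>18.2</td><td>195</td><td>4400</td><td>male</td></tr>
<tr><td>144</td><td>Chinstrap</td><td>Dream</td><td>52.8</td><td>20.0</td><td>205</td><td>4550</td><td>male</td></tr>
<tr><td>145</td><td>Chinstrap</td><td>Dream</td><td>54.2</td><td>20.8</td><td>201</td><td>4300</td><td>male</td></tr>
<tr><td>146</td><td>Chinstrap</td><td>Dream</td><td>52.0</td><td>20.7</td><td>210</td><td>4800</td><td>male</td></tr>
<tr><td>147</td><td>Chinstrap</td><td>Dream</td><td>53.5</td><td>19.9</td><td>205</td><td>4500</td><td>male</td></tr>
<tr><td>148</td><td>Chinstrap</td><td>Dream</td><td>50.8</td><td>18.5</td><td>201</td><td>4450</td><td>male</td></tr>
<tr><td>149</td><td>Chinstrap</td><td>Dream</td><td>49.0</td><td>19.6</td><td>212</td><td>4300</td><td>male</td></tr>
</tbody>
</table>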
\n```\n:::\n:::\n\n\n\n\n\n\n\nNotice the broadcast on `>=`. We need it because *each variable is interpreted as an array (the whole column)*. Also, notice that we refer to columns as _symbols_ (i.e. we prepend `:` to the column name).\n\nIn the above example, we needed the whole column `body_mass_g` to take the mean and then filter the rows based on that. If, however, your filtering criterion only uses information about each row (without needing to see it in the context of the whole column), then `@rsubset` (row subset) is easier to use: it interprets each column as a single value (not an array), so no broadcasting is needed:\n\n\n\n\n\n::: {#10 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@rsubset penguins :species == \"Adelie\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
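<table>
<caption>152×7 DataFrame, 127 rows omitted</caption>
<thead>
<tr><th>Row</th><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th></tr>
<tr><th></th><th>String15</th><th>String15</th><th>Float64?</th><th>Float64?</th><th>Int64?</th><th>Int64?</th><th>String7</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181</td><td>3750</td><td>male</td></tr>
<tr><td>2</td><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186</td><td>3800</td><td>female</td></tr>
<tr><td>3</td><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>female</td></tr>
<tr><td>4</td><td>Adelie</td><td>Torgersen</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td></tr>
<tr><td>5</td><td>Adelie</td><td>Torgersen</td><td>36.7</td><td>19.3</td><td>193</td><td>3450</td><td>female</td></tr>
<tr><td>6</td><td>Adelie</td><td>Torgersen</td><td>39.3</td><td>20.6</td><td>190</td><td>3650</td><td>male</td></tr>
<tr><td>7</td><td>Adelie</td><td>Torgersen</td><td>38.9</td><td>17.8</td><td>181</td><td>3625</td><td>female</td></tr>
<tr><td>8</td><td>Adelie</td><td>Torgersen</td><td>39.2</td><td>19.6</td><td>195</td><td>4675</td><td>male</td></tr>
<tr><td>9</td><td>Adelie</td><td>Torgersen</td><td>34.1</td><td>18.1</td><td>193</td><td>3475</td><td>missing</td></tr>
<tr><td>10</td><td>Adelie</td><td>Torgersen</td><td>42.0</td><td>20.2</td><td>190</td><td>4250</td><td>missing</td></tr>
<tr><td>11</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.1</td><td>186</td><td>3300</td><td>missing</td></tr>
<tr><td>12</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.3</td><td>180</td><td>3700</td><td>missing</td></tr>
<tr><td>13</td><td>Adelie</td><td>Torgersen</td><td>41.1</td><td>17.6</td><td>182</td><td>3200</td><td>female</td></tr>
<tr><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td></tr>
<tr><td>141</td><td>Adelie</td><td>Dream</td><td>40.2</td><td>17.1</td><td>193</td><td>3400</td><td>female</td></tr>
<tr><td>142</td><td>Adelie</td><td>Dream</td><td>40.6</td><td>17.2</td><td>187</td><td>3475</td><td>male</td></tr>
<tr><td>143</td><td>Adelie</td><td>Dream</td><td>32.1</td><td>15.5</td><td>188</td><td>3050</td><td>female</td></tr>
<tr><td>144</td><td>Adelie</td><td>Dream</td><td>40.7</td><td>17.0</td><td>190</td><td>3725</td><td>male</td></tr>
<tr><td>145</td><td>Adelie</td><td>Dream</td><td>37.3</td><td>16.8</td><td>192</td><td>3000</td><td>female</td></tr>
<tr><td>146</td><td>Adelie</td><td>Dream</td><td>39.0</td><td>18.7</td><td>185</td><td>3650</td><td>male</td></tr>
<tr><td>147</td><td>Adelie</td><td>Dream</td><td>39.2</td><td>18.6</td><td>190</td><td>4250</td><td>male</td></tr>
<tr><td>148</td><td>Adelie</td><td>Dream</td><td>36.6</td><td>18.4</td><td>184</td><td>3475</td><td>female</td></tr>
<tr><td>149</td><td>Adelie</td><td>Dream</td><td>36.0</td><td>17.8</td><td>195</td><td>3450</td><td>female</td></tr>
<tr><td>150</td><td>Adelie</td><td>Dream</td><td>37.8</td><td>18.1</td><td>193</td><td>3750</td><td>male</td></tr>
<tr><td>151</td><td>Adelie</td><td>Dream</td><td>36.0</td><td>17.1</td><td>187</td><td>3700</td><td>female</td></tr>
<tr><td>152</td><td>Adelie</td><td>Dream</td><td>41.5</td><td>18.5</td><td>201</td><td>4000</td><td>male</td></tr>
</tbody>
</table>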
\n```\n:::\n:::\n\n\n\n\n\n\n\nIn both Tidier and DataFramesMeta, only the rows for which the criterion evaluates to `true` are returned. This means that rows where it evaluates to `false` or `missing` are dropped.\n\nIn DataFrames, we use the `subset` function, and the criterion is passed with the notation\n\n\n\n\n\n::: {#12 .cell execution_count=0}\n``` {.julia .cell-code}\nsubset(penguins, :column => boolean_function)\n\n```\n:::\n\n\n\n\n\n\n\nwhere `boolean_function` is a function of one variable (the column) that returns boolean values, possibly including `missing`. Add the keyword argument `skipmissing=true` to drop the rows where the condition evaluates to `missing`.\n\n### Filtering with one criterion\n\nFiltering all the rows with `species` = \"Adelie\".\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#14 .cell execution_count=1}\n``` {.julia .cell-code}\n@filter penguins species == \"Adelie\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#18 .cell execution_count=1}\n``` {.julia .cell-code}\nsubset(penguins, :species => x -> x .== \"Adelie\", skipmissing=true)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
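<table>
<caption>152×7 DataFrame, 127 rows omitted</caption>
<thead>
<tr><th>Row</th><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th></tr>
<tr><th></th><th>String15</th><th>String15</th><th>Float64?</th><th>Float64?</th><th>Int64?</th><th>Int64?</th><th>String7</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181</td><td>3750</td><td>male</td></tr>
<tr><td>2</td><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186</td><td>3800</td><td>female</td></tr>
<tr><td>3</td><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>female</td></tr>
<tr><td>4</td><td>Adelie</td><td>Torgersen</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td></tr>
<tr><td>5</td><td>Adelie</td><td>Torgersen</td><td>36.7</td><td>19.3</td><td>193</td><td>3450</td><td>female</td></tr>
<tr><td>6</td><td>Adelie</td><td>Torgersen</td><td>39.3</td><td>20.6</td><td>190</td><td>3650</td><td>male</td></tr>
<tr><td>7</td><td>Adelie</td><td>Torgersen</td><td>38.9</td><td>17.8</td><td>181</td><td>3625</td><td>female</td></tr>
<tr><td>8</td><td>Adelie</td><td>Torgersen</td><td>39.2</td><td>19.6</td><td>195</td><td>4675</td><td>male</td></tr>
<tr><td>9</td><td>Adelie</td><td>Torgersen</td><td>34.1</td><td>18.1</td><td>193</td><td>3475</td><td>missing</td></tr>
<tr><td>10</td><td>Adelie</td><td>Torgersen</td><td>42.0</td><td>20.2</td><td>190</td><td>4250</td><td>missing</td></tr>
<tr><td>11</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.1</td><td>186</td><td>3300</td><td>missing</td></tr>
<tr><td>12</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.3</td><td>180</td><td>3700</td><td>missing</td></tr>
<tr><td>13</td><td>Adelie</td><td>Torgersen</td><td>41.1</td><td>17.6</td><td>182</td><td>3200</td><td>female</td></tr>
<tr><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td></tr>
<tr><td>141</td><td>Adelie</td><td>Dream</td><td>40.2</td><td>17.1</td><td>193</td><td>3400</td><td>female</td></tr>
<tr><td>142</td><td>Adelie</td><td>Dream</td><td>40.6</td><td>17.2</td><td>187</td><td>3475</td><td>male</td></tr>
<tr><td>143</td><td>Adelie</td><td>Dream</td><td>32.1</td><td>15.5</td><td>188</td><td>3050</td><td>female</td></tr>
<tr><td>144</td><td>Adelie</td><td>Dream</td><td>40.7</td><td>17.0</td><td>190</td><td>3725</td><td>male</td></tr>
<tr><td>145</td><td>Adelie</td><td>Dream</td><td>37.3</td><td>16.8</td><td>192</td><td>3000</td><td>female</td></tr>
<tr><td>146</td><td>Adelie</td><td>Dream</td><td>39.0</td><td>18.7</td><td>185</td><td>3650</td><td>male</td></tr>
<tr><td>147</td><td>Adelie</td><td>Dream</td><td>39.2</td><td>18.6</td><td>190</td><td>4250</td><td>male</td></tr>
<tr><td>148</td><td>Adelie</td><td>Dream</td><td>36.6</td><td>18.4</td><td>184</td><td>3475</td><td>female</td></tr>
<tr><td>149</td><td>Adelie</td><td>Dream</td><td>36.0</td><td>17.8</td><td>195</td><td>3450</td><td>female</td></tr>
<tr><td>150</td><td>Adelie</td><td>Dream</td><td>37.8</td><td>18.1</td><td>193</td><td>3750</td><td>male</td></tr>
<tr><td>151</td><td>Adelie</td><td>Dream</td><td>36.0</td><td>17.1</td><td>187</td><td>3700</td><td>female</td></tr>
<tr><td>152</td><td>Adelie</td><td>Dream</td><td>41.5</td><td>18.5</td><td>201</td><td>4000</td><td>male</td></tr>
</tbody>
</table>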
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n### Filtering with several criteria\n\nFiltering all the rows with `species` = \"Adelie\", `sex` = \"male\" and `body_mass_g` > 4000.\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#20 .cell execution_count=1}\n``` {.julia .cell-code}\n@filter penguins species == \"Adelie\" sex == \"male\" body_mass_g > 4000\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n\nFiltering all the rows where the `flipper_length_mm` is greater than the mean.\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#32 .cell execution_count=1}\n``` {.julia .cell-code}\n@filter penguins flipper_length_mm > mean(skipmissing(flipper_length_mm))\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#36 .cell execution_count=1}\n``` {.julia .cell-code}\nsubset(penguins, :flipper_length_mm => x -> x .> mean(skipmissing(x)), skipmissing=true)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
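<table>
<caption>148×7 DataFrame, 123 rows omitted</caption>
<thead>
<tr><th>Row</th><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th></tr>
<tr><th></th><th>String15</th><th>String15</th><th>Float64?</th><th>Float64?</th><th>Int64?</th><th>Int64?</th><th>String7</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>Adelie</td><td>Dream</td><td>35.7</td><td>18.0</td><td>202</td><td>3550</td><td>female</td></tr>
<tr><td>2</td><td>Adelie</td><td>Dream</td><td>41.1</td><td>18.1</td><td>205</td><td>4300</td><td>male</td></tr>
<tr><td>3</td><td>Adelie</td><td>Dream</td><td>40.8</td><td>18.9</td><td>208</td><td>4300</td><td>male</td></tr>
<tr><td>4</td><td>Adelie</td><td>Biscoe</td><td>41.0</td><td>20.0</td><td>203</td><td>4725</td><td>male</td></tr>
<tr><td>5</td><td>Adelie</td><td>Torgersen</td><td>41.4</td><td>18.5</td><td>202</td><td>3875</td><td>male</td></tr>
<tr><td>6</td><td>Adelie</td><td>Torgersen</td><td>44.1</td><td>18.0</td><td>210</td><td>4000</td><td>male</td></tr>
<tr><td>7</td><td>Adelie</td><td>Dream</td><td>41.5</td><td>18.5</td><td>201</td><td>4000</td><td>male</td></tr>
<tr><td>8</td><td>Gentoo</td><td>Biscoe</td><td>46.1</td><td>13.2</td><td>211</td><td>4500</td><td>female</td></tr>
<tr><td>9</td><td>Gentoo</td><td>Biscoe</td><td>50.0</td><td>16.3</td><td>230</td><td>5700</td><td>male</td></tr>
<tr><td>10</td><td>Gentoo</td><td>Biscoe</td><td>48.7</td><td>14.1</td><td>210</td><td>4450</td><td>female</td></tr>
<tr><td>11</td><td>Gentoo</td><td>Biscoe</td><td>50.0</td><td>15.2</td><td>218</td><td>5700</td><td>male</td></tr>
<tr><td>12</td><td>Gentoo</td><td>Biscoe</td><td>47.6</td><td>14.5</td><td>215</td><td>5400</td><td>male</td></tr>
<tr><td>13</td><td>Gentoo</td><td>Biscoe</td><td>46.5</td><td>13.5</td><td>210</td><td>4550</td><td>female</td></tr>
<tr><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td></tr>
<tr><td>137</td><td>Chinstrap</td><td>Dream</td><td>53.5</td><td>19.9</td><td>205</td><td>4500</td><td>male</td></tr>
<tr><td>138</td><td>Chinstrap</td><td>Dream</td><td>49.0</td><td>19.5</td><td>210</td><td>3950</td><td>male</td></tr>
<tr><td>139</td><td>Chinstrap</td><td>Dream</td><td>50.8</td><td>18.5</td><td>201</td><td>4450</td><td>male</td></tr>
<tr><td>140</td><td>Chinstrap</td><td>Dream</td><td>49.0</td><td>19.6</td><td>212</td><td>4300</td><td>male</td></tr>
<tr><td>141</td><td>Chinstrap</td><td>Dream</td><td>51.4</td><td>19.0</td><td>201</td><td>3950</td><td>male</td></tr>
<tr><td>142</td><td>Chinstrap</td><td>Dream</td><td>50.7</td><td>19.7</td><td>203</td><td>4050</td><td>male</td></tr>
<tr><td>143</td><td>Chinstrap</td><td>Dream</td><td>49.3</td><td>19.9</td><td>203</td><td>4050</td><td>male</td></tr>
<tr><td>144</td><td>Chinstrap</td><td>Dream</td><td>50.2</td><td>18.8</td><td>202</td><td>3800</td><td>male</td></tr>
<tr><td>145</td><td>Chinstrap</td><td>Dream</td><td>51.9</td><td>19.5</td><td>206</td><td>3950</td><td>male</td></tr>
<tr><td>146</td><td>Chinstrap</td><td>Dream</td><td>55.8</td><td>19.8</td><td>207</td><td>4000</td><td>male</td></tr>
<tr><td>147</td><td>Chinstrap</td><td>Dream</td><td>43.5</td><td>18.1</td><td>202</td><td>3400</td><td>female</td></tr>
<tr><td>148</td><td>Chinstrap</td><td>Dream</td><td>50.8</td><td>19.0</td><td>210</td><td>4100</td><td>male</td></tr>
</tbody>
</table>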
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n### Filtering with a variable column name\n\nSuppose the name of the column you want to filter on is stored in a variable, say\n\n\n\n\n\n::: {#38 .cell execution_count=1}\n``` {.julia .cell-code}\nmy_column = :species\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```\n:species\n```\n:::\n:::\n\n\n\n\n\n\n\n::: {.panel-tabset}\n\n## Tidier\n\n\n\n\n\n::: {#40 .cell execution_count=1}\n``` {.julia .cell-code}\n# how to do it??\n# @filter(penguins, !!(my_column) .== \"Adelie\")\n```\n:::\n\n\n\n\n\n\n\n## DataFramesMeta\n\n\n\n\n\n::: {#42 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@rsubset penguins $my_column == \"Adelie\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
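<table>
<caption>152×7 DataFrame, 127 rows omitted</caption>
<thead>
<tr><th>Row</th><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th></tr>
<tr><th></th><th>String15</th><th>String15</th><th>Float64?</th><th>Float64?</th><th>Int64?</th><th>Int64?</th><th>String7</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181</td><td>3750</td><td>male</td></tr>
<tr><td>2</td><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186</td><td>3800</td><td>female</td></tr>
<tr><td>3</td><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>female</td></tr>
<tr><td>4</td><td>Adelie</td><td>Torgersen</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td></tr>
<tr><td>5</td><td>Adelie</td><td>Torgersen</td><td>36.7</td><td>19.3</td><td>193</td><td>3450</td><td>female</td></tr>
<tr><td>6</td><td>Adelie</td><td>Torgersen</td><td>39.3</td><td>20.6</td><td>190</td><td>3650</td><td>male</td></tr>
<tr><td>7</td><td>Adelie</td><td>Torgersen</td><td>38.9</td><td>17.8</td><td>181</td><td>3625</td><td>female</td></tr>
<tr><td>8</td><td>Adelie</td><td>Torgersen</td><td>39.2</td><td>19.6</td><td>195</td><td>4675</td><td>male</td></tr>
<tr><td>9</td><td>Adelie</td><td>Torgersen</td><td>34.1</td><td>18.1</td><td>193</td><td>3475</td><td>missing</td></tr>
<tr><td>10</td><td>Adelie</td><td>Torgersen</td><td>42.0</td><td>20.2</td><td>190</td><td>4250</td><td>missing</td></tr>
<tr><td>11</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.1</td><td>186</td><td>3300</td><td>missing</td></tr>
<tr><td>12</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.3</td><td>180</td><td>3700</td><td>missing</td></tr>
<tr><td>13</td><td>Adelie</td><td>Torgersen</td><td>41.1</td><td>17.6</td><td>182</td><td>3200</td><td>female</td></tr>
<tr><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td></tr>
<tr><td>141</td><td>Adelie</td><td>Dream</td><td>40.2</td><td>17.1</td><td>193</td><td>3400</td><td>female</td></tr>
<tr><td>142</td><td>Adelie</td><td>Dream</td><td>40.6</td><td>17.2</td><td>187</td><td>3475</td><td>male</td></tr>
<tr><td>143</td><td>Adelie</td><td>Dream</td><td>32.1</td><td>15.5</td><td>188</td><td>3050</td><td>female</td></tr>
<tr><td>144</td><td>Adelie</td><td>Dream</td><td>40.7</td><td>17.0</td><td>190</td><td>3725</td><td>male</td></tr>
<tr><td>145</td><td>Adelie</td><td>Dream</td><td>37.3</td><td>16.8</td><td>192</td><td>3000</td><td>female</td></tr>
<tr><td>146</td><td>Adelie</td><td>Dream</td><td>39.0</td><td>18.7</td><td>185</td><td>3650</td><td>male</td></tr>
<tr><td>147</td><td>Adelie</td><td>Dream</td><td>39.2</td><td>18.6</td><td>190</td><td>4250</td><td>male</td></tr>
<tr><td>148</td><td>Adelie</td><td>Dream</td><td>36.6</td><td>18.4</td><td>184</td><td>3475</td><td>female</td></tr>
<tr><td>149</td><td>Adelie</td><td>Dream</td><td>36.0</td><td>17.8</td><td>195</td><td>3450</td><td>female</td></tr>
<tr><td>150</td><td>Adelie</td><td>Dream</td><td>37.8</td><td>18.1</td><td>193</td><td>3750</td><td>male</td></tr>
<tr><td>151</td><td>Adelie</td><td>Dream</td><td>36.0</td><td>17.1</td><td>187</td><td>3700</td><td>female</td></tr>
<tr><td>152</td><td>Adelie</td><td>Dream</td><td>41.5</td><td>18.5</td><td>201</td><td>4000</td><td>male</td></tr>
</tbody>
</table>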
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#44 .cell execution_count=1}\n``` {.julia .cell-code}\nsubset(penguins, my_column => x -> x .== \"Adelie\")\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
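<table>
<caption>152×7 DataFrame, 127 rows omitted</caption>
<thead>
<tr><th>Row</th><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th></tr>
<tr><th></th><th>String15</th><th>String15</th><th>Float64?</th><th>Float64?</th><th>Int64?</th><th>Int64?</th><th>String7</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181</td><td>3750</td><td>male</td></tr>
<tr><td>2</td><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186</td><td>3800</td><td>female</td></tr>
<tr><td>3</td><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>female</td></tr>
<tr><td>4</td><td>Adelie</td><td>Torgersen</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td></tr>
<tr><td>5</td><td>Adelie</td><td>Torgersen</td><td>36.7</td><td>19.3</td><td>193</td><td>3450</td><td>female</td></tr>
<tr><td>6</td><td>Adelie</td><td>Torgersen</td><td>39.3</td><td>20.6</td><td>190</td><td>3650</td><td>male</td></tr>
<tr><td>7</td><td>Adelie</td><td>Torgersen</td><td>38.9</td><td>17.8</td><td>181</td><td>3625</td><td>female</td></tr>
<tr><td>8</td><td>Adelie</td><td>Torgersen</td><td>39.2</td><td>19.6</td><td>195</td><td>4675</td><td>male</td></tr>
<tr><td>9</td><td>Adelie</td><td>Torgersen</td><td>34.1</td><td>18.1</td><td>193</td><td>3475</td><td>missing</td></tr>
<tr><td>10</td><td>Adelie</td><td>Torgersen</td><td>42.0</td><td>20.2</td><td>190</td><td>4250</td><td>missing</td></tr>
<tr><td>11</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.1</td><td>186</td><td>3300</td><td>missing</td></tr>
<tr><td>12</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.3</td><td>180</td><td>3700</td><td>missing</td></tr>
<tr><td>13</td><td>Adelie</td><td>Torgersen</td><td>41.1</td><td>17.6</td><td>182</td><td>3200</td><td>female</td></tr>
<tr><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td></tr>
<tr><td>141</td><td>Adelie</td><td>Dream</td><td>40.2</td><td>17.1</td><td>193</td><td>3400</td><td>female</td></tr>
<tr><td>142</td><td>Adelie</td><td>Dream</td><td>40.6</td><td>17.2</td><td>187</td><td>3475</td><td>male</td></tr>
<tr><td>143</td><td>Adelie</td><td>Dream</td><td>32.1</td><td>15.5</td><td>188</td><td>3050</td><td>female</td></tr>
<tr><td>144</td><td>Adelie</td><td>Dream</td><td>40.7</td><td>17.0</td><td>190</td><td>3725</td><td>male</td></tr>
<tr><td>145</td><td>Adelie</td><td>Dream</td><td>37.3</td><td>16.8</td><td>192</td><td>3000</td><td>female</td></tr>
<tr><td>146</td><td>Adelie</td><td>Dream</td><td>39.0</td><td>18.7</td><td>185</td><td>3650</td><td>male</td></tr>
<tr><td>147</td><td>Adelie</td><td>Dream</td><td>39.2</td><td>18.6</td><td>190</td><td>4250</td><td>male</td></tr>
<tr><td>148</td><td>Adelie</td><td>Dream</td><td>36.6</td><td>18.4</td><td>184</td><td>3475</td><td>female</td></tr>
<tr><td>149</td><td>Adelie</td><td>Dream</td><td>36.0</td><td>17.8</td><td>195</td><td>3450</td><td>female</td></tr>
<tr><td>150</td><td>Adelie</td><td>Dream</td><td>37.8</td><td>18.1</td><td>193</td><td>3750</td><td>male</td></tr>
<tr><td>151</td><td>Adelie</td><td>Dream</td><td>36.0</td><td>17.1</td><td>187</td><td>3700</td><td>female</td></tr>
<tr><td>152</td><td>Adelie</td><td>Dream</td><td>41.5</td><td>18.5</td><td>201</td><td>4000</td><td>male</td></tr>
</tbody>
</table>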
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\nIn case the column is a string\n\n\n\n\n\n::: {#46 .cell execution_count=1}\n``` {.julia .cell-code}\nmy_column2 = \"species\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```\n\"species\"\n```\n:::\n:::\n\n\n\n\n\n\n\ninstead of a symbol, we can write\n\n::: {.panel-tabset}\n\n## DataFramesMeta\n\n\n\n\n\n::: {#48 .cell execution_count=1}\n``` {.julia .cell-code}\nDFM.@rsubset penguins $(Symbol(my_column2)) == \"Adelie\"\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
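<table>
<caption>152×7 DataFrame, 127 rows omitted</caption>
<thead>
<tr><th>Row</th><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th></tr>
<tr><th></th><th>String15</th><th>String15</th><th>Float64?</th><th>Float64?</th><th>Int64?</th><th>Int64?</th><th>String7</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181</td><td>3750</td><td>male</td></tr>
<tr><td>2</td><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186</td><td>3800</td><td>female</td></tr>
<tr><td>3</td><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>female</td></tr>
<tr><td>4</td><td>Adelie</td><td>Torgersen</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td></tr>
<tr><td>5</td><td>Adelie</td><td>Torgersen</td><td>36.7</td><td>19.3</td><td>193</td><td>3450</td><td>female</td></tr>
<tr><td>6</td><td>Adelie</td><td>Torgersen</td><td>39.3</td><td>20.6</td><td>190</td><td>3650</td><td>male</td></tr>
<tr><td>7</td><td>Adelie</td><td>Torgersen</td><td>38.9</td><td>17.8</td><td>181</td><td>3625</td><td>female</td></tr>
<tr><td>8</td><td>Adelie</td><td>Torgersen</td><td>39.2</td><td>19.6</td><td>195</td><td>4675</td><td>male</td></tr>
<tr><td>9</td><td>Adelie</td><td>Torgersen</td><td>34.1</td><td>18.1</td><td>193</td><td>3475</td><td>missing</td></tr>
<tr><td>10</td><td>Adelie</td><td>Torgersen</td><td>42.0</td><td>20.2</td><td>190</td><td>4250</td><td>missing</td></tr>
<tr><td>11</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.1</td><td>186</td><td>3300</td><td>missing</td></tr>
<tr><td>12</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.3</td><td>180</td><td>3700</td><td>missing</td></tr>
<tr><td>13</td><td>Adelie</td><td>Torgersen</td><td>41.1</td><td>17.6</td><td>182</td><td>3200</td><td>female</td></tr>
<tr><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td></tr>
<tr><td>141</td><td>Adelie</td><td>Dream</td><td>40.2</td><td>17.1</td><td>193</td><td>3400</td><td>female</td></tr>
<tr><td>142</td><td>Adelie</td><td>Dream</td><td>40.6</td><td>17.2</td><td>187</td><td>3475</td><td>male</td></tr>
<tr><td>143</td><td>Adelie</td><td>Dream</td><td>32.1</td><td>15.5</td><td>188</td><td>3050</td><td>female</td></tr>
<tr><td>144</td><td>Adelie</td><td>Dream</td><td>40.7</td><td>17.0</td><td>190</td><td>3725</td><td>male</td></tr>
<tr><td>145</td><td>Adelie</td><td>Dream</td><td>37.3</td><td>16.8</td><td>192</td><td>3000</td><td>female</td></tr>
<tr><td>146</td><td>Adelie</td><td>Dream</td><td>39.0</td><td>18.7</td><td>185</td><td>3650</td><td>male</td></tr>
<tr><td>147</td><td>Adelie</td><td>Dream</td><td>39.2</td><td>18.6</td><td>190</td><td>4250</td><td>male</td></tr>
<tr><td>148</td><td>Adelie</td><td>Dream</td><td>36.6</td><td>18.4</td><td>184</td><td>3475</td><td>female</td></tr>
<tr><td>149</td><td>Adelie</td><td>Dream</td><td>36.0</td><td>17.8</td><td>195</td><td>3450</td><td>female</td></tr>
<tr><td>150</td><td>Adelie</td><td>Dream</td><td>37.8</td><td>18.1</td><td>193</td><td>3750</td><td>male</td></tr>
<tr><td>151</td><td>Adelie</td><td>Dream</td><td>36.0</td><td>17.1</td><td>187</td><td>3700</td><td>female</td></tr>
<tr><td>152</td><td>Adelie</td><td>Dream</td><td>41.5</td><td>18.5</td><td>201</td><td>4000</td><td>male</td></tr>
</tbody>
</table>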
\n```\n:::\n:::\n\n\n\n\n\n\n\n## DataFrames\n\n\n\n\n\n::: {#50 .cell execution_count=1}\n``` {.julia .cell-code}\nsubset(penguins, my_column2 => x -> x .== \"Adelie\")\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
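<table>
<caption>152×7 DataFrame, 127 rows omitted</caption>
<thead>
<tr><th>Row</th><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th></tr>
<tr><th></th><th>String15</th><th>String15</th><th>Float64?</th><th>Float64?</th><th>Int64?</th><th>Int64?</th><th>String7</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181</td><td>3750</td><td>male</td></tr>
<tr><td>2</td><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186</td><td>3800</td><td>female</td></tr>
<tr><td>3</td><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>female</td></tr>
<tr><td>4</td><td>Adelie</td><td>Torgersen</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td></tr>
<tr><td>5</td><td>Adelie</td><td>Torgersen</td><td>36.7</td><td>19.3</td><td>193</td><td>3450</td><td>female</td></tr>
<tr><td>6</td><td>Adelie</td><td>Torgersen</td><td>39.3</td><td>20.6</td><td>190</td><td>3650</td><td>male</td></tr>
<tr><td>7</td><td>Adelie</td><td>Torgersen</td><td>38.9</td><td>17.8</td><td>181</td><td>3625</td><td>female</td></tr>
<tr><td>8</td><td>Adelie</td><td>Torgersen</td><td>39.2</td><td>19.6</td><td>195</td><td>4675</td><td>male</td></tr>
<tr><td>9</td><td>Adelie</td><td>Torgersen</td><td>34.1</td><td>18.1</td><td>193</td><td>3475</td><td>missing</td></tr>
<tr><td>10</td><td>Adelie</td><td>Torgersen</td><td>42.0</td><td>20.2</td><td>190</td><td>4250</td><td>missing</td></tr>
<tr><td>11</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.1</td><td>186</td><td>3300</td><td>missing</td></tr>
<tr><td>12</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.3</td><td>180</td><td>3700</td><td>missing</td></tr>
<tr><td>13</td><td>Adelie</td><td>Torgersen</td><td>41.1</td><td>17.6</td><td>182</td><td>3200</td><td>female</td></tr>
<tr><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td></tr>
<tr><td>141</td><td>Adelie</td><td>Dream</td><td>40.2</td><td>17.1</td><td>193</td><td>3400</td><td>female</td></tr>
<tr><td>142</td><td>Adelie</td><td>Dream</td><td>40.6</td><td>17.2</td><td>187</td><td>3475</td><td>male</td></tr>
<tr><td>143</td><td>Adelie</td><td>Dream</td><td>32.1</td><td>15.5</td><td>188</td><td>3050</td><td>female</td></tr>
<tr><td>144</td><td>Adelie</td><td>Dream</td><td>40.7</td><td>17.0</td><td>190</td><td>3725</td><td>male</td></tr>
<tr><td>145</td><td>Adelie</td><td>Dream</td><td>37.3</td><td>16.8</td><td>192</td><td>3000</td><td>female</td></tr>
<tr><td>146</td><td>Adelie</td><td>Dream</td><td>39.0</td><td>18.7</td><td>185</td><td>3650</td><td>male</td></tr>
<tr><td>147</td><td>Adelie</td><td>Dream</td><td>39.2</td><td>18.6</td><td>190</td><td>4250</td><td>male</td></tr>
<tr><td>148</td><td>Adelie</td><td>Dream</td><td>36.6</td><td>18.4</td><td>184</td><td>3475</td><td>female</td></tr>
<tr><td>149</td><td>Adelie</td><td>Dream</td><td>36.0</td><td>17.8</td><td>195</td><td>3450</td><td>female</td></tr>
<tr><td>150</td><td>Adelie</td><td>Dream</td><td>37.8</td><td>18.1</td><td>193</td><td>3750</td><td>male</td></tr>
<tr><td>151</td><td>Adelie</td><td>Dream</td><td>36.0</td><td>17.1</td><td>187</td><td>3700</td><td>female</td></tr>
<tr><td>152</td><td>Adelie</td><td>Dream</td><td>41.5</td><td>18.5</td><td>201</td><td>4000</td><td>male</td></tr>
</tbody>
</table>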
\n```\n:::\n:::\n\n\n\n\n\n\n\n:::\n\n## Arranging\n\nArranging is when we reorder the rows of a dataframe according to some criteria.\n\n\n\n\n\n::: {#52 .cell execution_count=1}\n``` {.julia .cell-code}\n@arrange penguins body_mass_g\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
\n```\n:::\n:::\n\n\n",
+ "supporting": [
+ "dataframes-rows_files"
+ ],
+ "filters": [],
+ "includes": {
+ "include-in-header": [
+ "\n\n\n"
+ ]
+ }
+ }
+}
\ No newline at end of file
diff --git a/_freeze/dataframes/execute-results/html.json b/_freeze/dataframes/execute-results/html.json
index ea227e6..be0a4ce 100644
--- a/_freeze/dataframes/execute-results/html.json
+++ b/_freeze/dataframes/execute-results/html.json
@@ -1,8 +1,8 @@
{
- "hash": "103b5252701e836620eb447a28e1e311",
+ "hash": "455f47e9eeff41c1f1437673418aca7b",
"result": {
"engine": "julia",
- "markdown": "---\n# jupyter: julia-1.10\nengine: julia\n---\n\n\n\n\n\n# Part 2: Dataframes\n\nDataframes are one of the most important objects in data science. A dataframe is a table where each row is an observation and each column is a variable.\n\nWe will use the Palmer Penguin dataset as a toy example for the remaining of the chapter.\n\n\n\n\n\n::: {#2 .cell execution_count=1}\n``` {.julia .cell-code}\nusing DataFrames, PalmerPenguins\nusing Tidier\nimport DataFramesMeta as DFM\n\npenguins = PalmerPenguins.load() |> DataFrame\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
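<table>
<caption>344×7 DataFrame, 319 rows omitted</caption>
<thead>
<tr><th>Row</th><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th></tr>
<tr><th></th><th>String15</th><th>String15</th><th>Float64?</th><th>Float64?</th><th>Int64?</th><th>Int64?</th><th>String7</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181</td><td>3750</td><td>male</td></tr>
<tr><td>2</td><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186</td><td>3800</td><td>female</td></tr>
<tr><td>3</td><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>female</td></tr>
<tr><td>4</td><td>Adelie</td><td>Torgersen</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td></tr>
<tr><td>5</td><td>Adelie</td><td>Torgersen</td><td>36.7</td><td>19.3</td><td>193</td><td>3450</td><td>female</td></tr>
<tr><td>6</td><td>Adelie</td><td>Torgersen</td><td>39.3</td><td>20.6</td><td>190</td><td>3650</td><td>male</td></tr>
<tr><td>7</td><td>Adelie</td><td>Torgersen</td><td>38.9</td><td>17.8</td><td>181</td><td>3625</td><td>female</td></tr>
<tr><td>8</td><td>Adelie</td><td>Torgersen</td><td>39.2</td><td>19.6</td><td>195</td><td>4675</td><td>male</td></tr>
<tr><td>9</td><td>Adelie</td><td>Torgersen</td><td>34.1</td><td>18.1</td><td>193</td><td>3475</td><td>missing</td></tr>
<tr><td>10</td><td>Adelie</td><td>Torgersen</td><td>42.0</td><td>20.2</td><td>190</td><td>4250</td><td>missing</td></tr>
<tr><td>11</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.1</td><td>186</td><td>3300</td><td>missing</td></tr>
<tr><td>12</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.3</td><td>180</td><td>3700</td><td>missing</td></tr>
<tr><td>13</td><td>Adelie</td><td>Torgersen</td><td>41.1</td><td>17.6</td><td>182</td><td>3200</td><td>female</td></tr>
<tr><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td></tr>
<tr><td>333</td><td>Chinstrap</td><td>Dream</td><td>45.2</td><td>16.6</td><td>191</td><td>3250</td><td>female</td></tr>
<tr><td>334</td><td>Chinstrap</td><td>Dream</td><td>49.3</td><td>19.9</td><td>203</td><td>4050</td><td>male</td></tr>
<tr><td>335</td><td>Chinstrap</td><td>Dream</td><td>50.2</td><td>18.8</td><td>202</td><td>3800</td><td>male</td></tr>
<tr><td>336</td><td>Chinstrap</td><td>Dream</td><td>45.6</td><td>19.4</td><td>194</td><td>3525</td><td>female</td></tr>
<tr><td>337</td><td>Chinstrap</td><td>Dream</td><td>51.9</td><td>19.5</td><td>206</td><td>3950</td><td>male</td></tr>
<tr><td>338</td><td>Chinstrap</td><td>Dream</td><td>46.8</td><td>16.5</td><td>189</td><td>3650</td><td>female</td></tr>
<tr><td>339</td><td>Chinstrap</td><td>Dream</td><td>45.7</td><td>17.0</td><td>195</td><td>3650</td><td>female</td></tr>
<tr><td>340</td><td>Chinstrap</td><td>Dream</td><td>55.8</td><td>19.8</td><td>207</td><td>4000</td><td>male</td></tr>
<tr><td>341</td><td>Chinstrap</td><td>Dream</td><td>43.5</td><td>18.1</td><td>202</td><td>3400</td><td>female</td></tr>
<tr><td>342</td><td>Chinstrap</td><td>Dream</td><td>49.6</td><td>18.2</td><td>193</td><td>3775</td><td>male</td></tr>
<tr><td>343</td><td>Chinstrap</td><td>Dream</td><td>50.8</td><td>19.0</td><td>210</td><td>4100</td><td>male</td></tr>
<tr><td>344</td><td>Chinstrap</td><td>Dream</td><td>50.2</td><td>18.7</td><td>198</td><td>3775</td><td>female</td></tr>
</tbody>
</table>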
\n```\n:::\n:::\n\n\n\n\n\n\n\n::: {.callout-note}\n\n`Dataframes.jl` is the main package for dealing with dataframes in Julia. You can use it directly to manipulate tables, but we also have 2 alternatives: DataFramesMeta and Tidier. \n\nDataFramesMeta is a collection of macros based on DataFrames.\n\nTidier is inspired by the `tidyverse` ecosystem in R. Tidier use macros to rewrite your code into DataFrames.jl code. Because of this \"tidy\" heritance, we will often talk about the R packages that inspired the Julia ones (like `dplyr`, `tidyr` and many others).\n\nIn this book, whenever possible, we will show the different approaches in a tabset so you can compare them.\n:::\n\n## Operations\n\nLet's start with some operations that take only one dataframe as input.^[Join operations will be dealt later.]. Here is the basic terminology:\n\n- *Selecting* is when we select some columns of a dataframe, while keeping all the rows. Example: select the `species` and `sex` columns.\n\n- *Filtering* or *subsetting* is when we select a subset of rows based on some criteria. Example: all male penguins of species Adelie. The output is a dataframe with the exact same columns, but possibly fewer rows.\n\n- *Mutating* or *transforming* is when we create new columns. Example: a new column `body_mass_kg` can be obtained dividing the column `body_mass_g` by 1000.\n\n- *Grouping* is when we split the dataframe into a collection (array) of dataframes using some criteria. Example: grouping by `species` gives us 3 dataframes, each with only one species.\n\n- *Summarising* or *combining* is when we apply some function to some columns in order to reduce the amount of rows with some kind of summary (like a mean, median, max, and so on). Example: for each `species`, apply the `mean` function to the columns `body_mass_g`. This will yield a dataframe with 3 rows, one for each species. 
Summarising is usually done after a grouping, so the summary is calculated with relation to each of the groups.\n\n- *Arranging* or *ordering* is when we reorder the rows of a dataframe using some criteria.\n\nSince all these functions return a dataframe (or an array of dataframes, in the case of grouping), we can chain these operations together, with the convention that on grouped dataframes we apply the function in each one of the groups.\n\nLet's see each operation with more details.\n\n## Comparing Tidier with DataFramesMeta\n\nThe following table list the operations on each package:\n\n| dplyr | Tidier | DataFramesMeta | DataFrames |\n|-------------|--------------|------------------------------|--------------|\n| `select` | `@select` | `@select` | array sintax |\n| `filter` | `@filter` | `@subset` / `@rsubset` | `filter` |\n| `mutate` | `@mutate` | `@transform` / `@rtransform` | array sintax |\n| `group_by` | `@group_by` | `@groupby` | `groupby` |\n| `summarise` | `@summarise` | `@combine` | `combine` |\n| `arrange` | `@arrange` | `@orderby` / `@rorderby` | `sort!` |\n\n\nNotice that we have a name clash with `@select`: that is why we `import DataFramesMeta as DFM` at the beginning.\n\n",
+ "markdown": "---\n# jupyter: julia-1.10\nengine: julia\n---\n\n\n\n\n\n# Part 2: Dataframes\n\nDataframes are one of the most important objects in data science. A dataframe is a table where each row is an observation and each column is a variable.\n\nWe will use the Palmer Penguins dataset as a toy example for the remainder of the chapter.\n\n\n\n\n\n::: {#2 .cell execution_count=1}\n``` {.julia .cell-code}\nusing DataFrames, PalmerPenguins\nusing Tidier, Chain\nimport DataFramesMeta as DFM\n\npenguins = PalmerPenguins.load() |> DataFrame\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
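<table>
<caption>344×7 DataFrame, 319 rows omitted</caption>
<thead>
<tr><th>Row</th><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th></tr>
<tr><th></th><th>String15</th><th>String15</th><th>Float64?</th><th>Float64?</th><th>Int64?</th><th>Int64?</th><th>String7</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181</td><td>3750</td><td>male</td></tr>
<tr><td>2</td><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186</td><td>3800</td><td>female</td></tr>
<tr><td>3</td><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>female</td></tr>
<tr><td>4</td><td>Adelie</td><td>Torgersen</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td><td>missing</td></tr>
<tr><td>5</td><td>Adelie</td><td>Torgersen</td><td>36.7</td><td>19.3</td><td>193</td><td>3450</td><td>female</td></tr>
<tr><td>6</td><td>Adelie</td><td>Torgersen</td><td>39.3</td><td>20.6</td><td>190</td><td>3650</td><td>male</td></tr>
<tr><td>7</td><td>Adelie</td><td>Torgersen</td><td>38.9</td><td>17.8</td><td>181</td><td>3625</td><td>female</td></tr>
<tr><td>8</td><td>Adelie</td><td>Torgersen</td><td>39.2</td><td>19.6</td><td>195</td><td>4675</td><td>male</td></tr>
<tr><td>9</td><td>Adelie</td><td>Torgersen</td><td>34.1</td><td>18.1</td><td>193</td><td>3475</td><td>missing</td></tr>
<tr><td>10</td><td>Adelie</td><td>Torgersen</td><td>42.0</td><td>20.2</td><td>190</td><td>4250</td><td>missing</td></tr>
<tr><td>11</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.1</td><td>186</td><td>3300</td><td>missing</td></tr>
<tr><td>12</td><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.3</td><td>180</td><td>3700</td><td>missing</td></tr>
<tr><td>13</td><td>Adelie</td><td>Torgersen</td><td>41.1</td><td>17.6</td><td>182</td><td>3200</td><td>female</td></tr>
<tr><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td></tr>
<tr><td>333</td><td>Chinstrap</td><td>Dream</td><td>45.2</td><td>16.6</td><td>191</td><td>3250</td><td>female</td></tr>
<tr><td>334</td><td>Chinstrap</td><td>Dream</td><td>49.3</td><td>19.9</td><td>203</td><td>4050</td><td>male</td></tr>
<tr><td>335</td><td>Chinstrap</td><td>Dream</td><td>50.2</td><td>18.8</td><td>202</td><td>3800</td><td>male</td></tr>
<tr><td>336</td><td>Chinstrap</td><td>Dream</td><td>45.6</td><td>19.4</td><td>194</td><td>3525</td><td>female</td></tr>
<tr><td>337</td><td>Chinstrap</td><td>Dream</td><td>51.9</td><td>19.5</td><td>206</td><td>3950</td><td>male</td></tr>
<tr><td>338</td><td>Chinstrap</td><td>Dream</td><td>46.8</td><td>16.5</td><td>189</td><td>3650</td><td>female</td></tr>
<tr><td>339</td><td>Chinstrap</td><td>Dream</td><td>45.7</td><td>17.0</td><td>195</td><td>3650</td><td>female</td></tr>
<tr><td>340</td><td>Chinstrap</td><td>Dream</td><td>55.8</td><td>19.8</td><td>207</td><td>4000</td><td>male</td></tr>
<tr><td>341</td><td>Chinstrap</td><td>Dream</td><td>43.5</td><td>18.1</td><td>202</td><td>3400</td><td>female</td></tr>
<tr><td>342</td><td>Chinstrap</td><td>Dream</td><td>49.6</td><td>18.2</td><td>193</td><td>3775</td><td>male</td></tr>
<tr><td>343</td><td>Chinstrap</td><td>Dream</td><td>50.8</td><td>19.0</td><td>210</td><td>4100</td><td>male</td></tr>
<tr><td>344</td><td>Chinstrap</td><td>Dream</td><td>50.2</td><td>18.7</td><td>198</td><td>3775</td><td>female</td></tr>
</tbody>
</table>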
\n```\n:::\n:::\n\n\n\n\n\n\n\n::: {.callout-note}\n\n`DataFrames.jl` is the main package for dealing with dataframes in Julia. You can use it directly to manipulate tables, but we also have two alternatives: DataFramesMeta and Tidier.\n\nDataFramesMeta is a collection of macros based on DataFrames.\n\nTidier is inspired by the `tidyverse` ecosystem in R. Tidier uses macros to rewrite your code into DataFrames.jl code. Because of this \"tidy\" heritage, we will often talk about the R packages that inspired the Julia ones (like `dplyr`, `tidyr` and many others).\n\nIn this book, whenever possible, we will show the different approaches in a tabset so you can compare them, with more emphasis on Tidier.\n:::\n\n## Operations\n\nLet's start with some unary operations, i.e. operations that take only one dataframe as input and return one dataframe as output.^[Join operations will be dealt with later.] We can divide these operations into some categories:\n\n### Row operations\n\nThese are operations that only affect rows, leaving all columns untouched.\n\n- *Filtering* or *subsetting* is when we select a subset of rows based on some criteria. Example: all male penguins of species Adelie. The output is a dataframe with the exact same columns, but possibly fewer rows.\n\n- *Arranging* or *ordering* is when we reorder the rows of a dataframe using some criteria.\n\n### Column operations\n\nThese are operations that only affect columns, leaving all rows untouched.\n\n- *Selecting* is when we select some columns of a dataframe, while keeping all the rows. Example: select the `species` and `sex` columns.\n\n- *Mutating* or *transforming* is when we create new columns. 
Example: a new column `body_mass_kg` can be obtained by dividing the column `body_mass_g` by 1000.\n\n### Reshaping operations\n\nThese operations change the shape of a dataframe, making it wider or longer.\n\n- *Widening*\n\n- *Lengthening*?\n\n### Grouping operations\n\n- *Grouping* is when we split the dataframe into a collection (array) of dataframes using some criteria. Example: grouping by `species` gives us 3 dataframes, each with only one species.\n\n### Mixed operations\n\nThese operations can possibly change rows and columns at the same time.\n\n- Distinct;\n- Counting;\n- *Summarising* or *combining* is when we apply some function to some columns in order to reduce the number of rows with some kind of summary (like a mean, median, max, and so on). Example: for each `species`, apply the `mean` function to the column `body_mass_g`. This will yield a dataframe with 3 rows, one for each species. Summarising is usually done after a grouping, so the summary is calculated relative to each of the groups.\n\n??? keep grouping and summarising together?\n\nSince all these functions return a dataframe (or an array of dataframes, in the case of grouping), we can chain these operations together, with the convention that on grouped dataframes we apply the function in each one of the groups.\n\nNow for binary operations (i.e. 
operations that take two dataframes), we have all the joins:\n\n- Left join;\n- Right join;\n- Inner join;\n- Outer join;\n- Full join.\n\n## Comparing Tidier with DataFramesMeta\n\nThe following table lists the operations in each package:\n\n| dplyr | Tidier | DataFramesMeta | DataFrames |\n|-------------|--------------|------------------------------|--------------|\n| `filter` | `@filter` | `@subset` / `@rsubset` | `subset` |\n| `arrange` | `@arrange` | `@orderby` / `@rorderby` | `sort!` |\n| `select` | `@select` | `@select` | array syntax |\n| `mutate` | `@mutate` | `@transform` / `@rtransform` | array syntax |\n| `group_by` | `@group_by` | `@groupby` | `groupby` |\n| `summarise` | `@summarise` | `@combine` | `combine` |\n\nFor those coming from `R`, Tidier will look like the most natural approach.\n\nNotice that we have a name clash with `@select`: that is why we `import DataFramesMeta as DFM` at the beginning.\n\nWe will see each operation in more detail in the following chapters.\n\n## Chaining operations\n\nWe can chain (or pipe) dataframe operations as follows with the `@chain` macro:\n\n\n\n\n\n::: {#4 .cell execution_count=0}\n``` {.julia .cell-code}\n@chain penguins begin\n @filter !ismissing(sex)\n @group_by sex\n @summarise mean = mean(bill_length_mm)\n @arrange mean\nend\n```\n:::\n\n\n",
"supporting": [
"dataframes_files"
],
diff --git a/_quarto.yml b/_quarto.yml
index b14d4f8..dbeb4e9 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -34,7 +34,9 @@ book:
# - dataframes.qmd
- part: dataframes.qmd
chapters:
- - dataframes-filtering.qmd
+ - dataframes-rows.qmd
+ # - dataframes-columns.qmd
+ # - dataframes-groups.qmd
# - part: "Part 2: Dataframes"
- part: "Part 3: Reading data"
- part: "Part 4: Plotting data"
diff --git a/dataframes-columns.qmd b/dataframes-columns.qmd
new file mode 100644
index 0000000..1e67807
--- /dev/null
+++ b/dataframes-columns.qmd
@@ -0,0 +1,19 @@
+---
+# jupyter: julia-1.10
+engine: julia
+---
+
+## Operations on columns
+
+::: {.panel-tabset}
+
+## Tidier
+
+## DataFramesMeta
+
+## DataFrames
+
+:::
+
+## Conditionally mutating columns
+
diff --git a/dataframes-filtering.qmd b/dataframes-filtering.qmd
deleted file mode 100644
index a0c616f..0000000
--- a/dataframes-filtering.qmd
+++ /dev/null
@@ -1,159 +0,0 @@
----
-# jupyter: julia-1.10
-engine: julia
----
-
-# Filtering
-
-```{julia}
-using DataFrames, PalmerPenguins
-using Tidier
-import DataFramesMeta as DFM
-
-penguins = PalmerPenguins.load() |> DataFrame;
-@slice_head(penguins, n = 15)
-```
-
-To filter a dataframe in Tidier, we use the macro `@filter`. You can use it in the form
-
-```{julia}
-@filter(penguins, species == "Adelie")
-```
-
-or without parentesis as in
-
-```{julia}
-@filter penguins species == "Adelie"
-```
-
-Notice that the columns are typed as if they were variables on the Julia environment. This is inspired by the `tidyverse` behaviour of data-masking: inside a tidyverse verb, the columns are taken as "statistical variables" that exist inside the dataframe as columns.
-
-In DataFramesMeta, we have two macros for filtering: `@subset` and `@rsubset`. Use the first when you have some criteria that uses the whole dataframe, for example:
-
-```{julia}
-DFM.@subset penguins :body_mass_g .>= mean(skipmissing(:body_mass_g))
-```
-
-Notice the broadcast on >=. We need it because *each row is interpreted as an array*. Also, notice that we refer to columns as _symbols_ (i.e. we append `:` to it).
-
-In the above example, we needed the whole column `body_mass_g` to take the mean and then filter the rows based on that. If, however, your filtering criteria only uses information about each row, then `@rsubset` (row subset) is easier to use: it interprets each columns as a value (not an array), so no broadcasting is needed:
-
-```{julia}
-DFM.@rsubset penguins :species == "Adelie"
-```
-
-In both Tidier and DataFramesMeta, only the rows to which the criteria is `true` are returned. This means that you don't need to worry about `missing` values in cases where the criteria do not return `false` nor `true.
-
-## Filtering with one criteria
-
-Filtering all the rows with `species` = "Adelie".
-
-::: {.panel-tabset}
-
-## Tidier
-
-```{julia}
-@filter penguins species == "Adelie"
-```
-
-## DataFramesMeta
-
-```{julia}
-DFM.@rsubset penguins :species == "Adelie"
-```
-
-## DataFrames
-
-```{julia}
-filter(r -> r.species == "Adelie", penguins)
-```
-
-:::
-
-## Filtering with several criteria
-
-Filtering all the rows with `species` = "Adelie", `sex` = "male" and `body_mass_g` > 4000.
-
-::: {.panel-tabset}
-
-## Tidier
-
-```{julia}
-@filter penguins species == "Adelie" sex == "male" body_mass_g > 4000
-```
-
-## DataFramesMeta
-
-```{julia}
-DFM.@rsubset penguins :species == "Adelie" :sex == "male" :body_mass_g > 4000
-```
-
-## DataFrames
-
-```{julia}
-filter(r -> ((r.species == "Adelie") & (r.sex == "male") & (r.body_mass_g > 4000)) === true, penguins)
-```
-
-:::
-
-
-Filtering all the rows where the `flipper_length_mm` is greater than the mean.
-
-::: {.panel-tabset}
-
-## Tidier
-
-```{julia}
-@filter penguins flipper_length_mm > mean(skipmissing(flipper_length_mm))
-```
-
-## DataFramesMeta
-
-```{julia}
-DFM.@subset penguins :flipper_length_mm .>= mean(skipmissing(:flipper_length_mm))
-```
-
-## DataFrames
-
-```{julia}
-filter(r -> (r.flipper_length_mm > mean(skipmissing(penguins.flipper_length_mm))) === true, penguins)
-```
-
-:::
-
-## Filtering with a variable column name
-
-Suppose the column you want to filter is a variable, let's say
-
-```{julia}
-# filter_column = "species"
-column_symbol = :species
-```
-
-::: {.panel-tabset}
-
-## Tidier
-
-```{julia}
-# @chain penguins begin
-# @filter(!!filter_column == "Adelie")
-# # @select(!!filter_column)
-# end
-# @filter(penguins, !!filter_column == "Adelie")
-```
-
-## DataFramesMeta
-
-```{julia}
-DFM.@rsubset penguins $column_symbol == "Adelie"
-```
-
-:::
-
-In case the column is a string instead of a symbol, we can write
-
-```{julia}
-column_string = "species"
-
-DFM.@rsubset penguins $(Symbol(column_string)) == "Adelie"
-```
\ No newline at end of file
diff --git a/dataframes-mutating.qmd b/dataframes-groups.qmd
similarity index 100%
rename from dataframes-mutating.qmd
rename to dataframes-groups.qmd
diff --git a/dataframes-reshape.qmd b/dataframes-reshape.qmd
new file mode 100644
index 0000000..e58845d
--- /dev/null
+++ b/dataframes-reshape.qmd
@@ -0,0 +1,16 @@
+---
+# jupyter: julia-1.10
+engine: julia
+---
+
+## Creating columns
+
+::: {.panel-tabset}
+
+## Tidier
+
+## DataFramesMeta
+
+## DataFrames
+
+:::
\ No newline at end of file
diff --git a/dataframes-rows.qmd b/dataframes-rows.qmd
new file mode 100644
index 0000000..6971f15
--- /dev/null
+++ b/dataframes-rows.qmd
@@ -0,0 +1,233 @@
+---
+# jupyter: julia-1.10
+engine: julia
+---
+
+# Operations on rows
+
+```{julia}
+using DataFrames, PalmerPenguins
+using Tidier
+import DataFramesMeta as DFM
+
+penguins = PalmerPenguins.load() |> DataFrame;
+@slice_head(penguins, n = 15)
+```
+
+## Filtering
+
+To filter is to keep only the rows that satisfy a certain criterion (i.e. a boolean condition).
+
+To filter a dataframe in Tidier, we use the macro `@filter`. You can use it in the form
+
+```{julia}
+@filter(penguins, species == "Adelie")
+```
+
+or without parentheses, as in
+
+```{julia}
+@filter penguins species == "Adelie"
+```
+
+Notice that the columns are written as if they were variables in the Julia environment. This is inspired by the `tidyverse` behaviour of data-masking: inside a tidyverse verb, the columns are taken as "statistical variables" that exist inside the dataframe as columns.
+
+In DataFramesMeta, we have two macros for filtering: `@subset` and `@rsubset`. Use the first when your criterion needs a whole column, for example:
+
+```{julia}
+DFM.@subset penguins :body_mass_g .>= mean(skipmissing(:body_mass_g))
+```
+
+Notice the broadcast on `>=`. We need it because *each variable is interpreted as an array (the whole column)*. Also, notice that we refer to columns as _symbols_ (i.e. we prepend `:` to the column name).
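+
+To see why the dot is needed, here is a minimal sketch in plain Julia. The vector `body_mass` is a made-up stand-in for a column (an assumption for illustration, not the penguins data):
+
+```{julia}
+using Statistics
+
+# Hypothetical stand-in for a column with one missing entry
+body_mass = [3750, 3800, missing, 4675]
+
+# The mean needs the whole column at once (ignoring missing values)
+threshold = mean(skipmissing(body_mass))
+
+# The comparison is applied element by element, hence the dot
+mask = body_mass .>= threshold
+```
+
+Without the dot, `body_mass >= threshold` would try to compare a whole vector with a single number, which raises an error.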
+
+In the above example, we needed the whole column `body_mass_g` to take the mean and then filter the rows based on that. If, however, your filtering criterion only uses information about each row (without needing to see it in the context of the whole column), then `@rsubset` (row subset) is easier to use: it interprets each column as a single value (not an array), so no broadcasting is needed:
+
+```{julia}
+DFM.@rsubset penguins :species == "Adelie"
+```
+
+In both Tidier and DataFramesMeta, only the rows for which the criterion evaluates to `true` are returned. This means that rows where it evaluates to `false` or `missing` are dropped.
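+
+This behaviour mirrors Julia's three-valued logic, sketched below in plain Julia. The vector `sex` is a made-up example, not the penguins column:
+
+```{julia}
+# Hypothetical column with one missing entry
+sex = ["male", missing, "female"]
+
+# Comparing with missing gives missing, not false
+condition = sex .== "male"
+
+# Treating missing as false keeps only the rows that are certainly true
+keep = coalesce.(condition, false)
+sex[keep]
+```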
+
+In DataFrames, we use the `subset` function, and the criterion is passed with the notation
+
+```{julia}
+#| eval: false
+
+subset(penguins, :column => boolean_function)
+
+```
+
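+where `boolean_function` is a function of one variable (the whole column) that returns a vector of booleans, possibly with `missing` values. Add the keyword argument `skipmissing=true` to treat `missing` results as `false` and drop those rows; without it, `subset` throws an error when the result contains `missing`.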
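+
+A small sketch of this notation, using a made-up three-row dataframe `df` and a helper `is_male` (both are assumptions for illustration, not part of the penguins data):
+
+```{julia}
+using DataFrames
+
+# Hypothetical data with a missing value in the sex column
+df = DataFrame(species = ["Adelie", "Gentoo", "Adelie"],
+               sex = ["male", missing, "female"])
+
+# The function receives the whole column, so the comparison is broadcast
+is_male(col) = col .== "male"
+
+# missing results are treated as false and the row is dropped
+males = subset(df, :sex => is_male, skipmissing=true)
+
+# ByRow wraps a function of a single value, so no broadcasting is needed
+males_byrow = subset(df, :sex => ByRow(x -> x == "male"), skipmissing=true)
+```
+
+Calling `subset(df, :sex => is_male)` without `skipmissing=true` throws an error here, because the condition is `missing` for the second row.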
+
+### Filtering with one criterion
+
+Filtering all the rows with `species` = "Adelie".
+
+::: {.panel-tabset}
+
+## Tidier
+
+```{julia}
+@filter penguins species == "Adelie"
+```
+
+## DataFramesMeta
+
+```{julia}
+DFM.@rsubset penguins :species == "Adelie"
+```
+
+## DataFrames
+
+```{julia}
+subset(penguins, :species => x -> x .== "Adelie", skipmissing=true)
+```
+
+:::
+
+### Filtering with several criteria
+
+Filtering all the rows with `species` = "Adelie", `sex` = "male" and `body_mass_g` > 4000.
+
+::: {.panel-tabset}
+
+## Tidier
+
+```{julia}
+@filter penguins species == "Adelie" sex == "male" body_mass_g > 4000
+```
+
+## DataFramesMeta
+
+```{julia}
+DFM.@rsubset penguins :species == "Adelie" :sex == "male" :body_mass_g > 4000
+```
+
+## DataFrames
+
+```{julia}
+subset(penguins, [:species, :sex, :body_mass_g] => (x, y, z) -> (x .== "Adelie") .& (y .== "male") .& (z .> 4000), skipmissing=true)
+```
+
+:::
+
+
+Filtering all the rows with `species` = "Adelie" OR `sex` = "male".
+
+::: {.panel-tabset}
+
+## Tidier
+
+```{julia}
+@filter penguins (species == "Adelie") | (sex == "male")
+```
+
+## DataFramesMeta
+
+```{julia}
+DFM.@rsubset penguins (:species == "Adelie") | (:sex == "male")
+```
+
+## DataFrames
+
+```{julia}
+subset(penguins, [:species, :sex] => (x, y) -> (x .== "Adelie") .| (y .== "male"), skipmissing=true)
+```
+
+:::
+
+
+Filtering all the rows where the `flipper_length_mm` is greater than the mean.
+
+::: {.panel-tabset}
+
+## Tidier
+
+```{julia}
+@filter penguins flipper_length_mm > mean(skipmissing(flipper_length_mm))
+```
+
+## DataFramesMeta
+
+```{julia}
+DFM.@subset penguins :flipper_length_mm .> mean(skipmissing(:flipper_length_mm))
+```
+
+## DataFrames
+
+```{julia}
+subset(penguins, :flipper_length_mm => x -> x .> mean(skipmissing(x)), skipmissing=true)
+```
+
+:::
+
+### Filtering with a variable column name
+
+Suppose the name of the column you want to filter on is stored in a variable, say
+
+```{julia}
+my_column = :species
+```
+
+::: {.panel-tabset}
+
+## Tidier
+
+```{julia}
+# how to do it??
+# @filter(penguins, !!(my_column) .== "Adelie")
+```
+
+## DataFramesMeta
+
+```{julia}
+DFM.@rsubset penguins $my_column == "Adelie"
+```
+
+## DataFrames
+
+```{julia}
+subset(penguins, my_column => x -> x .== "Adelie")
+```
+
+:::
+
+In case the column name is a string
+
+```{julia}
+my_column2 = "species"
+```
+
+instead of a symbol, we can write
+
+::: {.panel-tabset}
+
+## DataFramesMeta
+
+```{julia}
+DFM.@rsubset penguins $(Symbol(my_column2)) == "Adelie"
+```
+
+## DataFrames
+
+```{julia}
+subset(penguins, my_column2 => x -> x .== "Adelie")
+```
+
+:::
+
+## Arranging
+
+Arranging is when we reorder the rows of a dataframe according to some criteria.
+
+```{julia}
+@arrange penguins body_mass_g
+```
+
+```{julia}
+@arrange penguins species body_mass_g
+```
+
+```{julia}
+@arrange penguins island desc(body_mass_g)
+```
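+
+For comparison, here is a sketch of the same kind of ordering with the plain DataFrames `sort` function, on a made-up dataframe `df` (an assumption for illustration):
+
+```{julia}
+using DataFrames
+
+# Hypothetical miniature of the penguins table
+df = DataFrame(island = ["Dream", "Biscoe", "Dream", "Biscoe"],
+               body_mass_g = [3400, 5000, 4250, 4300])
+
+# Ascending by one column
+by_mass = sort(df, :body_mass_g)
+
+# Ascending by island, then descending by body mass within each island
+by_island = sort(df, [:island, order(:body_mass_g, rev=true)])
+```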
\ No newline at end of file
diff --git a/dataframes.qmd b/dataframes.qmd
index 0ce7baa..6f89bf3 100644
--- a/dataframes.qmd
+++ b/dataframes.qmd
@@ -10,6 +10,7 @@ Dataframes are one of the most important objects in data science. A dataframe is
We will use the Palmer Penguin dataset as a toy example for the remainder of the chapter.
```{julia}
+#| eval: true
using DataFrames, PalmerPenguins
using Tidier, Chain
import DataFramesMeta as DFM
@@ -25,39 +26,73 @@ DataFramesMeta is a collection of macros based on DataFrames.
Tidier is inspired by the `tidyverse` ecosystem in R. Tidier uses macros to rewrite your code into DataFrames.jl code. Because of this "tidy" heritage, we will often talk about the R packages that inspired the Julia ones (like `dplyr`, `tidyr` and many others).
-In this book, whenever possible, we will show the different approaches in a tabset so you can compare them.
+In this book, whenever possible, we will show the different approaches in a tabset so you can compare them, with more emphasis on Tidier.
:::
## Operations
-Let's start with some operations that take only one dataframe as input.^[Join operations will be dealt later.]. Here is the basic terminology:
+Let's start with some unary operations, i.e. operations that take only one dataframe as input and return one dataframe as output.^[Join operations will be dealt with later.] We can divide these operations into a few categories:
-- *Selecting* is when we select some columns of a dataframe, while keeping all the rows. Example: select the `species` and `sex` columns.
+### Row operations
+
+These are operations that only affect rows, leaving all columns untouched.
- *Filtering* or *subsetting* is when we select a subset of rows based on some criteria. Example: all male penguins of species Adelie. The output is a dataframe with the exact same columns, but possibly fewer rows.
+- *Arranging* or *ordering* is when we reorder the rows of a dataframe using some criteria.
+
+### Column operations
+
+These are operations that only affect columns, leaving all rows untouched.
+
+- *Selecting* is when we select some columns of a dataframe, while keeping all the rows. Example: select the `species` and `sex` columns.
+
- *Mutating* or *transforming* is when we create new columns. Example: a new column `body_mass_kg` can be obtained dividing the column `body_mass_g` by 1000.
+### Reshaping operations
+
+These operations change the shape of a dataframe, making it wider or longer.
+
+- `Widening`
+
+- `Longering`?
+
+### Grouping operations
+
- *Grouping* is when we split the dataframe into a collection (array) of dataframes using some criteria. Example: grouping by `species` gives us 3 dataframes, each with only one species.
+### Mixed operations
+
+These operations can possibly change rows and columns at the same time.
+
+- Distinct;
+- Counting;
- *Summarising* or *combining* is when we apply some function to some columns in order to reduce the amount of rows with some kind of summary (like a mean, median, max, and so on). Example: for each `species`, apply the `mean` function to the columns `body_mass_g`. This will yield a dataframe with 3 rows, one for each species. Summarising is usually done after a grouping, so the summary is calculated with relation to each of the groups.
-- *Arranging* or *ordering* is when we reorder the rows of a dataframe using some criteria.
+??? keep grouping and summarising together?
Since all these functions return a dataframe (or an array of dataframes, in the case of grouping), we can chain these operations together, with the convention that on grouped dataframes we apply the function in each one of the groups.
+Now for binary operations (i.e. operations that take two dataframes), we have all the joins:
+
+- Left join;
+- Right join;
+- Inner join;
+- Outer join;
+- Full join.
+
## Comparing Tidier with DataFramesMeta
The following table lists the operations on each package:
| dplyr | Tidier | DataFramesMeta | DataFrames |
|-------------|--------------|------------------------------|--------------|
+| `filter` | `@filter` | `@subset` / `@rsubset` | `subset` |
+| `arrange` | `@arrange` | `@orderby` / `@rorderby` | `sort!` |
| `select` | `@select` | `@select` | array syntax |
-| `filter` | `@filter` | `@subset` / `@rsubset` | `filter` |
| `mutate` | `@mutate` | `@transform` / `@rtransform` | array syntax |
| `group_by` | `@group_by` | `@groupby` | `groupby` |
| `summarise` | `@summarise` | `@combine` | `combine` |
-| `arrange` | `@arrange` | `@orderby` / `@rorderby` | `sort!` |
It is clear that for those coming from `R`, Tidier will look like the most natural approach.
@@ -70,6 +105,7 @@ We will see each operation with more details in the following chapters.
We can chain (or pipe) dataframe operations as follows with the `@chain` macro:
```{julia}
+#| eval: false
@chain penguins begin
@filter !ismissing(sex)
@group_by sex
diff --git a/docs/dataframes-rows.html b/docs/dataframes-rows.html
new file mode 100644
index 0000000..b60d4d3
--- /dev/null
+++ b/docs/dataframes-rows.html
@@ -0,0 +1,7745 @@
+
+1 Operations on rows – Tidier Data Science with Julia
+
using DataFrames, PalmerPenguins
+using Tidier
+import DataFramesMeta as DFM
+
+penguins = PalmerPenguins.load() |> DataFrame;
+@slice_head(penguins, n = 15)
+
+
15×7 DataFrame
+
+| Row | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
+|-----|---------|--------|----------------|---------------|-------------------|-------------|-----|
+| | String15 | String15 | Float64? | Float64? | Int64? | Int64? | String7 |
+| 1 | Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male |
+| 2 | Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female |
+| 3 | Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | female |
+| 4 | Adelie | Torgersen | missing | missing | missing | missing | missing |
+| 5 | Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | female |
+| 6 | Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | male |
+| 7 | Adelie | Torgersen | 38.9 | 17.8 | 181 | 3625 | female |
+| 8 | Adelie | Torgersen | 39.2 | 19.6 | 195 | 4675 | male |
+| 9 | Adelie | Torgersen | 34.1 | 18.1 | 193 | 3475 | missing |
+| 10 | Adelie | Torgersen | 42.0 | 20.2 | 190 | 4250 | missing |
+| 11 | Adelie | Torgersen | 37.8 | 17.1 | 186 | 3300 | missing |
+| 12 | Adelie | Torgersen | 37.8 | 17.3 | 180 | 3700 | missing |
+| 13 | Adelie | Torgersen | 41.1 | 17.6 | 182 | 3200 | female |
+| 14 | Adelie | Torgersen | 38.6 | 21.2 | 191 | 3800 | male |
+| 15 | Adelie | Torgersen | 34.6 | 21.1 | 198 | 4400 | male |
+
+
+
+
+
+
+
+
1.1 Filtering
+
To filter is to keep only the rows that satisfy a certain criterion (i.e. a boolean condition).
+
To filter a dataframe in Tidier, we use the macro @filter. You can use it in the form
+
+
@filter(penguins, species == "Adelie")
+
+
152×7 DataFrame
127 rows omitted
+
+| Row | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
+|-----|---------|--------|----------------|---------------|-------------------|-------------|-----|
+| | String15 | String15 | Float64? | Float64? | Int64? | Int64? | String7 |
+| 1 | Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male |
+| 2 | Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female |
+| 3 | Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | female |
+| 4 | Adelie | Torgersen | missing | missing | missing | missing | missing |
+| 5 | Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | female |
+| 6 | Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | male |
+| 7 | Adelie | Torgersen | 38.9 | 17.8 | 181 | 3625 | female |
+| 8 | Adelie | Torgersen | 39.2 | 19.6 | 195 | 4675 | male |
+| 9 | Adelie | Torgersen | 34.1 | 18.1 | 193 | 3475 | missing |
+| 10 | Adelie | Torgersen | 42.0 | 20.2 | 190 | 4250 | missing |
+| 11 | Adelie | Torgersen | 37.8 | 17.1 | 186 | 3300 | missing |
+| 12 | Adelie | Torgersen | 37.8 | 17.3 | 180 | 3700 | missing |
+| 13 | Adelie | Torgersen | 41.1 | 17.6 | 182 | 3200 | female |
+| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
+| 141 | Adelie | Dream | 40.2 | 17.1 | 193 | 3400 | female |
+| 142 | Adelie | Dream | 40.6 | 17.2 | 187 | 3475 | male |
+| 143 | Adelie | Dream | 32.1 | 15.5 | 188 | 3050 | female |
+| 144 | Adelie | Dream | 40.7 | 17.0 | 190 | 3725 | male |
+| 145 | Adelie | Dream | 37.3 | 16.8 | 192 | 3000 | female |
+| 146 | Adelie | Dream | 39.0 | 18.7 | 185 | 3650 | male |
+| 147 | Adelie | Dream | 39.2 | 18.6 | 190 | 4250 | male |
+| 148 | Adelie | Dream | 36.6 | 18.4 | 184 | 3475 | female |
+| 149 | Adelie | Dream | 36.0 | 17.8 | 195 | 3450 | female |
+| 150 | Adelie | Dream | 37.8 | 18.1 | 193 | 3750 | male |
+| 151 | Adelie | Dream | 36.0 | 17.1 | 187 | 3700 | female |
+| 152 | Adelie | Dream | 41.5 | 18.5 | 201 | 4000 | male |
+
+
+
+
+
+
+
or without parentheses, as in
+
+
@filter penguins species == "Adelie"
+
+
152×7 DataFrame
127 rows omitted
+
+| Row | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
+|-----|---------|--------|----------------|---------------|-------------------|-------------|-----|
+| | String15 | String15 | Float64? | Float64? | Int64? | Int64? | String7 |
+| 1 | Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male |
+| 2 | Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female |
+| 3 | Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | female |
+| 4 | Adelie | Torgersen | missing | missing | missing | missing | missing |
+| 5 | Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | female |
+| 6 | Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | male |
+| 7 | Adelie | Torgersen | 38.9 | 17.8 | 181 | 3625 | female |
+| 8 | Adelie | Torgersen | 39.2 | 19.6 | 195 | 4675 | male |
+| 9 | Adelie | Torgersen | 34.1 | 18.1 | 193 | 3475 | missing |
+| 10 | Adelie | Torgersen | 42.0 | 20.2 | 190 | 4250 | missing |
+| 11 | Adelie | Torgersen | 37.8 | 17.1 | 186 | 3300 | missing |
+| 12 | Adelie | Torgersen | 37.8 | 17.3 | 180 | 3700 | missing |
+| 13 | Adelie | Torgersen | 41.1 | 17.6 | 182 | 3200 | female |
+| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
+| 141 | Adelie | Dream | 40.2 | 17.1 | 193 | 3400 | female |
+| 142 | Adelie | Dream | 40.6 | 17.2 | 187 | 3475 | male |
+| 143 | Adelie | Dream | 32.1 | 15.5 | 188 | 3050 | female |
+| 144 | Adelie | Dream | 40.7 | 17.0 | 190 | 3725 | male |
+| 145 | Adelie | Dream | 37.3 | 16.8 | 192 | 3000 | female |
+| 146 | Adelie | Dream | 39.0 | 18.7 | 185 | 3650 | male |
+| 147 | Adelie | Dream | 39.2 | 18.6 | 190 | 4250 | male |
+| 148 | Adelie | Dream | 36.6 | 18.4 | 184 | 3475 | female |
+| 149 | Adelie | Dream | 36.0 | 17.8 | 195 | 3450 | female |
+| 150 | Adelie | Dream | 37.8 | 18.1 | 193 | 3750 | male |
+| 151 | Adelie | Dream | 36.0 | 17.1 | 187 | 3700 | female |
+| 152 | Adelie | Dream | 41.5 | 18.5 | 201 | 4000 | male |
+
+
+
+
+
+
+
Notice that the columns are written as if they were variables in the Julia environment. This is inspired by the tidyverse behaviour of data-masking: inside a tidyverse verb, the columns are taken as “statistical variables” that exist inside the dataframe as columns.
+
In DataFramesMeta, we have two macros for filtering: @subset and @rsubset. Use the first when your criterion needs a whole column, for example:
Notice the broadcast on >=. We need it because each variable is interpreted as an array (the whole column). Also, notice that we refer to columns as symbols (i.e. we prepend : to the column name).
+
In the above example, we needed the whole column body_mass_g to take the mean and then filter the rows based on that. If, however, your filtering criterion only uses information about each row (without needing to see it in the context of the whole column), then @rsubset (row subset) is easier to use: it interprets each column as a single value (not an array), so no broadcasting is needed:
+
+
DFM.@rsubset penguins :species == "Adelie"
+
+
152×7 DataFrame
127 rows omitted
+
+| Row | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
+|-----|---------|--------|----------------|---------------|-------------------|-------------|-----|
+| | String15 | String15 | Float64? | Float64? | Int64? | Int64? | String7 |
+| 1 | Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male |
+| 2 | Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female |
+| 3 | Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | female |
+| 4 | Adelie | Torgersen | missing | missing | missing | missing | missing |
+| 5 | Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | female |
+| 6 | Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | male |
+| 7 | Adelie | Torgersen | 38.9 | 17.8 | 181 | 3625 | female |
+| 8 | Adelie | Torgersen | 39.2 | 19.6 | 195 | 4675 | male |
+| 9 | Adelie | Torgersen | 34.1 | 18.1 | 193 | 3475 | missing |
+| 10 | Adelie | Torgersen | 42.0 | 20.2 | 190 | 4250 | missing |
+| 11 | Adelie | Torgersen | 37.8 | 17.1 | 186 | 3300 | missing |
+| 12 | Adelie | Torgersen | 37.8 | 17.3 | 180 | 3700 | missing |
+| 13 | Adelie | Torgersen | 41.1 | 17.6 | 182 | 3200 | female |
+| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
+| 141 | Adelie | Dream | 40.2 | 17.1 | 193 | 3400 | female |
+| 142 | Adelie | Dream | 40.6 | 17.2 | 187 | 3475 | male |
+| 143 | Adelie | Dream | 32.1 | 15.5 | 188 | 3050 | female |
+| 144 | Adelie | Dream | 40.7 | 17.0 | 190 | 3725 | male |
+| 145 | Adelie | Dream | 37.3 | 16.8 | 192 | 3000 | female |
+| 146 | Adelie | Dream | 39.0 | 18.7 | 185 | 3650 | male |
+| 147 | Adelie | Dream | 39.2 | 18.6 | 190 | 4250 | male |
+| 148 | Adelie | Dream | 36.6 | 18.4 | 184 | 3475 | female |
+| 149 | Adelie | Dream | 36.0 | 17.8 | 195 | 3450 | female |
+| 150 | Adelie | Dream | 37.8 | 18.1 | 193 | 3750 | male |
+| 151 | Adelie | Dream | 36.0 | 17.1 | 187 | 3700 | female |
+| 152 | Adelie | Dream | 41.5 | 18.5 | 201 | 4000 | male |
+
+
+
+
+
+
+
In both Tidier and DataFramesMeta, only the rows for which the criterion evaluates to true are returned. This means that rows where it evaluates to false or missing are dropped.
+
In DataFrames, we use the subset function, and the criterion is passed with the notation
+
+
subset(penguins, :column => boolean_function)
+
+
where boolean_function is a function of one variable (the whole column) that returns a vector of booleans, possibly with missing values. Add the keyword argument skipmissing=true to treat missing results as false and drop those rows; without it, subset throws an error when the result contains missing.
DataFrames.jl is the main package for dealing with dataframes in Julia. You can use it directly to manipulate tables, but we also have 2 alternatives: DataFramesMeta and Tidier.
DataFramesMeta is a collection of macros based on DataFrames.
Tidier is inspired by the tidyverse ecosystem in R. Tidier uses macros to rewrite your code into DataFrames.jl code. Because of this “tidy” heritage, we will often talk about the R packages that inspired the Julia ones (like dplyr, tidyr and many others).
-
In this book, whenever possible, we will show the different approaches in a tabset so you can compare them.
+
In this book, whenever possible, we will show the different approaches in a tabset so you can compare them, with more emphasis on Tidier.
Operations
-
Let’s start with some operations that take only one dataframe as input.1. Here is the basic terminology:
+
Let’s start with some unary operations, i.e. operations that take only one dataframe as input and return one dataframe as output.1 We can divide these operations into a few categories:
+
+
Row operations
+
These are operations that only affect rows, leaving all columns untouched.
-
Selecting is when we select some columns of a dataframe, while keeping all the rows. Example: select the species and sex columns.
Filtering or subsetting is when we select a subset of rows based on some criteria. Example: all male penguins of species Adelie. The output is a dataframe with the exact same columns, but possibly fewer rows.
-
Mutating or transforming is when we create new columns. Example: a new column body_mass_kg can be obtained dividing the column body_mass_g by 1000.
-
Grouping is when we split the dataframe into a collection (array) of dataframes using some criteria. Example: grouping by species gives us 3 dataframes, each with only one species.
-
Summarising or combining is when we apply some function to some columns in order to reduce the amount of rows with some kind of summary (like a mean, median, max, and so on). Example: for each species, apply the mean function to the columns body_mass_g. This will yield a dataframe with 3 rows, one for each species. Summarising is usually done after a grouping, so the summary is calculated with relation to each of the groups.
Arranging or ordering is when we reorder the rows of a dataframe using some criteria.
+
+
+
Column operations
+
These are operations that only affect columns, leaving all rows untouched.
+
+
Selecting is when we select some columns of a dataframe, while keeping all the rows. Example: select the species and sex columns.
+
Mutating or transforming is when we create new columns. Example: a new column body_mass_kg can be obtained dividing the column body_mass_g by 1000.
+
+
+
+
Reshaping operations
+
These operations change the shape of a dataframe, making it wider or longer.
+
+
Widening
+
Longering?
+
+
+
+
Grouping operations
+
+
Grouping is when we split the dataframe into a collection (array) of dataframes using some criteria. Example: grouping by species gives us 3 dataframes, each with only one species.
+
+
+
+
Mixed operations
+
These operations can possibly change rows and columns at the same time.
+
+
Distinct;
+
Counting;
+
Summarising or combining is when we apply some function to some columns in order to reduce the amount of rows with some kind of summary (like a mean, median, max, and so on). Example: for each species, apply the mean function to the columns body_mass_g. This will yield a dataframe with 3 rows, one for each species. Summarising is usually done after a grouping, so the summary is calculated with relation to each of the groups.
+
+
??? keep grouping and summarising together?
Since all these functions return a dataframe (or an array of dataframes, in the case of grouping), we can chain these operations together, with the convention that on grouped dataframes we apply the function in each one of the groups.
-
Let’s see each operation with more details.
+
Now for binary operations (i.e. operations that take two dataframes), we have all the joins:
+
+
Left join;
+
Right join;
+
Inner join;
+
Outer join;
+
Full join.
+
+
Comparing Tidier with DataFramesMeta
@@ -586,44 +633,58 @@
Compa
+
filter
+
@filter
+
@subset / @rsubset
+
subset
+
+
+
arrange
+
@arrange
+
@orderby / @rorderby
+
sort!
+
+
select
@select
@select
array syntax
-
filter
-
@filter
-
@subset / @rsubset
-
filter
-
-
mutate
@mutate
@transform / @rtransform
array syntax
-
+
group_by
@group_by
@groupby
groupby
-
+
summarise
@summarise
@combine
combine
-
-
arrange
-
@arrange
-
@orderby / @rorderby
-
sort!
-
+
It is clear that for those coming from R, Tidier will look like the most natural approach.
Notice that we have a name clash with @select: that is why we import DataFramesMeta as DFM at the beginning.
+
We will see each operation with more details in the following chapters.
+
+
+
Chaining operations
+
We can chain (or pipe) dataframe operations as follows with the @chain macro:
+
+
@chain penguins begin
+  @filter !ismissing(sex)
+  @group_by sex
+  @summarise mean = mean(bill_length_mm)
+  @arrange mean
+end
+
diff --git a/docs/search.json b/docs/search.json
index 321a157..ad0dffc 100644
--- a/docs/search.json
+++ b/docs/search.json
@@ -174,7 +174,7 @@
"href": "dataframes.html",
"title": "Part 2: Dataframes",
"section": "",
- "text": "Operations\nLet’s start with some operations that take only one dataframe as input.1. Here is the basic terminology:\nSince all these functions return a dataframe (or an array of dataframes, in the case of grouping), we can chain these operations together, with the convention that on grouped dataframes we apply the function in each one of the groups.\nLet’s see each operation with more details.",
+ "text": "Operations\nLet’s start with some unary operations, i.e. operations that take only one dataframe as input and return one dataframe as output.1 We can divide these operations into a few categories:",
"crumbs": [
"Part 2: Dataframes"
]
@@ -206,7 +206,7 @@
"href": "dataframes.html#operations",
"title": "Part 2: Dataframes",
"section": "",
- "text": "Selecting is when we select some columns of a dataframe, while keeping all the rows. Example: select the species and sex columns.\nFiltering or subsetting is when we select a subset of rows based on some criteria. Example: all male penguins of species Adelie. The output is a dataframe with the exact same columns, but possibly fewer rows.\nMutating or transforming is when we create new columns. Example: a new column body_mass_kg can be obtained dividing the column body_mass_g by 1000.\nGrouping is when we split the dataframe into a collection (array) of dataframes using some criteria. Example: grouping by species gives us 3 dataframes, each with only one species.\nSummarising or combining is when we apply some function to some columns in order to reduce the amount of rows with some kind of summary (like a mean, median, max, and so on). Example: for each species, apply the mean function to the columns body_mass_g. This will yield a dataframe with 3 rows, one for each species. Summarising is usually done after a grouping, so the summary is calculated with relation to each of the groups.\nArranging or ordering is when we reorder the rows of a dataframe using some criteria.",
+ "text": "Row operations\nThese are operations that only affect rows, leaving all columns untouched.\n\nFiltering or subsetting is when we select a subset of rows based on some criteria. Example: all male penguins of species Adelie. The output is a dataframe with the exact same columns, but possibly fewer rows.\nArranging or ordering is when we reorder the rows of a dataframe using some criteria.\n\n\n\nColumn operations\nThese are operations that only affect columns, leaving all rows untouched.\n\nSelecting is when we select some columns of a dataframe, while keeping all the rows. Example: select the species and sex columns.\nMutating or transforming is when we create new columns. Example: a new column body_mass_kg can be obtained dividing the column body_mass_g by 1000.\n\n\n\nReshaping operations\nThese operations change the shape of a dataframe, making it wider or longer.\n\nWidening\nLongering?\n\n\n\nGrouping operations\n\nGrouping is when we split the dataframe into a collection (array) of dataframes using some criteria. Example: grouping by species gives us 3 dataframes, each with only one species.\n\n\n\nMixed operations\nThese operations can possibly change rows and columns at the same time.\n\nDistinct;\nCounting;\nSummarising or combining is when we apply some function to some columns in order to reduce the amount of rows with some kind of summary (like a mean, median, max, and so on). Example: for each species, apply the mean function to the columns body_mass_g. This will yield a dataframe with 3 rows, one for each species. Summarising is usually done after a grouping, so the summary is calculated with relation to each of the groups.\n\n??? keep grouping and summarising together?\nSince all these functions return a dataframe (or an array of dataframes, in the case of grouping), we can chain these operations together, with the convention that on grouped dataframes we apply the function in each one of the groups.\nNow for binary operations (i.e. 
operations that take two dataframes), we have all the joins:\n\nLeft join;\nRight join;\nInner join;\nOuter join;\nFull join.",
"crumbs": [
"Part 2: Dataframes"
]
@@ -216,7 +216,7 @@
"href": "dataframes.html#comparing-tidier-with-dataframesmeta",
"title": "Part 2: Dataframes",
"section": "Comparing Tidier with DataFramesMeta",
- "text": "Comparing Tidier with DataFramesMeta\nThe following table list the operations on each package:\n\n\n\n\n\n\n\n\n\ndplyr\nTidier\nDataFramesMeta\nDataFrames\n\n\n\n\nselect\n@select\n@select\narray sintax\n\n\nfilter\n@filter\n@subset / @rsubset\nfilter\n\n\nmutate\n@mutate\n@transform / @rtransform\narray sintax\n\n\ngroup_by\n@group_by\n@groupby\ngroupby\n\n\nsummarise\n@summarise\n@combine\ncombine\n\n\narrange\n@arrange\n@orderby / @rorderby\nsort!\n\n\n\nNotice that we have a name clash with @select: that is why we import DataFramesMeta as DFM at the beginning.",
+ "text": "Comparing Tidier with DataFramesMeta\nThe following table lists the operations in each package:\n\n\n\n\n\n\n\n\n\ndplyr\nTidier\nDataFramesMeta\nDataFrames\n\n\n\n\nfilter\n@filter\n@subset / @rsubset\nsubset\n\n\narrange\n@arrange\n@orderby / @rorderby\nsort!\n\n\nselect\n@select\n@select\narray syntax\n\n\nmutate\n@mutate\n@transform / @rtransform\narray syntax\n\n\ngroup_by\n@group_by\n@groupby\ngroupby\n\n\nsummarise\n@summarise\n@combine\ncombine\n\n\n\nFor those coming from R, Tidier will likely look like the most natural approach.\nNotice that we have a name clash with @select: that is why we import DataFramesMeta as DFM at the beginning.\nWe will see each operation in more detail in the following chapters.",
"crumbs": [
"Part 2: Dataframes"
]
@@ -274,5 +274,48 @@
"Part 2: Dataframes",
"1Filtering"
]
+ },
+ {
+ "objectID": "dataframes.html#chaining-operations",
+ "href": "dataframes.html#chaining-operations",
+ "title": "Part 2: Dataframes",
+ "section": "Chaining operations",
+ "text": "Chaining operations\nWe can chain (or pipe) dataframe operations as follows with the @chain macro:\n\n@chain penguins begin\n @filter !ismissing(sex)\n @group_by sex\n @summarise mean = mean(bill_length_mm)\n @arrange mean\nend",
+ "crumbs": [
+ "Part 2: Dataframes"
+ ]
+ },
+ {
+ "objectID": "dataframes-rows.html",
+ "href": "dataframes-rows.html",
+ "title": "1 Operations on rows",
+ "section": "",
+ "text": "1.1 Filtering\nTo filter is to keep only the rows that satisfy a certain criteria (ie. a boolean condition).\nTo filter a dataframe in Tidier, we use the macro @filter. You can use it in the form\n@filter(penguins, species == \"Adelie\")\n\n152×7 DataFrame127 rows omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nTorgersen\n39.1\n18.7\n181\n3750\nmale\n\n\n2\nAdelie\nTorgersen\n39.5\n17.4\n186\n3800\nfemale\n\n\n3\nAdelie\nTorgersen\n40.3\n18.0\n195\n3250\nfemale\n\n\n4\nAdelie\nTorgersen\nmissing\nmissing\nmissing\nmissing\nmissing\n\n\n5\nAdelie\nTorgersen\n36.7\n19.3\n193\n3450\nfemale\n\n\n6\nAdelie\nTorgersen\n39.3\n20.6\n190\n3650\nmale\n\n\n7\nAdelie\nTorgersen\n38.9\n17.8\n181\n3625\nfemale\n\n\n8\nAdelie\nTorgersen\n39.2\n19.6\n195\n4675\nmale\n\n\n9\nAdelie\nTorgersen\n34.1\n18.1\n193\n3475\nmissing\n\n\n10\nAdelie\nTorgersen\n42.0\n20.2\n190\n4250\nmissing\n\n\n11\nAdelie\nTorgersen\n37.8\n17.1\n186\n3300\nmissing\n\n\n12\nAdelie\nTorgersen\n37.8\n17.3\n180\n3700\nmissing\n\n\n13\nAdelie\nTorgersen\n41.1\n17.6\n182\n3200\nfemale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n141\nAdelie\nDream\n40.2\n17.1\n193\n3400\nfemale\n\n\n142\nAdelie\nDream\n40.6\n17.2\n187\n3475\nmale\n\n\n143\nAdelie\nDream\n32.1\n15.5\n188\n3050\nfemale\n\n\n144\nAdelie\nDream\n40.7\n17.0\n190\n3725\nmale\n\n\n145\nAdelie\nDream\n37.3\n16.8\n192\n3000\nfemale\n\n\n146\nAdelie\nDream\n39.0\n18.7\n185\n3650\nmale\n\n\n147\nAdelie\nDream\n39.2\n18.6\n190\n4250\nmale\n\n\n148\nAdelie\nDream\n36.6\n18.4\n184\n3475\nfemale\n\n\n149\nAdelie\nDream\n36.0\n17.8\n195\n3450\nfemale\n\n\n150\nAdelie\nDream\n37.8\n18.1\n193\n3750\nmale\n\n\n151\nAdelie\nDream\n36.0\n17.1\n187\n3700\nfemale\n\n\n152\nAdelie\nDream\n41.5\n18.5\n201\n4000\nmale\nor without parentesis as in\n@filter penguins species == \"Adelie\"\n\n152×7 DataFrame127 rows 
omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nTorgersen\n39.1\n18.7\n181\n3750\nmale\n\n\n2\nAdelie\nTorgersen\n39.5\n17.4\n186\n3800\nfemale\n\n\n3\nAdelie\nTorgersen\n40.3\n18.0\n195\n3250\nfemale\n\n\n4\nAdelie\nTorgersen\nmissing\nmissing\nmissing\nmissing\nmissing\n\n\n5\nAdelie\nTorgersen\n36.7\n19.3\n193\n3450\nfemale\n\n\n6\nAdelie\nTorgersen\n39.3\n20.6\n190\n3650\nmale\n\n\n7\nAdelie\nTorgersen\n38.9\n17.8\n181\n3625\nfemale\n\n\n8\nAdelie\nTorgersen\n39.2\n19.6\n195\n4675\nmale\n\n\n9\nAdelie\nTorgersen\n34.1\n18.1\n193\n3475\nmissing\n\n\n10\nAdelie\nTorgersen\n42.0\n20.2\n190\n4250\nmissing\n\n\n11\nAdelie\nTorgersen\n37.8\n17.1\n186\n3300\nmissing\n\n\n12\nAdelie\nTorgersen\n37.8\n17.3\n180\n3700\nmissing\n\n\n13\nAdelie\nTorgersen\n41.1\n17.6\n182\n3200\nfemale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n141\nAdelie\nDream\n40.2\n17.1\n193\n3400\nfemale\n\n\n142\nAdelie\nDream\n40.6\n17.2\n187\n3475\nmale\n\n\n143\nAdelie\nDream\n32.1\n15.5\n188\n3050\nfemale\n\n\n144\nAdelie\nDream\n40.7\n17.0\n190\n3725\nmale\n\n\n145\nAdelie\nDream\n37.3\n16.8\n192\n3000\nfemale\n\n\n146\nAdelie\nDream\n39.0\n18.7\n185\n3650\nmale\n\n\n147\nAdelie\nDream\n39.2\n18.6\n190\n4250\nmale\n\n\n148\nAdelie\nDream\n36.6\n18.4\n184\n3475\nfemale\n\n\n149\nAdelie\nDream\n36.0\n17.8\n195\n3450\nfemale\n\n\n150\nAdelie\nDream\n37.8\n18.1\n193\n3750\nmale\n\n\n151\nAdelie\nDream\n36.0\n17.1\n187\n3700\nfemale\n\n\n152\nAdelie\nDream\n41.5\n18.5\n201\n4000\nmale\nNotice that the columns are typed as if they were variables on the Julia environment. This is inspired by the tidyverse behaviour of data-masking: inside a tidyverse verb, the columns are taken as “statistical variables” that exist inside the dataframe as columns.\nIn DataFramesMeta, we have two macros for filtering: @subset and @rsubset. 
Use the first when you have some criteria that uses a whole column, for example:\nDFM.@subset penguins :body_mass_g .>= mean(skipmissing(:body_mass_g))\n\n149×7 DataFrame124 rows omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nTorgersen\n39.2\n19.6\n195\n4675\nmale\n\n\n2\nAdelie\nTorgersen\n42.0\n20.2\n190\n4250\nmissing\n\n\n3\nAdelie\nTorgersen\n34.6\n21.1\n198\n4400\nmale\n\n\n4\nAdelie\nTorgersen\n42.5\n20.7\n197\n4500\nmale\n\n\n5\nAdelie\nDream\n39.8\n19.1\n184\n4650\nmale\n\n\n6\nAdelie\nDream\n44.1\n19.7\n196\n4400\nmale\n\n\n7\nAdelie\nDream\n39.6\n18.8\n190\n4600\nmale\n\n\n8\nAdelie\nBiscoe\n40.1\n18.9\n188\n4300\nmale\n\n\n9\nAdelie\nBiscoe\n41.3\n21.1\n195\n4400\nmale\n\n\n10\nAdelie\nTorgersen\n41.8\n19.4\n198\n4450\nmale\n\n\n11\nAdelie\nTorgersen\n42.8\n18.5\n195\n4250\nmale\n\n\n12\nAdelie\nTorgersen\n42.9\n17.6\n196\n4700\nmale\n\n\n13\nAdelie\nDream\n41.1\n18.1\n205\n4300\nmale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n138\nGentoo\nBiscoe\n47.2\n13.7\n214\n4925\nfemale\n\n\n139\nGentoo\nBiscoe\n46.8\n14.3\n215\n4850\nfemale\n\n\n140\nGentoo\nBiscoe\n50.4\n15.7\n222\n5750\nmale\n\n\n141\nGentoo\nBiscoe\n45.2\n14.8\n212\n5200\nfemale\n\n\n142\nGentoo\nBiscoe\n49.9\n16.1\n213\n5400\nmale\n\n\n143\nChinstrap\nDream\n49.2\n18.2\n195\n4400\nmale\n\n\n144\nChinstrap\nDream\n52.8\n20.0\n205\n4550\nmale\n\n\n145\nChinstrap\nDream\n54.2\n20.8\n201\n4300\nmale\n\n\n146\nChinstrap\nDream\n52.0\n20.7\n210\n4800\nmale\n\n\n147\nChinstrap\nDream\n53.5\n19.9\n205\n4500\nmale\n\n\n148\nChinstrap\nDream\n50.8\n18.5\n201\n4450\nmale\n\n\n149\nChinstrap\nDream\n49.0\n19.6\n212\n4300\nmale\nNotice the broadcast on >=. We need it because each variable is interpreted as an array (the whole column). Also, notice that we refer to columns as symbols (i.e. 
we append : to it).\nIn the above example, we needed the whole column body_mass_g to take the mean and then filter the rows based on that. If, however, your filtering criteria only uses information about each row (without needing to see it in context of the whole column), then @rsubset (row subset) is easier to use: it interprets each columns as a value (not an array), so no broadcasting is needed:\nDFM.@rsubset penguins :species == \"Adelie\"\n\n152×7 DataFrame127 rows omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nTorgersen\n39.1\n18.7\n181\n3750\nmale\n\n\n2\nAdelie\nTorgersen\n39.5\n17.4\n186\n3800\nfemale\n\n\n3\nAdelie\nTorgersen\n40.3\n18.0\n195\n3250\nfemale\n\n\n4\nAdelie\nTorgersen\nmissing\nmissing\nmissing\nmissing\nmissing\n\n\n5\nAdelie\nTorgersen\n36.7\n19.3\n193\n3450\nfemale\n\n\n6\nAdelie\nTorgersen\n39.3\n20.6\n190\n3650\nmale\n\n\n7\nAdelie\nTorgersen\n38.9\n17.8\n181\n3625\nfemale\n\n\n8\nAdelie\nTorgersen\n39.2\n19.6\n195\n4675\nmale\n\n\n9\nAdelie\nTorgersen\n34.1\n18.1\n193\n3475\nmissing\n\n\n10\nAdelie\nTorgersen\n42.0\n20.2\n190\n4250\nmissing\n\n\n11\nAdelie\nTorgersen\n37.8\n17.1\n186\n3300\nmissing\n\n\n12\nAdelie\nTorgersen\n37.8\n17.3\n180\n3700\nmissing\n\n\n13\nAdelie\nTorgersen\n41.1\n17.6\n182\n3200\nfemale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n141\nAdelie\nDream\n40.2\n17.1\n193\n3400\nfemale\n\n\n142\nAdelie\nDream\n40.6\n17.2\n187\n3475\nmale\n\n\n143\nAdelie\nDream\n32.1\n15.5\n188\n3050\nfemale\n\n\n144\nAdelie\nDream\n40.7\n17.0\n190\n3725\nmale\n\n\n145\nAdelie\nDream\n37.3\n16.8\n192\n3000\nfemale\n\n\n146\nAdelie\nDream\n39.0\n18.7\n185\n3650\nmale\n\n\n147\nAdelie\nDream\n39.2\n18.6\n190\n4250\nmale\n\n\n148\nAdelie\nDream\n36.6\n18.4\n184\n3475\nfemale\n\n\n149\nAdelie\nDream\n36.0\n17.8\n195\n3450\nfemale\n\n\n150\nAdelie\nDream\n37.8\n18.1\n193\n3750\nmale\n\n\n151\nAdelie\nDre
am\n36.0\n17.1\n187\n3700\nfemale\n\n\n152\nAdelie\nDream\n41.5\n18.5\n201\n4000\nmale\nIn both Tidier and DataFramesMeta, only the rows for which the criterion is true are returned. This means that rows where it evaluates to false or missing are dropped.\nIn DataFrames, we use the subset function, and the criterion is passed with the notation\nsubset(penguins, :column => boolean_function)\nwhere boolean_function is a function of one variable that returns booleans (possibly with missing values). Add the kwarg skipmissing=true if you want to get rid of missing values.",
+ "crumbs": [
+ "Part 2: Dataframes",
+ "1Operations on rows"
+ ]
+ },
+ {
+ "objectID": "dataframes-rows.html#filtering",
+ "href": "dataframes-rows.html#filtering",
+ "title": "1 Operations on rows",
+ "section": "",
+ "text": "1.1.1 Filtering with one criteria\nFiltering all the rows with species = “Adelie”.\n\nTidierDataFramesMetaDataFrames\n\n\n\n@filter penguins species == \"Adelie\"\n\n152×7 DataFrame127 rows omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nTorgersen\n39.1\n18.7\n181\n3750\nmale\n\n\n2\nAdelie\nTorgersen\n39.5\n17.4\n186\n3800\nfemale\n\n\n3\nAdelie\nTorgersen\n40.3\n18.0\n195\n3250\nfemale\n\n\n4\nAdelie\nTorgersen\nmissing\nmissing\nmissing\nmissing\nmissing\n\n\n5\nAdelie\nTorgersen\n36.7\n19.3\n193\n3450\nfemale\n\n\n6\nAdelie\nTorgersen\n39.3\n20.6\n190\n3650\nmale\n\n\n7\nAdelie\nTorgersen\n38.9\n17.8\n181\n3625\nfemale\n\n\n8\nAdelie\nTorgersen\n39.2\n19.6\n195\n4675\nmale\n\n\n9\nAdelie\nTorgersen\n34.1\n18.1\n193\n3475\nmissing\n\n\n10\nAdelie\nTorgersen\n42.0\n20.2\n190\n4250\nmissing\n\n\n11\nAdelie\nTorgersen\n37.8\n17.1\n186\n3300\nmissing\n\n\n12\nAdelie\nTorgersen\n37.8\n17.3\n180\n3700\nmissing\n\n\n13\nAdelie\nTorgersen\n41.1\n17.6\n182\n3200\nfemale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n141\nAdelie\nDream\n40.2\n17.1\n193\n3400\nfemale\n\n\n142\nAdelie\nDream\n40.6\n17.2\n187\n3475\nmale\n\n\n143\nAdelie\nDream\n32.1\n15.5\n188\n3050\nfemale\n\n\n144\nAdelie\nDream\n40.7\n17.0\n190\n3725\nmale\n\n\n145\nAdelie\nDream\n37.3\n16.8\n192\n3000\nfemale\n\n\n146\nAdelie\nDream\n39.0\n18.7\n185\n3650\nmale\n\n\n147\nAdelie\nDream\n39.2\n18.6\n190\n4250\nmale\n\n\n148\nAdelie\nDream\n36.6\n18.4\n184\n3475\nfemale\n\n\n149\nAdelie\nDream\n36.0\n17.8\n195\n3450\nfemale\n\n\n150\nAdelie\nDream\n37.8\n18.1\n193\n3750\nmale\n\n\n151\nAdelie\nDream\n36.0\n17.1\n187\n3700\nfemale\n\n\n152\nAdelie\nDream\n41.5\n18.5\n201\n4000\nmale\n\n\n\n\n\n\n\n\n\nDFM.@rsubset penguins :species == \"Adelie\"\n\n152×7 DataFrame127 rows 
omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nTorgersen\n39.1\n18.7\n181\n3750\nmale\n\n\n2\nAdelie\nTorgersen\n39.5\n17.4\n186\n3800\nfemale\n\n\n3\nAdelie\nTorgersen\n40.3\n18.0\n195\n3250\nfemale\n\n\n4\nAdelie\nTorgersen\nmissing\nmissing\nmissing\nmissing\nmissing\n\n\n5\nAdelie\nTorgersen\n36.7\n19.3\n193\n3450\nfemale\n\n\n6\nAdelie\nTorgersen\n39.3\n20.6\n190\n3650\nmale\n\n\n7\nAdelie\nTorgersen\n38.9\n17.8\n181\n3625\nfemale\n\n\n8\nAdelie\nTorgersen\n39.2\n19.6\n195\n4675\nmale\n\n\n9\nAdelie\nTorgersen\n34.1\n18.1\n193\n3475\nmissing\n\n\n10\nAdelie\nTorgersen\n42.0\n20.2\n190\n4250\nmissing\n\n\n11\nAdelie\nTorgersen\n37.8\n17.1\n186\n3300\nmissing\n\n\n12\nAdelie\nTorgersen\n37.8\n17.3\n180\n3700\nmissing\n\n\n13\nAdelie\nTorgersen\n41.1\n17.6\n182\n3200\nfemale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n141\nAdelie\nDream\n40.2\n17.1\n193\n3400\nfemale\n\n\n142\nAdelie\nDream\n40.6\n17.2\n187\n3475\nmale\n\n\n143\nAdelie\nDream\n32.1\n15.5\n188\n3050\nfemale\n\n\n144\nAdelie\nDream\n40.7\n17.0\n190\n3725\nmale\n\n\n145\nAdelie\nDream\n37.3\n16.8\n192\n3000\nfemale\n\n\n146\nAdelie\nDream\n39.0\n18.7\n185\n3650\nmale\n\n\n147\nAdelie\nDream\n39.2\n18.6\n190\n4250\nmale\n\n\n148\nAdelie\nDream\n36.6\n18.4\n184\n3475\nfemale\n\n\n149\nAdelie\nDream\n36.0\n17.8\n195\n3450\nfemale\n\n\n150\nAdelie\nDream\n37.8\n18.1\n193\n3750\nmale\n\n\n151\nAdelie\nDream\n36.0\n17.1\n187\n3700\nfemale\n\n\n152\nAdelie\nDream\n41.5\n18.5\n201\n4000\nmale\n\n\n\n\n\n\n\n\n\nsubset(penguins, :species => x -> x .== \"Adelie\", skipmissing=true)\n\n152×7 DataFrame127 rows 
omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nTorgersen\n39.1\n18.7\n181\n3750\nmale\n\n\n2\nAdelie\nTorgersen\n39.5\n17.4\n186\n3800\nfemale\n\n\n3\nAdelie\nTorgersen\n40.3\n18.0\n195\n3250\nfemale\n\n\n4\nAdelie\nTorgersen\nmissing\nmissing\nmissing\nmissing\nmissing\n\n\n5\nAdelie\nTorgersen\n36.7\n19.3\n193\n3450\nfemale\n\n\n6\nAdelie\nTorgersen\n39.3\n20.6\n190\n3650\nmale\n\n\n7\nAdelie\nTorgersen\n38.9\n17.8\n181\n3625\nfemale\n\n\n8\nAdelie\nTorgersen\n39.2\n19.6\n195\n4675\nmale\n\n\n9\nAdelie\nTorgersen\n34.1\n18.1\n193\n3475\nmissing\n\n\n10\nAdelie\nTorgersen\n42.0\n20.2\n190\n4250\nmissing\n\n\n11\nAdelie\nTorgersen\n37.8\n17.1\n186\n3300\nmissing\n\n\n12\nAdelie\nTorgersen\n37.8\n17.3\n180\n3700\nmissing\n\n\n13\nAdelie\nTorgersen\n41.1\n17.6\n182\n3200\nfemale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n141\nAdelie\nDream\n40.2\n17.1\n193\n3400\nfemale\n\n\n142\nAdelie\nDream\n40.6\n17.2\n187\n3475\nmale\n\n\n143\nAdelie\nDream\n32.1\n15.5\n188\n3050\nfemale\n\n\n144\nAdelie\nDream\n40.7\n17.0\n190\n3725\nmale\n\n\n145\nAdelie\nDream\n37.3\n16.8\n192\n3000\nfemale\n\n\n146\nAdelie\nDream\n39.0\n18.7\n185\n3650\nmale\n\n\n147\nAdelie\nDream\n39.2\n18.6\n190\n4250\nmale\n\n\n148\nAdelie\nDream\n36.6\n18.4\n184\n3475\nfemale\n\n\n149\nAdelie\nDream\n36.0\n17.8\n195\n3450\nfemale\n\n\n150\nAdelie\nDream\n37.8\n18.1\n193\n3750\nmale\n\n\n151\nAdelie\nDream\n36.0\n17.1\n187\n3700\nfemale\n\n\n152\nAdelie\nDream\n41.5\n18.5\n201\n4000\nmale\n\n\n\n\n\n\n\n\n\n\n\n1.1.2 Filtering with several criteria\nFiltering all the rows with species = “Adelie”, sex = “male” and body_mass_g > 4000.\n\nTidierDataFramesMetaDataFrames\n\n\n\n@filter penguins species == \"Adelie\" sex == \"male\" body_mass_g > 4000\n\n34×7 DataFrame9 rows 
omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nTorgersen\n39.2\n19.6\n195\n4675\nmale\n\n\n2\nAdelie\nTorgersen\n34.6\n21.1\n198\n4400\nmale\n\n\n3\nAdelie\nTorgersen\n42.5\n20.7\n197\n4500\nmale\n\n\n4\nAdelie\nTorgersen\n46.0\n21.5\n194\n4200\nmale\n\n\n5\nAdelie\nDream\n39.2\n21.1\n196\n4150\nmale\n\n\n6\nAdelie\nDream\n39.8\n19.1\n184\n4650\nmale\n\n\n7\nAdelie\nDream\n44.1\n19.7\n196\n4400\nmale\n\n\n8\nAdelie\nDream\n39.6\n18.8\n190\n4600\nmale\n\n\n9\nAdelie\nDream\n42.3\n21.2\n191\n4150\nmale\n\n\n10\nAdelie\nBiscoe\n40.1\n18.9\n188\n4300\nmale\n\n\n11\nAdelie\nBiscoe\n42.0\n19.5\n200\n4050\nmale\n\n\n12\nAdelie\nBiscoe\n41.3\n21.1\n195\n4400\nmale\n\n\n13\nAdelie\nBiscoe\n41.1\n18.2\n192\n4050\nmale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n23\nAdelie\nDream\n40.3\n18.5\n196\n4350\nmale\n\n\n24\nAdelie\nDream\n43.2\n18.5\n192\n4100\nmale\n\n\n25\nAdelie\nBiscoe\n41.0\n20.0\n203\n4725\nmale\n\n\n26\nAdelie\nBiscoe\n37.8\n20.0\n190\n4250\nmale\n\n\n27\nAdelie\nBiscoe\n43.2\n19.0\n197\n4775\nmale\n\n\n28\nAdelie\nBiscoe\n45.6\n20.3\n191\n4600\nmale\n\n\n29\nAdelie\nBiscoe\n42.2\n19.5\n197\n4275\nmale\n\n\n30\nAdelie\nBiscoe\n42.7\n18.3\n196\n4075\nmale\n\n\n31\nAdelie\nTorgersen\n41.5\n18.3\n195\n4300\nmale\n\n\n32\nAdelie\nDream\n37.5\n18.5\n199\n4475\nmale\n\n\n33\nAdelie\nDream\n39.7\n17.9\n193\n4250\nmale\n\n\n34\nAdelie\nDream\n39.2\n18.6\n190\n4250\nmale\n\n\n\n\n\n\n\n\n\nDFM.@rsubset penguins :species == \"Adelie\" :sex == \"male\" :body_mass_g > 4000\n\n34×7 DataFrame9 rows 
omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nTorgersen\n39.2\n19.6\n195\n4675\nmale\n\n\n2\nAdelie\nTorgersen\n34.6\n21.1\n198\n4400\nmale\n\n\n3\nAdelie\nTorgersen\n42.5\n20.7\n197\n4500\nmale\n\n\n4\nAdelie\nTorgersen\n46.0\n21.5\n194\n4200\nmale\n\n\n5\nAdelie\nDream\n39.2\n21.1\n196\n4150\nmale\n\n\n6\nAdelie\nDream\n39.8\n19.1\n184\n4650\nmale\n\n\n7\nAdelie\nDream\n44.1\n19.7\n196\n4400\nmale\n\n\n8\nAdelie\nDream\n39.6\n18.8\n190\n4600\nmale\n\n\n9\nAdelie\nDream\n42.3\n21.2\n191\n4150\nmale\n\n\n10\nAdelie\nBiscoe\n40.1\n18.9\n188\n4300\nmale\n\n\n11\nAdelie\nBiscoe\n42.0\n19.5\n200\n4050\nmale\n\n\n12\nAdelie\nBiscoe\n41.3\n21.1\n195\n4400\nmale\n\n\n13\nAdelie\nBiscoe\n41.1\n18.2\n192\n4050\nmale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n23\nAdelie\nDream\n40.3\n18.5\n196\n4350\nmale\n\n\n24\nAdelie\nDream\n43.2\n18.5\n192\n4100\nmale\n\n\n25\nAdelie\nBiscoe\n41.0\n20.0\n203\n4725\nmale\n\n\n26\nAdelie\nBiscoe\n37.8\n20.0\n190\n4250\nmale\n\n\n27\nAdelie\nBiscoe\n43.2\n19.0\n197\n4775\nmale\n\n\n28\nAdelie\nBiscoe\n45.6\n20.3\n191\n4600\nmale\n\n\n29\nAdelie\nBiscoe\n42.2\n19.5\n197\n4275\nmale\n\n\n30\nAdelie\nBiscoe\n42.7\n18.3\n196\n4075\nmale\n\n\n31\nAdelie\nTorgersen\n41.5\n18.3\n195\n4300\nmale\n\n\n32\nAdelie\nDream\n37.5\n18.5\n199\n4475\nmale\n\n\n33\nAdelie\nDream\n39.7\n17.9\n193\n4250\nmale\n\n\n34\nAdelie\nDream\n39.2\n18.6\n190\n4250\nmale\n\n\n\n\n\n\n\n\n\nsubset(penguins, [:species, :sex, :body_mass_g] => (x, y, z) -> (x .== \"Adelie\") .& (y .== \"male\") .& (z .> 4000), skipmissing=true)\n\n34×7 DataFrame9 rows 
omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nTorgersen\n39.2\n19.6\n195\n4675\nmale\n\n\n2\nAdelie\nTorgersen\n34.6\n21.1\n198\n4400\nmale\n\n\n3\nAdelie\nTorgersen\n42.5\n20.7\n197\n4500\nmale\n\n\n4\nAdelie\nTorgersen\n46.0\n21.5\n194\n4200\nmale\n\n\n5\nAdelie\nDream\n39.2\n21.1\n196\n4150\nmale\n\n\n6\nAdelie\nDream\n39.8\n19.1\n184\n4650\nmale\n\n\n7\nAdelie\nDream\n44.1\n19.7\n196\n4400\nmale\n\n\n8\nAdelie\nDream\n39.6\n18.8\n190\n4600\nmale\n\n\n9\nAdelie\nDream\n42.3\n21.2\n191\n4150\nmale\n\n\n10\nAdelie\nBiscoe\n40.1\n18.9\n188\n4300\nmale\n\n\n11\nAdelie\nBiscoe\n42.0\n19.5\n200\n4050\nmale\n\n\n12\nAdelie\nBiscoe\n41.3\n21.1\n195\n4400\nmale\n\n\n13\nAdelie\nBiscoe\n41.1\n18.2\n192\n4050\nmale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n23\nAdelie\nDream\n40.3\n18.5\n196\n4350\nmale\n\n\n24\nAdelie\nDream\n43.2\n18.5\n192\n4100\nmale\n\n\n25\nAdelie\nBiscoe\n41.0\n20.0\n203\n4725\nmale\n\n\n26\nAdelie\nBiscoe\n37.8\n20.0\n190\n4250\nmale\n\n\n27\nAdelie\nBiscoe\n43.2\n19.0\n197\n4775\nmale\n\n\n28\nAdelie\nBiscoe\n45.6\n20.3\n191\n4600\nmale\n\n\n29\nAdelie\nBiscoe\n42.2\n19.5\n197\n4275\nmale\n\n\n30\nAdelie\nBiscoe\n42.7\n18.3\n196\n4075\nmale\n\n\n31\nAdelie\nTorgersen\n41.5\n18.3\n195\n4300\nmale\n\n\n32\nAdelie\nDream\n37.5\n18.5\n199\n4475\nmale\n\n\n33\nAdelie\nDream\n39.7\n17.9\n193\n4250\nmale\n\n\n34\nAdelie\nDream\n39.2\n18.6\n190\n4250\nmale\n\n\n\n\n\n\n\n\n\nFiltering all the rows with species = “Adelie” OR sex = “male”.\n\nTidierDataFramesMetaDataFrames\n\n\n\n@filter penguins (species == \"Adelie\") | (sex == \"male\")\n\n247×7 DataFrame222 rows 
omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nTorgersen\n39.1\n18.7\n181\n3750\nmale\n\n\n2\nAdelie\nTorgersen\n39.5\n17.4\n186\n3800\nfemale\n\n\n3\nAdelie\nTorgersen\n40.3\n18.0\n195\n3250\nfemale\n\n\n4\nAdelie\nTorgersen\nmissing\nmissing\nmissing\nmissing\nmissing\n\n\n5\nAdelie\nTorgersen\n36.7\n19.3\n193\n3450\nfemale\n\n\n6\nAdelie\nTorgersen\n39.3\n20.6\n190\n3650\nmale\n\n\n7\nAdelie\nTorgersen\n38.9\n17.8\n181\n3625\nfemale\n\n\n8\nAdelie\nTorgersen\n39.2\n19.6\n195\n4675\nmale\n\n\n9\nAdelie\nTorgersen\n34.1\n18.1\n193\n3475\nmissing\n\n\n10\nAdelie\nTorgersen\n42.0\n20.2\n190\n4250\nmissing\n\n\n11\nAdelie\nTorgersen\n37.8\n17.1\n186\n3300\nmissing\n\n\n12\nAdelie\nTorgersen\n37.8\n17.3\n180\n3700\nmissing\n\n\n13\nAdelie\nTorgersen\n41.1\n17.6\n182\n3200\nfemale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n236\nChinstrap\nDream\n50.8\n18.5\n201\n4450\nmale\n\n\n237\nChinstrap\nDream\n49.0\n19.6\n212\n4300\nmale\n\n\n238\nChinstrap\nDream\n51.5\n18.7\n187\n3250\nmale\n\n\n239\nChinstrap\nDream\n51.4\n19.0\n201\n3950\nmale\n\n\n240\nChinstrap\nDream\n50.7\n19.7\n203\n4050\nmale\n\n\n241\nChinstrap\nDream\n52.2\n18.8\n197\n3450\nmale\n\n\n242\nChinstrap\nDream\n49.3\n19.9\n203\n4050\nmale\n\n\n243\nChinstrap\nDream\n50.2\n18.8\n202\n3800\nmale\n\n\n244\nChinstrap\nDream\n51.9\n19.5\n206\n3950\nmale\n\n\n245\nChinstrap\nDream\n55.8\n19.8\n207\n4000\nmale\n\n\n246\nChinstrap\nDream\n49.6\n18.2\n193\n3775\nmale\n\n\n247\nChinstrap\nDream\n50.8\n19.0\n210\n4100\nmale\n\n\n\n\n\n\n\n\n\nDFM.@rsubset penguins (:species == \"Adelie\") | (:sex == \"male\")\n\n247×7 DataFrame222 rows 
omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nTorgersen\n39.1\n18.7\n181\n3750\nmale\n\n\n2\nAdelie\nTorgersen\n39.5\n17.4\n186\n3800\nfemale\n\n\n3\nAdelie\nTorgersen\n40.3\n18.0\n195\n3250\nfemale\n\n\n4\nAdelie\nTorgersen\nmissing\nmissing\nmissing\nmissing\nmissing\n\n\n5\nAdelie\nTorgersen\n36.7\n19.3\n193\n3450\nfemale\n\n\n6\nAdelie\nTorgersen\n39.3\n20.6\n190\n3650\nmale\n\n\n7\nAdelie\nTorgersen\n38.9\n17.8\n181\n3625\nfemale\n\n\n8\nAdelie\nTorgersen\n39.2\n19.6\n195\n4675\nmale\n\n\n9\nAdelie\nTorgersen\n34.1\n18.1\n193\n3475\nmissing\n\n\n10\nAdelie\nTorgersen\n42.0\n20.2\n190\n4250\nmissing\n\n\n11\nAdelie\nTorgersen\n37.8\n17.1\n186\n3300\nmissing\n\n\n12\nAdelie\nTorgersen\n37.8\n17.3\n180\n3700\nmissing\n\n\n13\nAdelie\nTorgersen\n41.1\n17.6\n182\n3200\nfemale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n236\nChinstrap\nDream\n50.8\n18.5\n201\n4450\nmale\n\n\n237\nChinstrap\nDream\n49.0\n19.6\n212\n4300\nmale\n\n\n238\nChinstrap\nDream\n51.5\n18.7\n187\n3250\nmale\n\n\n239\nChinstrap\nDream\n51.4\n19.0\n201\n3950\nmale\n\n\n240\nChinstrap\nDream\n50.7\n19.7\n203\n4050\nmale\n\n\n241\nChinstrap\nDream\n52.2\n18.8\n197\n3450\nmale\n\n\n242\nChinstrap\nDream\n49.3\n19.9\n203\n4050\nmale\n\n\n243\nChinstrap\nDream\n50.2\n18.8\n202\n3800\nmale\n\n\n244\nChinstrap\nDream\n51.9\n19.5\n206\n3950\nmale\n\n\n245\nChinstrap\nDream\n55.8\n19.8\n207\n4000\nmale\n\n\n246\nChinstrap\nDream\n49.6\n18.2\n193\n3775\nmale\n\n\n247\nChinstrap\nDream\n50.8\n19.0\n210\n4100\nmale\n\n\n\n\n\n\n\n\n\nsubset(penguins, [:species, :sex] => (x, y) -> (x .== \"Adelie\") .| (y .== \"male\"), skipmissing=true)\n\n247×7 DataFrame222 rows 
omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nTorgersen\n39.1\n18.7\n181\n3750\nmale\n\n\n2\nAdelie\nTorgersen\n39.5\n17.4\n186\n3800\nfemale\n\n\n3\nAdelie\nTorgersen\n40.3\n18.0\n195\n3250\nfemale\n\n\n4\nAdelie\nTorgersen\nmissing\nmissing\nmissing\nmissing\nmissing\n\n\n5\nAdelie\nTorgersen\n36.7\n19.3\n193\n3450\nfemale\n\n\n6\nAdelie\nTorgersen\n39.3\n20.6\n190\n3650\nmale\n\n\n7\nAdelie\nTorgersen\n38.9\n17.8\n181\n3625\nfemale\n\n\n8\nAdelie\nTorgersen\n39.2\n19.6\n195\n4675\nmale\n\n\n9\nAdelie\nTorgersen\n34.1\n18.1\n193\n3475\nmissing\n\n\n10\nAdelie\nTorgersen\n42.0\n20.2\n190\n4250\nmissing\n\n\n11\nAdelie\nTorgersen\n37.8\n17.1\n186\n3300\nmissing\n\n\n12\nAdelie\nTorgersen\n37.8\n17.3\n180\n3700\nmissing\n\n\n13\nAdelie\nTorgersen\n41.1\n17.6\n182\n3200\nfemale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n236\nChinstrap\nDream\n50.8\n18.5\n201\n4450\nmale\n\n\n237\nChinstrap\nDream\n49.0\n19.6\n212\n4300\nmale\n\n\n238\nChinstrap\nDream\n51.5\n18.7\n187\n3250\nmale\n\n\n239\nChinstrap\nDream\n51.4\n19.0\n201\n3950\nmale\n\n\n240\nChinstrap\nDream\n50.7\n19.7\n203\n4050\nmale\n\n\n241\nChinstrap\nDream\n52.2\n18.8\n197\n3450\nmale\n\n\n242\nChinstrap\nDream\n49.3\n19.9\n203\n4050\nmale\n\n\n243\nChinstrap\nDream\n50.2\n18.8\n202\n3800\nmale\n\n\n244\nChinstrap\nDream\n51.9\n19.5\n206\n3950\nmale\n\n\n245\nChinstrap\nDream\n55.8\n19.8\n207\n4000\nmale\n\n\n246\nChinstrap\nDream\n49.6\n18.2\n193\n3775\nmale\n\n\n247\nChinstrap\nDream\n50.8\n19.0\n210\n4100\nmale\n\n\n\n\n\n\n\n\n\nFiltering all the rows where the flipper_length_mm is greater than the mean.\n\nTidierDataFramesMetaDataFrames\n\n\n\n@filter penguins flipper_length_mm > mean(skipmissing(flipper_length_mm))\n\n148×7 DataFrame123 rows 
omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nDream\n35.7\n18.0\n202\n3550\nfemale\n\n\n2\nAdelie\nDream\n41.1\n18.1\n205\n4300\nmale\n\n\n3\nAdelie\nDream\n40.8\n18.9\n208\n4300\nmale\n\n\n4\nAdelie\nBiscoe\n41.0\n20.0\n203\n4725\nmale\n\n\n5\nAdelie\nTorgersen\n41.4\n18.5\n202\n3875\nmale\n\n\n6\nAdelie\nTorgersen\n44.1\n18.0\n210\n4000\nmale\n\n\n7\nAdelie\nDream\n41.5\n18.5\n201\n4000\nmale\n\n\n8\nGentoo\nBiscoe\n46.1\n13.2\n211\n4500\nfemale\n\n\n9\nGentoo\nBiscoe\n50.0\n16.3\n230\n5700\nmale\n\n\n10\nGentoo\nBiscoe\n48.7\n14.1\n210\n4450\nfemale\n\n\n11\nGentoo\nBiscoe\n50.0\n15.2\n218\n5700\nmale\n\n\n12\nGentoo\nBiscoe\n47.6\n14.5\n215\n5400\nmale\n\n\n13\nGentoo\nBiscoe\n46.5\n13.5\n210\n4550\nfemale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n137\nChinstrap\nDream\n53.5\n19.9\n205\n4500\nmale\n\n\n138\nChinstrap\nDream\n49.0\n19.5\n210\n3950\nmale\n\n\n139\nChinstrap\nDream\n50.8\n18.5\n201\n4450\nmale\n\n\n140\nChinstrap\nDream\n49.0\n19.6\n212\n4300\nmale\n\n\n141\nChinstrap\nDream\n51.4\n19.0\n201\n3950\nmale\n\n\n142\nChinstrap\nDream\n50.7\n19.7\n203\n4050\nmale\n\n\n143\nChinstrap\nDream\n49.3\n19.9\n203\n4050\nmale\n\n\n144\nChinstrap\nDream\n50.2\n18.8\n202\n3800\nmale\n\n\n145\nChinstrap\nDream\n51.9\n19.5\n206\n3950\nmale\n\n\n146\nChinstrap\nDream\n55.8\n19.8\n207\n4000\nmale\n\n\n147\nChinstrap\nDream\n43.5\n18.1\n202\n3400\nfemale\n\n\n148\nChinstrap\nDream\n50.8\n19.0\n210\n4100\nmale\n\n\n\n\n\n\n\n\n\nDFM.@subset penguins :flipper_length_mm .>= mean(skipmissing(:flipper_length_mm))\n\n148×7 DataFrame123 rows 
omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nDream\n35.7\n18.0\n202\n3550\nfemale\n\n\n2\nAdelie\nDream\n41.1\n18.1\n205\n4300\nmale\n\n\n3\nAdelie\nDream\n40.8\n18.9\n208\n4300\nmale\n\n\n4\nAdelie\nBiscoe\n41.0\n20.0\n203\n4725\nmale\n\n\n5\nAdelie\nTorgersen\n41.4\n18.5\n202\n3875\nmale\n\n\n6\nAdelie\nTorgersen\n44.1\n18.0\n210\n4000\nmale\n\n\n7\nAdelie\nDream\n41.5\n18.5\n201\n4000\nmale\n\n\n8\nGentoo\nBiscoe\n46.1\n13.2\n211\n4500\nfemale\n\n\n9\nGentoo\nBiscoe\n50.0\n16.3\n230\n5700\nmale\n\n\n10\nGentoo\nBiscoe\n48.7\n14.1\n210\n4450\nfemale\n\n\n11\nGentoo\nBiscoe\n50.0\n15.2\n218\n5700\nmale\n\n\n12\nGentoo\nBiscoe\n47.6\n14.5\n215\n5400\nmale\n\n\n13\nGentoo\nBiscoe\n46.5\n13.5\n210\n4550\nfemale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n137\nChinstrap\nDream\n53.5\n19.9\n205\n4500\nmale\n\n\n138\nChinstrap\nDream\n49.0\n19.5\n210\n3950\nmale\n\n\n139\nChinstrap\nDream\n50.8\n18.5\n201\n4450\nmale\n\n\n140\nChinstrap\nDream\n49.0\n19.6\n212\n4300\nmale\n\n\n141\nChinstrap\nDream\n51.4\n19.0\n201\n3950\nmale\n\n\n142\nChinstrap\nDream\n50.7\n19.7\n203\n4050\nmale\n\n\n143\nChinstrap\nDream\n49.3\n19.9\n203\n4050\nmale\n\n\n144\nChinstrap\nDream\n50.2\n18.8\n202\n3800\nmale\n\n\n145\nChinstrap\nDream\n51.9\n19.5\n206\n3950\nmale\n\n\n146\nChinstrap\nDream\n55.8\n19.8\n207\n4000\nmale\n\n\n147\nChinstrap\nDream\n43.5\n18.1\n202\n3400\nfemale\n\n\n148\nChinstrap\nDream\n50.8\n19.0\n210\n4100\nmale\n\n\n\n\n\n\n\n\n\nsubset(penguins, :flipper_length_mm => x -> x .> mean(skipmissing(x)), skipmissing=true)\n\n148×7 DataFrame123 rows 
omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nDream\n35.7\n18.0\n202\n3550\nfemale\n\n\n2\nAdelie\nDream\n41.1\n18.1\n205\n4300\nmale\n\n\n3\nAdelie\nDream\n40.8\n18.9\n208\n4300\nmale\n\n\n4\nAdelie\nBiscoe\n41.0\n20.0\n203\n4725\nmale\n\n\n5\nAdelie\nTorgersen\n41.4\n18.5\n202\n3875\nmale\n\n\n6\nAdelie\nTorgersen\n44.1\n18.0\n210\n4000\nmale\n\n\n7\nAdelie\nDream\n41.5\n18.5\n201\n4000\nmale\n\n\n8\nGentoo\nBiscoe\n46.1\n13.2\n211\n4500\nfemale\n\n\n9\nGentoo\nBiscoe\n50.0\n16.3\n230\n5700\nmale\n\n\n10\nGentoo\nBiscoe\n48.7\n14.1\n210\n4450\nfemale\n\n\n11\nGentoo\nBiscoe\n50.0\n15.2\n218\n5700\nmale\n\n\n12\nGentoo\nBiscoe\n47.6\n14.5\n215\n5400\nmale\n\n\n13\nGentoo\nBiscoe\n46.5\n13.5\n210\n4550\nfemale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n137\nChinstrap\nDream\n53.5\n19.9\n205\n4500\nmale\n\n\n138\nChinstrap\nDream\n49.0\n19.5\n210\n3950\nmale\n\n\n139\nChinstrap\nDream\n50.8\n18.5\n201\n4450\nmale\n\n\n140\nChinstrap\nDream\n49.0\n19.6\n212\n4300\nmale\n\n\n141\nChinstrap\nDream\n51.4\n19.0\n201\n3950\nmale\n\n\n142\nChinstrap\nDream\n50.7\n19.7\n203\n4050\nmale\n\n\n143\nChinstrap\nDream\n49.3\n19.9\n203\n4050\nmale\n\n\n144\nChinstrap\nDream\n50.2\n18.8\n202\n3800\nmale\n\n\n145\nChinstrap\nDream\n51.9\n19.5\n206\n3950\nmale\n\n\n146\nChinstrap\nDream\n55.8\n19.8\n207\n4000\nmale\n\n\n147\nChinstrap\nDream\n43.5\n18.1\n202\n3400\nfemale\n\n\n148\nChinstrap\nDream\n50.8\n19.0\n210\n4100\nmale\n\n\n\n\n\n\n\n\n\n\n\n1.1.3 Filtering with a variable column name\nSuppose the column you want to filter is a variable, let’s say\n\nmy_column = :species\n\n:species\n\n\n\nTidierDataFramesMetaDataFrames\n\n\n\n# how to do it??\n# @filter(penguins, !!(my_column) .== \"Adelie\")\n\n\n\n\nDFM.@rsubset penguins $my_column == \"Adelie\"\n\n152×7 DataFrame127 rows 
omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nTorgersen\n39.1\n18.7\n181\n3750\nmale\n\n\n2\nAdelie\nTorgersen\n39.5\n17.4\n186\n3800\nfemale\n\n\n3\nAdelie\nTorgersen\n40.3\n18.0\n195\n3250\nfemale\n\n\n4\nAdelie\nTorgersen\nmissing\nmissing\nmissing\nmissing\nmissing\n\n\n5\nAdelie\nTorgersen\n36.7\n19.3\n193\n3450\nfemale\n\n\n6\nAdelie\nTorgersen\n39.3\n20.6\n190\n3650\nmale\n\n\n7\nAdelie\nTorgersen\n38.9\n17.8\n181\n3625\nfemale\n\n\n8\nAdelie\nTorgersen\n39.2\n19.6\n195\n4675\nmale\n\n\n9\nAdelie\nTorgersen\n34.1\n18.1\n193\n3475\nmissing\n\n\n10\nAdelie\nTorgersen\n42.0\n20.2\n190\n4250\nmissing\n\n\n11\nAdelie\nTorgersen\n37.8\n17.1\n186\n3300\nmissing\n\n\n12\nAdelie\nTorgersen\n37.8\n17.3\n180\n3700\nmissing\n\n\n13\nAdelie\nTorgersen\n41.1\n17.6\n182\n3200\nfemale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n141\nAdelie\nDream\n40.2\n17.1\n193\n3400\nfemale\n\n\n142\nAdelie\nDream\n40.6\n17.2\n187\n3475\nmale\n\n\n143\nAdelie\nDream\n32.1\n15.5\n188\n3050\nfemale\n\n\n144\nAdelie\nDream\n40.7\n17.0\n190\n3725\nmale\n\n\n145\nAdelie\nDream\n37.3\n16.8\n192\n3000\nfemale\n\n\n146\nAdelie\nDream\n39.0\n18.7\n185\n3650\nmale\n\n\n147\nAdelie\nDream\n39.2\n18.6\n190\n4250\nmale\n\n\n148\nAdelie\nDream\n36.6\n18.4\n184\n3475\nfemale\n\n\n149\nAdelie\nDream\n36.0\n17.8\n195\n3450\nfemale\n\n\n150\nAdelie\nDream\n37.8\n18.1\n193\n3750\nmale\n\n\n151\nAdelie\nDream\n36.0\n17.1\n187\n3700\nfemale\n\n\n152\nAdelie\nDream\n41.5\n18.5\n201\n4000\nmale\n\n\n\n\n\n\n\n\n\nsubset(penguins, my_column => x -> x .== \"Adelie\")\n\n152×7 DataFrame127 rows 
omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nTorgersen\n39.1\n18.7\n181\n3750\nmale\n\n\n2\nAdelie\nTorgersen\n39.5\n17.4\n186\n3800\nfemale\n\n\n3\nAdelie\nTorgersen\n40.3\n18.0\n195\n3250\nfemale\n\n\n4\nAdelie\nTorgersen\nmissing\nmissing\nmissing\nmissing\nmissing\n\n\n5\nAdelie\nTorgersen\n36.7\n19.3\n193\n3450\nfemale\n\n\n6\nAdelie\nTorgersen\n39.3\n20.6\n190\n3650\nmale\n\n\n7\nAdelie\nTorgersen\n38.9\n17.8\n181\n3625\nfemale\n\n\n8\nAdelie\nTorgersen\n39.2\n19.6\n195\n4675\nmale\n\n\n9\nAdelie\nTorgersen\n34.1\n18.1\n193\n3475\nmissing\n\n\n10\nAdelie\nTorgersen\n42.0\n20.2\n190\n4250\nmissing\n\n\n11\nAdelie\nTorgersen\n37.8\n17.1\n186\n3300\nmissing\n\n\n12\nAdelie\nTorgersen\n37.8\n17.3\n180\n3700\nmissing\n\n\n13\nAdelie\nTorgersen\n41.1\n17.6\n182\n3200\nfemale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n141\nAdelie\nDream\n40.2\n17.1\n193\n3400\nfemale\n\n\n142\nAdelie\nDream\n40.6\n17.2\n187\n3475\nmale\n\n\n143\nAdelie\nDream\n32.1\n15.5\n188\n3050\nfemale\n\n\n144\nAdelie\nDream\n40.7\n17.0\n190\n3725\nmale\n\n\n145\nAdelie\nDream\n37.3\n16.8\n192\n3000\nfemale\n\n\n146\nAdelie\nDream\n39.0\n18.7\n185\n3650\nmale\n\n\n147\nAdelie\nDream\n39.2\n18.6\n190\n4250\nmale\n\n\n148\nAdelie\nDream\n36.6\n18.4\n184\n3475\nfemale\n\n\n149\nAdelie\nDream\n36.0\n17.8\n195\n3450\nfemale\n\n\n150\nAdelie\nDream\n37.8\n18.1\n193\n3750\nmale\n\n\n151\nAdelie\nDream\n36.0\n17.1\n187\n3700\nfemale\n\n\n152\nAdelie\nDream\n41.5\n18.5\n201\n4000\nmale\n\n\n\n\n\n\n\n\n\nIn case the column is a string\n\nmy_column2 = \"species\"\n\n\"species\"\n\n\ninstead of a symbol, we can write\n\nDataFramesMetaDataFrames\n\n\n\nDFM.@rsubset penguins $(Symbol(my_column2)) == \"Adelie\"\n\n152×7 DataFrame127 rows 
omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nTorgersen\n39.1\n18.7\n181\n3750\nmale\n\n\n2\nAdelie\nTorgersen\n39.5\n17.4\n186\n3800\nfemale\n\n\n3\nAdelie\nTorgersen\n40.3\n18.0\n195\n3250\nfemale\n\n\n4\nAdelie\nTorgersen\nmissing\nmissing\nmissing\nmissing\nmissing\n\n\n5\nAdelie\nTorgersen\n36.7\n19.3\n193\n3450\nfemale\n\n\n6\nAdelie\nTorgersen\n39.3\n20.6\n190\n3650\nmale\n\n\n7\nAdelie\nTorgersen\n38.9\n17.8\n181\n3625\nfemale\n\n\n8\nAdelie\nTorgersen\n39.2\n19.6\n195\n4675\nmale\n\n\n9\nAdelie\nTorgersen\n34.1\n18.1\n193\n3475\nmissing\n\n\n10\nAdelie\nTorgersen\n42.0\n20.2\n190\n4250\nmissing\n\n\n11\nAdelie\nTorgersen\n37.8\n17.1\n186\n3300\nmissing\n\n\n12\nAdelie\nTorgersen\n37.8\n17.3\n180\n3700\nmissing\n\n\n13\nAdelie\nTorgersen\n41.1\n17.6\n182\n3200\nfemale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n141\nAdelie\nDream\n40.2\n17.1\n193\n3400\nfemale\n\n\n142\nAdelie\nDream\n40.6\n17.2\n187\n3475\nmale\n\n\n143\nAdelie\nDream\n32.1\n15.5\n188\n3050\nfemale\n\n\n144\nAdelie\nDream\n40.7\n17.0\n190\n3725\nmale\n\n\n145\nAdelie\nDream\n37.3\n16.8\n192\n3000\nfemale\n\n\n146\nAdelie\nDream\n39.0\n18.7\n185\n3650\nmale\n\n\n147\nAdelie\nDream\n39.2\n18.6\n190\n4250\nmale\n\n\n148\nAdelie\nDream\n36.6\n18.4\n184\n3475\nfemale\n\n\n149\nAdelie\nDream\n36.0\n17.8\n195\n3450\nfemale\n\n\n150\nAdelie\nDream\n37.8\n18.1\n193\n3750\nmale\n\n\n151\nAdelie\nDream\n36.0\n17.1\n187\n3700\nfemale\n\n\n152\nAdelie\nDream\n41.5\n18.5\n201\n4000\nmale\n\n\n\n\n\n\n\n\n\nsubset(penguins, my_column2 => x -> x .== \"Adelie\")\n\n152×7 DataFrame127 rows 
omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nTorgersen\n39.1\n18.7\n181\n3750\nmale\n\n\n2\nAdelie\nTorgersen\n39.5\n17.4\n186\n3800\nfemale\n\n\n3\nAdelie\nTorgersen\n40.3\n18.0\n195\n3250\nfemale\n\n\n4\nAdelie\nTorgersen\nmissing\nmissing\nmissing\nmissing\nmissing\n\n\n5\nAdelie\nTorgersen\n36.7\n19.3\n193\n3450\nfemale\n\n\n6\nAdelie\nTorgersen\n39.3\n20.6\n190\n3650\nmale\n\n\n7\nAdelie\nTorgersen\n38.9\n17.8\n181\n3625\nfemale\n\n\n8\nAdelie\nTorgersen\n39.2\n19.6\n195\n4675\nmale\n\n\n9\nAdelie\nTorgersen\n34.1\n18.1\n193\n3475\nmissing\n\n\n10\nAdelie\nTorgersen\n42.0\n20.2\n190\n4250\nmissing\n\n\n11\nAdelie\nTorgersen\n37.8\n17.1\n186\n3300\nmissing\n\n\n12\nAdelie\nTorgersen\n37.8\n17.3\n180\n3700\nmissing\n\n\n13\nAdelie\nTorgersen\n41.1\n17.6\n182\n3200\nfemale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n141\nAdelie\nDream\n40.2\n17.1\n193\n3400\nfemale\n\n\n142\nAdelie\nDream\n40.6\n17.2\n187\n3475\nmale\n\n\n143\nAdelie\nDream\n32.1\n15.5\n188\n3050\nfemale\n\n\n144\nAdelie\nDream\n40.7\n17.0\n190\n3725\nmale\n\n\n145\nAdelie\nDream\n37.3\n16.8\n192\n3000\nfemale\n\n\n146\nAdelie\nDream\n39.0\n18.7\n185\n3650\nmale\n\n\n147\nAdelie\nDream\n39.2\n18.6\n190\n4250\nmale\n\n\n148\nAdelie\nDream\n36.6\n18.4\n184\n3475\nfemale\n\n\n149\nAdelie\nDream\n36.0\n17.8\n195\n3450\nfemale\n\n\n150\nAdelie\nDream\n37.8\n18.1\n193\n3750\nmale\n\n\n151\nAdelie\nDream\n36.0\n17.1\n187\n3700\nfemale\n\n\n152\nAdelie\nDream\n41.5\n18.5\n201\n4000\nmale",
+ "crumbs": [
+ "Part 2: Dataframes",
+ "1 Operations on rows"
+ ]
+ },
+ {
+ "objectID": "dataframes-rows.html#arranging",
+ "href": "dataframes-rows.html#arranging",
+ "title": "1 Operations on rows",
+ "section": "1.2 Arranging",
+ "text": "1.2 Arranging\nArranging is when we reorder the rows of a dataframe according to some criteria.\n\n@arrange penguins body_mass_g\n\n344×7 DataFrame319 rows omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nChinstrap\nDream\n46.9\n16.6\n192\n2700\nfemale\n\n\n2\nAdelie\nBiscoe\n36.5\n16.6\n181\n2850\nfemale\n\n\n3\nAdelie\nBiscoe\n36.4\n17.1\n184\n2850\nfemale\n\n\n4\nAdelie\nBiscoe\n34.5\n18.1\n187\n2900\nfemale\n\n\n5\nAdelie\nDream\n33.1\n16.1\n178\n2900\nfemale\n\n\n6\nAdelie\nTorgersen\n38.6\n17.0\n188\n2900\nfemale\n\n\n7\nChinstrap\nDream\n43.2\n16.6\n187\n2900\nfemale\n\n\n8\nAdelie\nBiscoe\n37.9\n18.6\n193\n2925\nfemale\n\n\n9\nAdelie\nDream\n37.5\n18.9\n179\n2975\nmissing\n\n\n10\nAdelie\nDream\n37.0\n16.9\n185\n3000\nfemale\n\n\n11\nAdelie\nDream\n37.3\n16.8\n192\n3000\nfemale\n\n\n12\nAdelie\nTorgersen\n35.9\n16.6\n190\n3050\nfemale\n\n\n13\nAdelie\nTorgersen\n35.2\n15.9\n186\n3050\nfemale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n333\nGentoo\nBiscoe\n48.6\n16.0\n230\n5800\nmale\n\n\n334\nGentoo\nBiscoe\n48.4\n14.6\n213\n5850\nmale\n\n\n335\nGentoo\nBiscoe\n49.3\n15.7\n217\n5850\nmale\n\n\n336\nGentoo\nBiscoe\n55.1\n16.0\n230\n5850\nmale\n\n\n337\nGentoo\nBiscoe\n45.2\n16.4\n223\n5950\nmale\n\n\n338\nGentoo\nBiscoe\n49.8\n15.9\n229\n5950\nmale\n\n\n339\nGentoo\nBiscoe\n51.1\n16.3\n220\n6000\nmale\n\n\n340\nGentoo\nBiscoe\n48.8\n16.2\n222\n6000\nmale\n\n\n341\nGentoo\nBiscoe\n59.6\n17.0\n230\n6050\nmale\n\n\n342\nGentoo\nBiscoe\n49.2\n15.2\n221\n6300\nmale\n\n\n343\nAdelie\nTorgersen\nmissing\nmissing\nmissing\nmissing\nmissing\n\n\n344\nGentoo\nBiscoe\nmissing\nmissing\nmissing\nmissing\nmissing\n\n\n\n\n\n\n\n@arrange penguins species body_mass_g\n\n344×7 DataFrame319 rows 
omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nAdelie\nBiscoe\n36.5\n16.6\n181\n2850\nfemale\n\n\n2\nAdelie\nBiscoe\n36.4\n17.1\n184\n2850\nfemale\n\n\n3\nAdelie\nBiscoe\n34.5\n18.1\n187\n2900\nfemale\n\n\n4\nAdelie\nDream\n33.1\n16.1\n178\n2900\nfemale\n\n\n5\nAdelie\nTorgersen\n38.6\n17.0\n188\n2900\nfemale\n\n\n6\nAdelie\nBiscoe\n37.9\n18.6\n193\n2925\nfemale\n\n\n7\nAdelie\nDream\n37.5\n18.9\n179\n2975\nmissing\n\n\n8\nAdelie\nDream\n37.0\n16.9\n185\n3000\nfemale\n\n\n9\nAdelie\nDream\n37.3\n16.8\n192\n3000\nfemale\n\n\n10\nAdelie\nTorgersen\n35.9\n16.6\n190\n3050\nfemale\n\n\n11\nAdelie\nTorgersen\n35.2\n15.9\n186\n3050\nfemale\n\n\n12\nAdelie\nTorgersen\n39.0\n17.1\n191\n3050\nfemale\n\n\n13\nAdelie\nDream\n32.1\n15.5\n188\n3050\nfemale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n333\nGentoo\nBiscoe\n49.5\n16.2\n229\n5800\nmale\n\n\n334\nGentoo\nBiscoe\n48.6\n16.0\n230\n5800\nmale\n\n\n335\nGentoo\nBiscoe\n48.4\n14.6\n213\n5850\nmale\n\n\n336\nGentoo\nBiscoe\n49.3\n15.7\n217\n5850\nmale\n\n\n337\nGentoo\nBiscoe\n55.1\n16.0\n230\n5850\nmale\n\n\n338\nGentoo\nBiscoe\n45.2\n16.4\n223\n5950\nmale\n\n\n339\nGentoo\nBiscoe\n49.8\n15.9\n229\n5950\nmale\n\n\n340\nGentoo\nBiscoe\n51.1\n16.3\n220\n6000\nmale\n\n\n341\nGentoo\nBiscoe\n48.8\n16.2\n222\n6000\nmale\n\n\n342\nGentoo\nBiscoe\n59.6\n17.0\n230\n6050\nmale\n\n\n343\nGentoo\nBiscoe\n49.2\n15.2\n221\n6300\nmale\n\n\n344\nGentoo\nBiscoe\nmissing\nmissing\nmissing\nmissing\nmissing\n\n\n\n\n\n\n\n@arrange penguins island desc(body_mass_g)\n\n344×7 DataFrame319 rows 
omitted\n\n\n\nRow\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\n\n\n\nString15\nString15\nFloat64?\nFloat64?\nInt64?\nInt64?\nString7\n\n\n\n\n1\nGentoo\nBiscoe\nmissing\nmissing\nmissing\nmissing\nmissing\n\n\n2\nGentoo\nBiscoe\n49.2\n15.2\n221\n6300\nmale\n\n\n3\nGentoo\nBiscoe\n59.6\n17.0\n230\n6050\nmale\n\n\n4\nGentoo\nBiscoe\n51.1\n16.3\n220\n6000\nmale\n\n\n5\nGentoo\nBiscoe\n48.8\n16.2\n222\n6000\nmale\n\n\n6\nGentoo\nBiscoe\n45.2\n16.4\n223\n5950\nmale\n\n\n7\nGentoo\nBiscoe\n49.8\n15.9\n229\n5950\nmale\n\n\n8\nGentoo\nBiscoe\n48.4\n14.6\n213\n5850\nmale\n\n\n9\nGentoo\nBiscoe\n49.3\n15.7\n217\n5850\nmale\n\n\n10\nGentoo\nBiscoe\n55.1\n16.0\n230\n5850\nmale\n\n\n11\nGentoo\nBiscoe\n49.5\n16.2\n229\n5800\nmale\n\n\n12\nGentoo\nBiscoe\n48.6\n16.0\n230\n5800\nmale\n\n\n13\nGentoo\nBiscoe\n50.4\n15.7\n222\n5750\nmale\n\n\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n⋮\n\n\n333\nAdelie\nTorgersen\n41.1\n18.6\n189\n3325\nmale\n\n\n334\nAdelie\nTorgersen\n38.5\n17.9\n190\n3325\nfemale\n\n\n335\nAdelie\nTorgersen\n37.8\n17.1\n186\n3300\nmissing\n\n\n336\nAdelie\nTorgersen\n38.8\n17.6\n191\n3275\nfemale\n\n\n337\nAdelie\nTorgersen\n40.3\n18.0\n195\n3250\nfemale\n\n\n338\nAdelie\nTorgersen\n41.1\n17.6\n182\n3200\nfemale\n\n\n339\nAdelie\nTorgersen\n34.6\n17.2\n189\n3200\nfemale\n\n\n340\nAdelie\nTorgersen\n36.2\n17.2\n187\n3150\nfemale\n\n\n341\nAdelie\nTorgersen\n35.9\n16.6\n190\n3050\nfemale\n\n\n342\nAdelie\nTorgersen\n35.2\n15.9\n186\n3050\nfemale\n\n\n343\nAdelie\nTorgersen\n39.0\n17.1\n191\n3050\nfemale\n\n\n344\nAdelie\nTorgersen\n38.6\n17.0\n188\n2900\nfemale",
+ "crumbs": [
+ "Part 2: Dataframes",
+ "1 Operations on rows"
+ ]
}
]
\ No newline at end of file