---
title: "Exercise 1. Your turn"
---
## Load library packages
```{r}
library(janeaustenr)
library(tidyverse)
library(tidytext)
library(wordcloud2)
```
## Your Turn. Exercise 1.
Goal: Make a basic word cloud for the novel *Pride and Prejudice*, using the tibble `pride_prej_novel`.
a. Prepare: Add a line number to the text
```{r}
#| eval: false
pride_prej_novel <- tibble(text = prideprejudice) %>%
  mutate(line = ________________)
```
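As a hint for step a, here is a minimal sketch on a made-up three-line tibble (`toy_text` is hypothetical and not part of `janeaustenr`). It shows one common way to number rows, `dplyr::row_number()`. Other approaches would work too.

```{r}
library(tibble)
library(dplyr)

# Hypothetical toy data standing in for the lines of a novel
toy_text <- tibble(text = c("The quick brown fox",
                            "jumps over",
                            "the lazy dog"))

# row_number() assigns 1, 2, 3, ... in the current row order
toy_numbered <- toy_text %>%
  mutate(line = row_number())

toy_numbered
```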
b. Tokenize `pride_prej_novel` with `unnest_tokens()`
```{r}
#| eval: false
pride_prej_novel %>%
  unnest_tokens(____, _____)
```
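To illustrate what tokenizing does, the sketch below runs `unnest_tokens()` on a hypothetical two-line tibble (`toy_text` is invented for this example). The first argument names the new output column and the second names the input text column. By default, tokens are single words, lowercased, with punctuation removed.

```{r}
library(tibble)
library(dplyr)
library(tidytext)

# Hypothetical toy data: two numbered lines of text
toy_text <- tibble(line = 1:2,
                   text = c("The quick brown fox",
                            "jumps over the lazy dog"))

# One row per word; the `line` column is carried along
toy_tokens <- toy_text %>%
  unnest_tokens(output = word, input = text)

toy_tokens
```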
c. Remove stop-words
```{r}
#| eval: false
pride_prej_novel %>%
  unnest_tokens(____, _____) %>%
  anti_join(____________)
```
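The idea behind stop-word removal can be sketched with a hand-made stop list. Both `toy_tokens` and `my_stops` below are hypothetical. In the exercise you would more likely use a ready-made list, such as the one returned by `get_stopwords()`. `anti_join()` keeps only the rows of the left table that have no match in the right table.

```{r}
library(tibble)
library(dplyr)

# Hypothetical tokenized text, one word per row
toy_tokens <- tibble(word = c("the", "quick", "brown", "fox",
                              "jumps", "over", "the", "lazy", "dog"))

# Hypothetical, deliberately tiny stop-word list
my_stops <- tibble(word = c("the", "over"))

# Drop every row whose word appears in the stop list
toy_clean <- toy_tokens %>%
  anti_join(my_stops, by = "word")

toy_clean
```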
d. Calculate word frequency
```{r}
#| eval: false
pride_prej_novel %>%
  unnest_tokens(____, _____) %>%
  anti_join(____________) %>%
  count(____________)
```
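Counting works the same way on any column of tokens. The sketch below uses a hypothetical `toy_tokens` tibble. `count()` adds a column `n` with the number of rows per word, and `sort = TRUE` puts the most frequent words first.

```{r}
library(tibble)
library(dplyr)

# Hypothetical tokens with repeated words
toy_tokens <- tibble(word = c("fox", "dog", "fox", "cat", "fox", "dog"))

# One row per distinct word, ordered by descending frequency
toy_counts <- toy_tokens %>%
  count(word, sort = TRUE)

toy_counts
```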
e. Make a simple word cloud
```{r}
#| eval: false
pride_prej_novel %>%
  unnest_tokens(____, _____) %>%
  anti_join(____________) %>%
  count(____________) %>%
  with(wordcloud::wordcloud(____, ____, max.words = ___))
```
f. Since "Friends don't let friends make word clouds", make a bar plot of the ten most frequent words instead.
```{r}
#| eval: false
pride_prej_novel %>%
  unnest_tokens(word, text) %>%
  anti_join(get_stopwords(), by = "word") %>%
  count(word, sort = TRUE) %>%
  slice_head(n = 10) %>%
  ggplot(aes(x = n, y = fct_reorder(word, n))) +
  geom_col() +
  labs(title = "Word Frequency",
       subtitle = "Jane Austen novel",
       x = "", y = "",
       caption = "Source: janeaustenr")
```