This repository has been archived by the owner on Apr 2, 2024. It is now read-only.
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<title>Software Carpentry: R for reproducible scientific analysis</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap-theme.css" />
<link rel="stylesheet" type="text/css" href="css/swc.css" />
<link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/>
<meta charset="UTF-8" />
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body class="lesson">
<div class="container card">
<div class="banner">
<a href="http://software-carpentry.org" title="Software Carpentry">
<img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" />
</a>
</div>
<article>
<div class="row">
<div class="col-md-10 col-md-offset-1">
<a href="index.html"><h1 class="title">R for reproducible scientific analysis</h1></a>
<h2 class="subtitle">Vectorisation</h2>
<section class="objectives panel panel-warning">
<div class="panel-heading">
<h2><span class="glyphicon glyphicon-certificate"></span>Learning objectives</h2>
</div>
<div class="panel-body">
<ul>
<li>To understand vectorised operations in R.</li>
</ul>
</div>
</section>
<p>One of the nice features of R is that most of its functions are vectorised: a function operates on all elements of a vector at once, without our needing to loop through and act on each element one at a time. This makes code more concise, easier to read, and less error-prone.</p>
<pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="dv">1</span>:<span class="dv">4</span>
x *<span class="st"> </span><span class="dv">2</span></code></pre>
<pre class="output"><code>[1] 2 4 6 8
</code></pre>
<p>The multiplication happened to each element of the vector.</p>
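<p>For comparison, here is a minimal sketch of the explicit loop that the vectorised <code>x * 2</code> replaces. The name <code>doubled</code> is introduced here only for illustration.</p>
<pre class="sourceCode r"><code class="sourceCode r">x <- 1:4

# Loop version: pre-allocate the result, then fill it one element at a time
doubled <- numeric(length(x))
for (i in seq_along(x)) {
  doubled[i] <- x[i] * 2
}

# Vectorised version: a single expression gives the same values
all(doubled == x * 2)</code></pre>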
<p>We can also add two vectors together:</p>
<pre class="sourceCode r"><code class="sourceCode r">y <-<span class="st"> </span><span class="dv">6</span>:<span class="dv">9</span>
x +<span class="st"> </span>y</code></pre>
<pre class="output"><code>[1]  7  9 11 13
</code></pre>
<p>Each element of <code>x</code> was added to its corresponding element of <code>y</code>:</p>
<pre class="sourceCode r"><code class="sourceCode r">x:  1  2  3  4
    +  +  +  +
y:  6  7  8  9
---------------
    7  9 11 13</code></pre>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4><span class="glyphicon glyphicon-pencil"></span>Challenge 1</h4>
</div>
<div class="panel-body">
<p>Let’s try this on the <code>pop</code> column of the <code>gapminder</code> dataset.</p>
<p>Make a new column in the <code>gapminder</code> data frame that contains population in units of millions of people. Check the head or tail of the data frame to make sure it worked.</p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4><span class="glyphicon glyphicon-pencil"></span>Challenge 2</h4>
</div>
<div class="panel-body">
<p>Refresh your ggplot skills by plotting population in millions against year.</p>
</div>
</section>
<p>Comparison operators also apply element-wise, as we saw in the subsetting lesson:</p>
<pre class="sourceCode r"><code class="sourceCode r">x ><span class="st"> </span><span class="dv">2</span></code></pre>
<pre class="output"><code>[1] FALSE FALSE  TRUE  TRUE
</code></pre>
<p>Logical operations are also vectorised:</p>
<pre class="sourceCode r"><code class="sourceCode r">a <-<span class="st"> </span>x ><span class="st"> </span><span class="dv">3</span>
a</code></pre>
<pre class="output"><code>[1] FALSE FALSE FALSE  TRUE
</code></pre>
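<p>The logical operators themselves are vectorised too. A short sketch, in which the names <code>in_range</code> and <code>at_edge</code> are chosen only for illustration:</p>
<pre class="sourceCode r"><code class="sourceCode r">x <- 1:4

# & (and), | (or) and ! (not) each work element by element
in_range <- x > 1 & x < 4
at_edge  <- x == 1 | x == 4

in_range
at_edge
!in_range</code></pre>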
<aside class="callout panel panel-info">
<div class="panel-heading">
<h4><span class="glyphicon glyphicon-pushpin"></span>Tip: some useful functions for logical vectors</h4>
</div>
<div class="panel-body">
<p><code>any()</code> will return <code>TRUE</code> if <em>any</em> element of a vector is <code>TRUE</code>.</p>
<p><code>all()</code> will return <code>TRUE</code> if <em>all</em> elements of a vector are <code>TRUE</code>.</p>
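<p>A quick sketch of both functions applied to the result of a comparison, reusing the vector <code>x</code> from the examples above:</p>
<pre class="sourceCode r"><code class="sourceCode r">x <- 1:4

any(x > 3)   # TRUE: at least one element is greater than 3
all(x > 3)   # FALSE: not every element is greater than 3
all(x > 0)   # TRUE: every element is greater than 0</code></pre>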
</div>
</aside>
<p>Many functions also operate element-wise on vectors:</p>
<pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="dv">1</span>:<span class="dv">4</span>
<span class="kw">log</span>(x)</code></pre>
<pre class="output"><code>[1] 0.0000000 0.6931472 1.0986123 1.3862944
</code></pre>
<p>Vectorised operations also work element-wise on matrices:</p>
<pre class="sourceCode r"><code class="sourceCode r">m <-<span class="st"> </span><span class="kw">matrix</span>(<span class="dv">1</span>:<span class="dv">12</span>, <span class="dt">nrow=</span><span class="dv">3</span>, <span class="dt">ncol=</span><span class="dv">4</span>)
m *<span class="st"> </span>-<span class="dv">1</span></code></pre>
<pre class="output"><code>     [,1] [,2] [,3] [,4]
[1,]   -1   -4   -7  -10
[2,]   -2   -5   -8  -11
[3,]   -3   -6   -9  -12
</code></pre>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h4><span class="glyphicon glyphicon-pushpin"></span>Tip: element-wise vs. matrix multiplication</h4>
</div>
<div class="panel-body">
<p>Note that <code>*</code> gives you element-wise multiplication! To do matrix multiplication, we need to use the <code>%*%</code> operator.</p>
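<p>A small sketch of the difference, using the same 3 x 4 matrix <code>m</code> as above. The column of ones (<code>ones</code>) and the name <code>row_totals</code> are assumptions chosen for illustration, so that the matrix product gives the row sums of <code>m</code>.</p>
<pre class="sourceCode r"><code class="sourceCode r">m <- matrix(1:12, nrow=3, ncol=4)

# Matrix product of a 3 x 4 matrix with a 4 x 1 column of ones:
# the result is a 3 x 1 matrix holding the row sums of m
ones <- matrix(1, nrow=4, ncol=1)
row_totals <- m %*% ones
row_totals

# Element-wise product, for contrast: the result has the same shape as m
m * m</code></pre>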
<p>For more on matrix algebra, see the <a href="http://www.statmethods.net/advstats/matrix.html">Quick-R reference guide</a>.</p>
</div>
</aside>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4><span class="glyphicon glyphicon-pencil"></span>Challenge 3</h4>
</div>
<div class="panel-body">
<p>Given the following matrix:</p>
<pre class="sourceCode r"><code class="sourceCode r">m <-<span class="st"> </span><span class="kw">matrix</span>(<span class="dv">1</span>:<span class="dv">12</span>, <span class="dt">nrow=</span><span class="dv">3</span>, <span class="dt">ncol=</span><span class="dv">4</span>)
m</code></pre>
<pre class="output"><code>     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
</code></pre>
<p>Write down what you think will happen when you run:</p>
<ol style="list-style-type: decimal">
<li><code>m ^ -1</code></li>
<li><code>m * c(1, 0, -1)</code></li>
<li><code>m > c(0, 20)</code></li>
</ol>
<p>Did you get the output expected? If not, ask a helper!</p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4><span class="glyphicon glyphicon-pencil"></span>Bonus Challenge</h4>
</div>
<div class="panel-body">
<p>We’re interested in looking at the sum of the following sequence of fractions:</p>
<pre class="sourceCode r"><code class="sourceCode r"> x =<span class="st"> </span><span class="dv">1</span>/(<span class="dv">1</span>^<span class="dv">2</span>) +<span class="st"> </span><span class="dv">1</span>/(<span class="dv">2</span>^<span class="dv">2</span>) +<span class="st"> </span><span class="dv">1</span>/(<span class="dv">3</span>^<span class="dv">2</span>) +<span class="st"> </span>... +<span class="st"> </span><span class="dv">1</span>/(n^<span class="dv">2</span>)</code></pre>
<p>This would be tedious to type out, and impossible for high values of n. Can you use vectorisation to solve for x, when n=100? How about when n=10,000?</p>
</div>
</section>
<h2 id="challenge-solutions">Challenge solutions</h2>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4><span class="glyphicon glyphicon-pencil"></span>Solution to challenge 1</h4>
</div>
<div class="panel-body">
<p>Let’s try this on the <code>pop</code> column of the <code>gapminder</code> dataset.</p>
<p>Make a new column in the <code>gapminder</code> data frame that contains population in units of millions of people. Check the head or tail of the data frame to make sure it worked.</p>
<pre class="sourceCode r"><code class="sourceCode r">gapminder$pop_millions <-<span class="st"> </span>gapminder$pop /<span class="st"> </span><span class="fl">1e6</span>
<span class="kw">head</span>(gapminder)</code></pre>
<pre class="output"><code>      country year      pop continent lifeExp gdpPercap pop_millions
1 Afghanistan 1952  8425333      Asia  28.801  779.4453     8.425333
2 Afghanistan 1957  9240934      Asia  30.332  820.8530     9.240934
3 Afghanistan 1962 10267083      Asia  31.997  853.1007    10.267083
4 Afghanistan 1967 11537966      Asia  34.020  836.1971    11.537966
5 Afghanistan 1972 13079460      Asia  36.088  739.9811    13.079460
6 Afghanistan 1977 14880372      Asia  38.438  786.1134    14.880372
</code></pre>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4><span class="glyphicon glyphicon-pencil"></span>Solution to challenge 2</h4>
</div>
<div class="panel-body">
<p>Refresh your ggplot skills by plotting population in millions against year.</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">ggplot</span>(gapminder, <span class="kw">aes</span>(<span class="dt">x =</span> year, <span class="dt">y =</span> pop_millions)) +<span class="st"> </span><span class="kw">geom_point</span>()</code></pre>
<p><img src="fig/09-vectorisation-ch2-sol-1.png" title="plot of chunk ch2-sol" alt="plot of chunk ch2-sol" style="display: block; margin: auto;" /></p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4><span class="glyphicon glyphicon-pencil"></span>Solution to challenge 3</h4>
</div>
<div class="panel-body">
<p>Given the following matrix:</p>
<pre class="sourceCode r"><code class="sourceCode r">m <-<span class="st"> </span><span class="kw">matrix</span>(<span class="dv">1</span>:<span class="dv">12</span>, <span class="dt">nrow=</span><span class="dv">3</span>, <span class="dt">ncol=</span><span class="dv">4</span>)
m</code></pre>
<pre class="output"><code>     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
</code></pre>
<p>Write down what you think will happen when you run:</p>
<ol style="list-style-type: decimal">
<li><code>m ^ -1</code></li>
</ol>
<pre class="output"><code>          [,1]      [,2]      [,3]       [,4]
[1,] 1.0000000 0.2500000 0.1428571 0.10000000
[2,] 0.5000000 0.2000000 0.1250000 0.09090909
[3,] 0.3333333 0.1666667 0.1111111 0.08333333
</code></pre>
<ol start="2" style="list-style-type: decimal">
<li><code>m * c(1, 0, -1)</code></li>
</ol>
<pre class="output"><code>     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    0    0    0    0
[3,]   -3   -6   -9  -12
</code></pre>
<ol start="3" style="list-style-type: decimal">
<li><code>m > c(0, 20)</code></li>
</ol>
<pre class="output"><code>      [,1]  [,2]  [,3]  [,4]
[1,]  TRUE FALSE  TRUE FALSE
[2,] FALSE  TRUE FALSE  TRUE
[3,]  TRUE FALSE  TRUE FALSE
</code></pre>
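<p>Parts 2 and 3 rely on recycling: the shorter vector is repeated until it matches the number of elements in the matrix, and it is matched against the matrix column by column. A minimal sketch that makes the recycled values explicit for part 3 (the name <code>recycled</code> is used only for illustration):</p>
<pre class="sourceCode r"><code class="sourceCode r">m <- matrix(1:12, nrow=3, ncol=4)

# Repeat c(0, 20) to the length of m, then lay it out column by column like m
recycled <- matrix(rep_len(c(0, 20), length(m)), nrow=nrow(m))
recycled

# Comparing against the explicit matrix gives the same result as recycling
identical(m > recycled, m > c(0, 20))</code></pre>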
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4><span class="glyphicon glyphicon-pencil"></span>Solution to bonus challenge</h4>
</div>
<div class="panel-body">
<p>We’re interested in looking at the sum of the following sequence of fractions:</p>
<pre class="sourceCode r"><code class="sourceCode r"> x =<span class="st"> </span><span class="dv">1</span>/(<span class="dv">1</span>^<span class="dv">2</span>) +<span class="st"> </span><span class="dv">1</span>/(<span class="dv">2</span>^<span class="dv">2</span>) +<span class="st"> </span><span class="dv">1</span>/(<span class="dv">3</span>^<span class="dv">2</span>) +<span class="st"> </span>... +<span class="st"> </span><span class="dv">1</span>/(n^<span class="dv">2</span>)</code></pre>
<p>This would be tedious to type out, and impossible for high values of n. Can you use vectorisation to solve for x, when n=100? How about when n=10,000?</p>
<pre class="sourceCode r"><code class="sourceCode r">inverse_sum_of_squares <-<span class="st"> </span>function(n) {
  sequence <-<span class="st"> </span><span class="dv">1</span>:n
  y <-<span class="st"> </span><span class="dv">1</span>/(sequence^<span class="dv">2</span>)
  result <-<span class="st"> </span><span class="kw">sum</span>(y)
  <span class="kw">return</span>(result)
}
<span class="kw">inverse_sum_of_squares</span>(<span class="dv">100</span>)</code></pre>
<pre class="output"><code>[1] 1.634984
</code></pre>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">inverse_sum_of_squares</span>(<span class="dv">10000</span>)</code></pre>
<pre class="output"><code>[1] 1.644834
</code></pre>
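<p>The same calculation also fits in a single vectorised expression. As an aside that is not part of the lesson, this series is known to converge to &pi;<sup>2</sup>/6, so the sketch below also compares the partial sum against that limit. The name <code>partial_sum</code> is used only for illustration.</p>
<pre class="sourceCode r"><code class="sourceCode r">n <- 10000

# Square every element of 1:n, take reciprocals, and add them up
partial_sum <- sum(1 / (1:n)^2)
partial_sum

# The gap to the limit pi^2 / 6 shrinks roughly like 1/n
pi^2 / 6 - partial_sum</code></pre>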
</div>
</section>
</div>
</div>
</article>
<div class="footer">
<a class="label swc-blue-bg" href="http://software-carpentry.org">Software Carpentry</a>
<a class="label swc-blue-bg" href="https://github.com/swcarpentry/lesson-template">Source</a>
<a class="label swc-blue-bg" href="mailto:[email protected]">Contact</a>
<a class="label swc-blue-bg" href="LICENSE.html">License</a>
</div>
</div>
<!-- Javascript placed at the end of the document so the pages load faster -->
<script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script>
<script src="css/bootstrap/bootstrap-js/bootstrap.js"></script>
</body>
</html>