-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathpresentation-ab-testing-and-multi-armed-bandit.html
236 lines (225 loc) · 12.5 KB
/
presentation-ab-testing-and-multi-armed-bandit.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Presentation of the two methods</title>
<link rel="stylesheet" href="/theme/css/main.css" />
<!--[if IE]>
<script src="https://html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body id="index" class="home">
<header id="banner" class="body">
<h1><a href="/">And yet it moves! </a></h1>
<nav><ul>
<li class="active"><a href="/category/data-processing.html">Data processing</a></li>
</ul></nav>
</header><!-- /#banner -->
<section id="content" class="body">
<article>
<header>
<h1 class="entry-title">
<a href="/presentation-ab-testing-and-multi-armed-bandit.html" rel="bookmark"
title="Permalink to Presentation of the two methods">Presentation of the two methods</a></h1>
</header>
<div class="entry-content">
<footer class="post-info">
<abbr class="published" title="2016-04-28T17:24:00+01:00">
Published: jeu. 28 avril 2016
</abbr>
<address class="vcard author">
By <a class="url fn" href="/author/gael.html">Gaël</a>
</address>
<p>In <a href="/category/data-processing.html">Data processing</a>.</p>
</footer><!-- /.post-info --> <p>This post is the second technical post of the <a href="{filename}01-index.md">series</a>.</p>
<p><em><a href="{filename}03-basic_probabilities.md">We just saw</a>
that statistics could be used to determine the behavior of an
entire population (e.g. the website visitors) from a sample of it
(a few visitors). This property is what makes statistics particularly
suitable to website optimization.</em></p>
<p><em>However, these fundamental principles do not dictate an unique way of actually
carrying out a testing campaign: there are a lot of different ways to gather
and analyse the data. We are going to present in this post the two frameworks
that we are going to compare in the remaining posts: the A/B testing method and
one of the multi-armed bandit strategies.</em></p>
<h1 id="ab-testing">A/B testing</h1>
<h2 id="purpose">Purpose</h2>
<p>The only purpose of the <a href="https://en.wikipedia.org/wiki/A/B_testing">A/B testing</a>
method is to <em>efficiently</em> answer a single question:</p>
<blockquote>
<p>Are <span class="math">\(P(O|A)\)</span> and <span class="math">\(P(O|B)\)</span> significantly different?</p>
</blockquote>
<p>which, in plain words, can be translated to:</p>
<blockquote>
<p>Does the website variant with the orange <em>Buy Now!</em> button
<strong>actually</strong> yield a different click-through rate than the variant
with the green <em>Buy Now!</em> button?</p>
</blockquote>
<p>It is one of the most prevalent methods to do website optimization, mostly
because of the guarantees it provides (see below).</p>
<h2 id="data-gathering">Data gathering</h2>
<p>The data are gathered ensuring that <span class="math">\(\tilde{P}(A) ≈ \tilde{P}(B)\)</span> at any given time
(i.e. the variant with the orange button is served as often as the
variant with the green button, in parallel).
This is necessary to avoid introducing a bias
and this is actually the most efficient data gathering scheme to answer
the question above.</p>
<h2 id="data-analysis">Data analysis</h2>
<p>Whether or not the difference between <span class="math">\(P(O|A)\)</span> and <span class="math">\(P(O|B)\)</span> is (actually)
significant is assessed by a <strong>statistical test</strong> (in this case,
<a href="https://en.wikipedia.org/wiki/Fisher%27s_exact_test">Fisher's exact test</a>).
If the differences are <strong>proven</strong> significant, they can be calculated with
a known
<a href="https://en.wikipedia.org/wiki/Accuracy_and_precision">precision</a>, <a href="https://en.wikipedia.org/wiki/Statistical_significance">degree of
significance</a> and
<a href="https://en.wikipedia.org/wiki/Statistical_power">statistical power</a>.</p>
<p>Roughly speaking, these constitute the level of confidence we have in the
results and the testing campaign can easily be set up so that a given
confidence is reached. This is what makes the A/B testing method so appealing
for web optimization.</p>
<h1 id="multi-armed-bandit_1">Multi-armed bandit</h1>
<h2 id="purpose_1">Purpose</h2>
<p>The main purpose of the <a href="https://en.wikipedia.org/wiki/Multi-armed_bandit">multi-armed bandit</a>
strategy is to <em>maximize</em> the reward (the click-through rate in this case),
even in the data-gathering stage.</p>
<h2 id="data-gathering_1">Data gathering</h2>
<p>The particular strategy given in the
<a href="http://stevehanov.ca/blog/index.php?id=132">blog post</a>
consists in determining which event (i.e. variant of the website)
exhibits the best click-through rate; that (favored)
event is then set to occur more often. As soon as another event is found to give
a better click-through rate, it gets favored and that variant is
thereafter presented more often, <em>etc</em>.</p>
<p>The <a href="http://stevehanov.ca/blog/index.php?id=132">other blog post</a> does give
another description separating an <em>exploration</em> stage from and <em>exploitation</em>
one. It is strictly equivalent to the description above. In particular, we
can easily relate <span class="math">\(\varepsilon\)</span> from the blog post to the probability of
serving one given event:
</p>
<div class="math">$$P(\text{disfavored}) = \frac{1-\varepsilon}{\text{number of events}}$$</div>
<p>
<center>and</center>
</p>
<div class="math">$$P(\text{favored}) = \varepsilon + P(\text{disfavored}).$$</div>
<h2 id="data-analysis_1">Data analysis</h2>
<p>There is no data analysis stage in this strategy: whichever variant exhibits
the highest click-through rate at the end of the campaign is selected.
This will not prevent us from performing a contingency test, but at this
stage, we have no idea of what the results would be. This also means that
in the current state of our knowledge, the multi-armed bandit strategy may
not provide the same "confidence" guarantees as the A/B testing method.
This is something we will have to estimate.</p>
<h1 id="summary-of-the-differences_1">Summary of the differences</h1>
<p>There are two main differences between the methods:</p>
<ol>
<li><em>Data-gathering stage:</em> in A/B testing, <span class="math">\(P(A) = P(B)\)</span> (all site variants
are served with an equal probability) whereas in the multi-armed bandit
strategy <span class="math">\(P(X) > P(Y)\)</span> if <span class="math">\(P(O|X) > P(O|Y)\)</span> (the variant giving the
highest click-through rate is served more often).</li>
<li><em>Data-analysis stage:</em> there is an actual statistical test in A/B testing
(hence "testing") while there is none in the multi-armed bandit strategy.</li>
</ol>
<p>However, as we said earlier, we can perform a contingency test in the
multi-armed bandit framework (given a few assumptions). That is why what we
are really going to compare, except in one case,
is the result of the data-gathering stage.
We are still going to refer to A/B <em>testing</em> though: this is a convenient
shortcut, even in cases where no actual testing is performed.</p>
<h1 id="alleged-remaining-differences">Alleged remaining differences?</h1>
<p>Other differences are advertised in the
<a href="http://stevehanov.ca/blog/index.php?id=132">blog post</a>,
most notably that:</p>
<ul>
<li>A/B testing cannot handle more than two options at once.</li>
<li>Variants can be added or removed at any time in the multi-armed bandit
strategy but not in A/B testing.</li>
</ul>
<p>These claims are not accurate:</p>
<ul>
<li>A/B testing does not handle more than two options at once <em>by definition</em>,
however its generalization to any number of variants cannot be more trivial,
either directly (Fisher's exact test is not limited to two options)
or indirectly (by comparing each variant two by two).</li>
<li>Variants can actually be removed and added at any time in both methods
even though this might not be implemented in some tools. In particular,
A/B testing does not <em>rely</em> on even sampling to provide correct results:
it is only a way to avoid some bias and to get the most significant results
out of a given number of trials.</li>
</ul>
<p><em>So, we are going to compare two methods: A/B testing and the multi-armed
bandit. These methods mainly differ in their data-gathering techniques
that we are going to compare by performing Monte-Carlo simulations.</em></p>
<p><em>Even though we provide a Jupyter notebook containing all the code used to
perform our simulations, we wanted to provide at least the basic
implementations of the models in the <a href="{filename}05-creating_models.md">next post</a>.
People wanting raw results can jump directly
<a href="{filename}06-pros_of_AB.md">further</a>
to the exploitation of these models.</em></p>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
var location_protocol = (false) ? 'https' : document.location.protocol;
if (location_protocol !== 'http' && location_protocol !== 'https') location_protocol = 'https:';
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = location_protocol + '//cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
mathjaxscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'AMS' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>
</div><!-- /.entry-content -->
</article>
</section>
<section id="extras" class="body">
</section><!-- /#extras -->
<footer id="contentinfo" class="body">
<address id="about" class="vcard body">
Proudly powered by <a href="http://getpelican.com/">Pelican</a>, which takes great advantage of <a href="http://python.org">Python</a>.
</address><!-- /#about -->
<p>The theme is by <a href="http://coding.smashingmagazine.com/2009/08/04/designing-a-html-5-layout-from-scratch/">Smashing Magazine</a>, thanks!</p>
</footer><!-- /#contentinfo -->
</body>
</html>