<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>The basics of probabilities and mathematical notations</title>
<link rel="stylesheet" href="/theme/css/main.css" />
<!--[if IE]>
<script src="https://html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body id="index" class="home">
<header id="banner" class="body">
<h1><a href="/">And yet it moves! </a></h1>
<nav><ul>
<li class="active"><a href="/category/data-processing.html">Data processing</a></li>
</ul></nav>
</header><!-- /#banner -->
<section id="content" class="body">
<article>
<header>
<h1 class="entry-title">
<a href="/basics-notations.html" rel="bookmark"
title="Permalink to The basics of probabilities and mathematical notations">The basics of probabilities and mathematical notations</a></h1>
</header>
<div class="entry-content">
<footer class="post-info">
<abbr class="published" title="2016-04-28T16:36:00+01:00">
Published: Thu 28 April 2016
</abbr>
<address class="vcard author">
By <a class="url fn" href="/author/gael.html">Gaël</a>
</address>
<p>In <a href="/category/data-processing.html">Data processing</a>.</p>
</footer><!-- /.post-info --> <p>This post is the first technical post of the <a href="{filename}01-index.md">series</a>.</p>
<p><em>Website optimization
is the practice of altering a website in order to favor a given behavior and
maximize the <strong>reward</strong> we get out of the website
(e.g. selling more stuff). This alteration could be a complete
redesign or a few refinements of an existing design (like changing the color
of a button).</em></p>
<p><em>From a strictly rigorous point of view, the only way to be absolutely sure that one
variant actually performs better than the other (i.e. yields the maximal
reward) is to serve the first variant until the death of the website,
go back in time and serve only the second one. That seems ridiculous,
but it would be the only way to get <strong>exact</strong> knowledge.</em></p>
<p><em>Of course, in practice, time travel is not possible, and it is not even needed:
if we can afford to be wrong occasionally, we can assess which
variant is better with a very limited number of impressions (the number of times
the website is served). We can then serve only the best-performing
variant for as long as we want. Such an assessment is performed through
<strong>statistics</strong>.</em></p>
<h1 id="statistics-events-outcomes-population-and-sample-size">Statistics: events, outcomes, population and sample size</h1>
<p>People (among other things) tend to react in a predictable way <em>on average</em>
to a given "thing happening":
for example, people often carry an umbrella on a rainy day.</p>
<p>The "thing happening" is called an <strong>event</strong> and the "reaction"
is called a result or an <strong>outcome</strong>.
So, given a particular <em>event</em>, one can <em>measure</em> a particular result
(or <em>outcome</em>):</p>
<blockquote>
<p>on the event <em>going out on a rainy day</em>,
one can measure the outcome <em>number of people carrying an
umbrella</em>.</p>
</blockquote>
<p>Similarly, one could measure the same outcome on a different event, such as
<em>sunny day</em>.</p>
<p>Each measurement of the outcome, whether or not that outcome is positive,
is called a <strong>trial</strong> (or an <em>impression</em> in web parlance, as mentioned in the
introduction). These trials are carried out on a subset of the total
<strong>population</strong> that we want to study (in the context of the web, the population
would consist of all the users who will ever visit your website at any point in the
future). This subset is called a <strong>sample</strong> of the population, and the number
of elements in this sample is called the <strong>sample size</strong>.</p>
<h1 id="case-study-optimizing-the-color-of-a-button">Case study: optimizing the color of a button</h1>
<p>Instead of the weather, the blogger decided to study the specific effect of the
color (either orange, green or white) of a <em>Buy Now!</em> button on
the button's click-through rate (i.e. the proportion of people clicking on the button).</p>
<p>Each user accessing the website would therefore be subjected to one of three
events:</p>
<ul>
<li>either getting the version of the website with the orange button (let's call that
<em>event <span class="math">\(A\)</span></em>, or just <span class="math">\(A\)</span>);</li>
<li>or getting that same website with a green button (<em>event <span class="math">\(B\)</span></em>);</li>
<li>or getting the white button instead (<em>event <span class="math">\(C\)</span></em>).</li>
</ul>
<p>The outcome (called <span class="math">\(O\)</span>) we are interested in is the <em>number of
clicks</em> on that button.</p>
<p>But this raw number is not interesting by itself: we would like to unveil
something deeper, a kind of hidden truth about people's behavior in general.
That's why we have to resort to probabilities. So instead of analysing <span class="math">\(O\)</span>, the
number of clicks on the <em>Buy Now!</em> button, we will study <span class="math">\(P(O)\)</span>, the
probability that a user clicks on the button, and more specifically the probability
of a click <strong>given</strong> that the user has
been subjected to <span class="math">\(A\)</span>, <span class="math">\(B\)</span> or <span class="math">\(C\)</span> (denoted <span class="math">\(P(O|A)\)</span>, <span class="math">\(P(O|B)\)</span> and
<span class="math">\(P(O|C)\)</span>, respectively).</p>
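<p>As a minimal sketch of how these conditional probabilities could be estimated in practice, the Python snippet below divides the number of clicks by the number of impressions for each variant. The counts and the helper name <code>estimate_click_probability</code> are hypothetical and chosen only for illustration; they do not come from a real campaign.</p>

```python
# Hypothetical impression and click counts per variant (illustrative only).
impressions = {"A": 1000, "B": 1000, "C": 1000}
clicks = {"A": 50, "B": 65, "C": 40}


def estimate_click_probability(clicks, impressions):
    """Estimate P(O|variant) as clicks / impressions for each variant."""
    return {
        variant: clicks[variant] / impressions[variant]
        for variant in impressions
    }


estimates = estimate_click_probability(clicks, impressions)

# The variant with the highest estimated click-through rate.
best_variant = max(estimates, key=estimates.get)

for variant in sorted(estimates):
    print(f"P(O|{variant}) is estimated at {estimates[variant]:.3f}")
print("Best-performing variant in this sample:", best_variant)
```

<p>These ratios are only estimates computed from a sample: a different sample would give slightly different values.</p>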
<p>If we assume that these probabilities do not change with time (or that they change
uniformly), then only a limited number of impressions is needed to estimate
these probabilities and determine which variant is better. This property is
what allows us to run a testing campaign first (in order to determine
which variant is better) and then to exploit the result in all subsequent visits
by serving only the <em>optimized</em>, best-performing variant.</p>
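<p>The idea that a limited number of impressions can be enough may be illustrated with a small simulation. The sketch below assumes a fixed, made-up click probability of 0.05 (in a real campaign this value is unknown) and that visits are independent; the function name <code>simulate_clicks</code> is hypothetical.</p>

```python
import random


def simulate_clicks(true_probability, n_impressions, rng):
    """Simulate independent visits and return the number of clicks.

    Each visit yields a click (1) with probability true_probability
    and no click (0) otherwise.
    """
    draws = rng.choices(
        [1, 0],
        weights=[true_probability, 1 - true_probability],
        k=n_impressions,
    )
    return sum(draws)


# Assumed "hidden truth" for one variant; unknown in a real campaign.
true_probability = 0.05
rng = random.Random(42)  # fixed seed so that runs are repeatable

for n_impressions in (100, 1000, 10000):
    estimate = simulate_clicks(true_probability, n_impressions, rng) / n_impressions
    print(f"{n_impressions} impressions: estimated probability {estimate:.4f}")
```

<p>Under these assumptions, the estimate tends to get closer to the assumed probability as the number of impressions grows, although any single run can still be off.</p>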
<h1 id="summary">Summary</h1>
<p>To sum up, we defined the following probabilities:</p>
<ul>
<li><span class="math">\(P(A)\)</span>: the probability that a user gets the website version with the
<em>orange</em> button;</li>
<li><span class="math">\(P(B)\)</span>: the probability of getting the <em>green</em> button;</li>
<li><span class="math">\(P(C)\)</span>: the probability of getting the <em>white</em> button;</li>
<li><span class="math">\(P(O)\)</span>: the probability of <strong>any</strong> user clicking on the button (total
click-through rate);</li>
<li><span class="math">\(P(O|A)\)</span>: the probability that a user clicks on the button <strong>given</strong> that
that button is <em>orange</em>;</li>
<li><span class="math">\(P(O|B)\)</span>: the probability of clicking on the button <strong>given</strong> it is <em>green</em>;</li>
<li><span class="math">\(P(O|C)\)</span>: the probability of clicking on the button <strong>given</strong> it is <em>white</em>.</li>
</ul>
<p>These are only notations: a way to express something.
But these notations will be very useful to express ourselves very precisely and
concisely throughout the series.</p>
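<p>As a first use of these notations, and assuming every user is served exactly one of the three variants (so that <span class="math">\(P(A) + P(B) + P(C) = 1\)</span>), the total click-through rate is tied to the per-variant ones by the law of total probability:</p>
<p>$$P(O) = P(O|A)\,P(A) + P(O|B)\,P(B) + P(O|C)\,P(C)$$</p>
<p>For instance, with purely hypothetical values <span class="math">\(P(A) = P(B) = P(C) = 1/3\)</span>, <span class="math">\(P(O|A) = 0.05\)</span>, <span class="math">\(P(O|B) = 0.065\)</span> and <span class="math">\(P(O|C) = 0.04\)</span>, this gives <span class="math">\(P(O) = (0.05 + 0.065 + 0.04)/3 \approx 0.052\)</span>.</p>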
<p><em>We saw that statistics allows us to make statements about different
variants of a website with only a limited number of impressions.
However, there are several ways to gather and analyse the
data in a testing campaign: the purpose of this blog series is to compare
two such methods,
<a href="{filename}04-presentation_methods.md">A/B testing and the multi-armed bandit
strategy</a>.</em></p>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
var location_protocol = (false) ? 'https' : document.location.protocol;
if (location_protocol !== 'http' && location_protocol !== 'https') location_protocol = 'https:';
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = location_protocol + '//cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
mathjaxscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'AMS' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>
</div><!-- /.entry-content -->
</article>
</section>
<section id="extras" class="body">
</section><!-- /#extras -->
<footer id="contentinfo" class="body">
<address id="about" class="vcard body">
Proudly powered by <a href="http://getpelican.com/">Pelican</a>, which takes great advantage of <a href="http://python.org">Python</a>.
</address><!-- /#about -->
<p>The theme is by <a href="http://coding.smashingmagazine.com/2009/08/04/designing-a-html-5-layout-from-scratch/">Smashing Magazine</a>, thanks!</p>
</footer><!-- /#contentinfo -->
</body>
</html>