Built site for gh-pages
Quarto GHA Workflow Runner committed Apr 22, 2024
1 parent 6f992d9 commit 121a2bc
Showing 6 changed files with 127 additions and 27 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
Original file line number Diff line number Diff line change
@@ -1 +1 @@
dc5d3c6f
a552ad32
22 changes: 11 additions & 11 deletions sitemap.xml
@@ -2,46 +2,46 @@
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://uw-psych.github.io/psych532-slides/slides/07-fair.html</loc>
<lastmod>2024-04-21T04:35:37.958Z</lastmod>
<lastmod>2024-04-22T11:07:40.303Z</lastmod>
</url>
<url>
<loc>https://uw-psych.github.io/psych532-slides/slides/06-machine-learning.html</loc>
<lastmod>2024-04-21T04:35:37.958Z</lastmod>
<lastmod>2024-04-22T11:07:40.303Z</lastmod>
</url>
<url>
<loc>https://uw-psych.github.io/psych532-slides/slides/08-viz.html</loc>
<lastmod>2024-04-21T04:35:37.958Z</lastmod>
<lastmod>2024-04-22T11:07:40.303Z</lastmod>
</url>
<url>
<loc>https://uw-psych.github.io/psych532-slides/slides/05-hpc.html</loc>
<lastmod>2024-04-21T04:35:37.958Z</lastmod>
<lastmod>2024-04-22T11:07:40.303Z</lastmod>
</url>
<url>
<loc>https://uw-psych.github.io/psych532-slides/slides/02-reproducibility.html</loc>
<lastmod>2024-04-21T04:35:37.958Z</lastmod>
<lastmod>2024-04-22T11:07:40.303Z</lastmod>
</url>
<url>
<loc>https://uw-psych.github.io/psych532-slides/about.html</loc>
<lastmod>2024-04-21T04:35:37.958Z</lastmod>
<lastmod>2024-04-22T11:07:40.303Z</lastmod>
</url>
<url>
<loc>https://uw-psych.github.io/psych532-slides/index.html</loc>
<lastmod>2024-04-21T04:35:37.958Z</lastmod>
<lastmod>2024-04-22T11:07:40.303Z</lastmod>
</url>
<url>
<loc>https://uw-psych.github.io/psych532-slides/slides/01-introduction.html</loc>
<lastmod>2024-04-21T04:35:37.958Z</lastmod>
<lastmod>2024-04-22T11:07:40.303Z</lastmod>
</url>
<url>
<loc>https://uw-psych.github.io/psych532-slides/slides/04-stats-with-big-data.html</loc>
<lastmod>2024-04-21T04:35:37.958Z</lastmod>
<lastmod>2024-04-22T11:07:40.303Z</lastmod>
</url>
<url>
<loc>https://uw-psych.github.io/psych532-slides/slides/09-ethics.html</loc>
<lastmod>2024-04-21T04:35:37.958Z</lastmod>
<lastmod>2024-04-22T11:07:40.303Z</lastmod>
</url>
<url>
<loc>https://uw-psych.github.io/psych532-slides/slides/03-working-with-big-data.html</loc>
<lastmod>2024-04-21T04:35:37.958Z</lastmod>
<lastmod>2024-04-22T11:07:40.303Z</lastmod>
</url>
</urlset>
130 changes: 115 additions & 15 deletions slides/04-stats-with-big-data.html
@@ -334,15 +334,15 @@ <h1 class="title">Doing statistics with big data</h1>
<section id="statistics-with-big-data" class="title-slide slide level1 center">
<h1>Statistics with Big Data</h1>
<ul>
<li class="fragment">The curse of dimensionality.</li>
<li class="fragment">The challenges of null hypothesis statistical testing.</li>
<li class="fragment">Visualization as a solution.</li>
<li class="fragment">Statistical solutions.
<li class="fragment">Some challenges in standard data analysis methods.</li>
<li class="fragment">Some solutions.</li>
<li class="fragment">Estimating error.
<ul>
<li class="fragment">Resampling.</li>
<li class="fragment">The Jackknife.</li>
<li class="fragment">The Bootstrap.</li>
</ul></li>
<li class="fragment">The curse of dimensionality.</li>
</ul>
</section>

@@ -385,7 +385,7 @@ <h1>The Bayesian objection</h1>
</ul></li>
<li class="fragment">But the inference drawn is often:
<ul>
<li class="fragment"><span class="math inline">\(p(H_0 | data) is small\)</span>.</li>
<li class="fragment"><span class="math inline">\(p(H_0 | data)\)</span> is small.</li>
</ul></li>
<li class="fragment">Which may or may not be true depending on the prior of <span class="math inline">\(H_0\)</span>.</li>
<li class="fragment">Making <span class="math inline">\(\alpha = 0.05\)</span> even more arbitrary.</li>
@@ -440,10 +440,29 @@ <h1>Explicit models</h1>
</ul>
</section>

<section id="some-challenges" class="title-slide slide level1 center">
<h1>Some challenges</h1>
<div class="small">
<ul>
<li class="fragment">To calculate error bars, we need an estimate of the standard error of the statistic.</li>
<li class="fragment">For simple cases, this is derived from the variance of the sampling distribution.
<ul>
<li class="fragment">That is, the variance of the statistic across repeated samples of size <span class="math inline">\(n\)</span>.</li>
</ul></li>
<li class="fragment">For some statistics (and with some assumptions), we can calculate this.
<ul>
<li class="fragment">For example, the standard deviation of the sampling distribution of the mean: <span class="math inline">\(\frac{\sigma}{\sqrt{n}}\)</span></li>
</ul></li>
<li class="fragment">For many statistics, the sampling distribution is not well defined.</li>
<li class="fragment">But it can be computed empirically.</li>
</ul>
</div>
</section>
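As a quick empirical check of that formula, here is a Python/NumPy sketch (the normal population and the particular values of <span class="math inline">\(\sigma\)</span> and <span class="math inline">\(n\)</span> are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n, n_samples = 2.0, 100, 10_000

# Draw many samples of size n and compute the mean of each
means = rng.normal(loc=0.0, scale=sigma, size=(n_samples, n)).mean(axis=1)

analytic_se = sigma / np.sqrt(n)    # sigma / sqrt(n)
empirical_se = means.std(ddof=1)    # SD of the empirical sampling distribution

print(analytic_se, empirical_se)    # the two should agree closely
```

With 10,000 replicate samples, the empirical standard deviation of the means sits within about a percent of the analytic value.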

<section id="computing-to-the-rescue" class="title-slide slide level1 center">
<h1>Computing to the rescue</h1>
<div class="fragment">
<p><img data-src="./images/computers_in_1983.png"></p>
<p><img data-src="./images/computers_in_1983.png" height="500"></p>
</div>
</section>

@@ -459,10 +478,19 @@ <h1>Resampling methods</h1>

<section id="the-jackknife" class="title-slide slide level1 center">
<h1>The Jackknife</h1>
<div class="small">
<ul>
<li class="fragment">Invented by the statistician Maurice Quenouille in the 1940s.</li>
<li class="fragment">Championed by Tukey, who also named it for its versatility and utility.</li>
<li class="fragment">The mechanics:</li>
</ul>
</div>
</section>

<section id="the-jackknife-1" class="title-slide slide level1 center">
<h1>The Jackknife</h1>
<div class="small">
<ul>
<li class="fragment">The algorithm:</li>
<li class="fragment">Consider the statistic <span class="math inline">\(\theta(X)\)</span> calculated for data set <span class="math inline">\(X\)</span></li>
<li class="fragment">Let the sample size of the data set <span class="math inline">\(X\)</span> be <span class="math inline">\(n\)</span></li>
<li class="fragment">For i in 1…<span class="math inline">\(n\)</span>
@@ -476,20 +504,23 @@ <h1>The Jackknife</h1>
</ul></li>
<li class="fragment">The estimate of the standard error <span class="math inline">\(SE(S)\)</span> is:
<ul>
<li class="fragment">$SE_= $</li>
<li class="fragment"><span class="math inline">\(SE_\theta = \sqrt{ \frac{n-1}{n} \sum_{i}{ (\hat{\theta} - \theta_i) ^2 }}\)</span></li>
</ul></li>
</ul>
</div>
</section>
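The algorithm above can be sketched in Python (NumPy assumed; using the mean as the statistic purely for illustration — for the mean, the jackknife SE coincides exactly with the usual <span class="math inline">\(s/\sqrt{n}\)</span>):

```python
import numpy as np

def jackknife_se(x, theta):
    """Jackknife estimate of the standard error of statistic `theta` on data `x`."""
    x = np.asarray(x)
    n = len(x)
    # Leave-one-out estimates: theta on x with observation i removed
    theta_i = np.array([theta(np.delete(x, i)) for i in range(n)])
    theta_hat = theta_i.mean()  # mean of the leave-one-out estimates
    return np.sqrt((n - 1) / n * np.sum((theta_hat - theta_i) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
print(jackknife_se(x, np.mean))  # matches x.std(ddof=1) / sqrt(200)
```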

<section id="the-jackknife-1" class="title-slide slide level1 center">
<section id="the-jackknife-2" class="title-slide slide level1 center">
<h1>The jackknife</h1>
<div class="small">
<ul>
<li class="fragment">The bias of the jackknife is smaller than the bias of <span class="math inline">\(\theta\)</span> (why?)</li>
<li class="fragment">Can also be used to estimate the bias of <span class="math inline">\(\theta\)</span>:
<ul>
<li class="fragment"><span class="math inline">\(\hat{B} = \hat{\theta} - \theta\)</span></li>
</ul></li>
</ul>
</div>
</section>

<section id="demo" class="title-slide slide level1 center">
@@ -501,7 +532,7 @@ <h1>Demo</h1>
<h1>Some limitations</h1>
<ul>
<li class="fragment">Assumes data is IID</li>
<li class="fragment">Assumes that <span class="math inline">\(\theta\)</span> is $ (,,^{2}) $</li>
<li class="fragment">Assumes that <span class="math inline">\(\theta\)</span> is <span class="math inline">\(\sim \mathcal{N}(\mu,\,\sigma^{2})\)</span></li>
<li class="fragment">Can fail badly with non-smooth estimators (e.g., median)</li>
<li class="fragment">We’ll talk about cross-validation next week.</li>
<li class="fragment">And we may or may not come back to permutations later on.</li>
@@ -510,13 +541,45 @@ <h1>Some limitations</h1>

<section id="the-bootstrap" class="title-slide slide level1 center">
<h1>The bootstrap</h1>
<p>Invented by Bradley Efron - See <a href="https://youtu.be/0tA3x64nCGY?si=u_9syHVAkwlcea9V">interview</a> for the back-story. - Very general in its application - Consider a statistic <span class="math inline">\(\theta(X)\)</span> - For i in <span class="math inline">\(1...b\)</span> - Sample <span class="math inline">\(n\)</span> samples <em>with replacement</em>: <span class="math inline">\(X_b\)</span> - In the pseudo-sample, calculate <span class="math inline">\(\theta(X_b)\)</span> and store the value - Standard error is the sample standard deviation of <span class="math inline">\(\theta\)</span>: - <span class="math inline">\(\sqrt(\frac{1}{n-1} \sum_i{(\theta - \bar{\theta})^2})\)</span> - Bias can be estimated as: - <span class="math inline">\(\theta{X} - \bar{theta}\)</span> (why?) - The 95% confidence interval is in the interval between 2.5 and 97.5.</p>
<p>Invented by Bradley Efron</p>
<ul>
<li class="fragment">See <a href="https://youtu.be/0tA3x64nCGY?si=u_9syHVAkwlcea9V">interview</a> for the back-story.</li>
<li class="fragment">Very general in its application</li>
</ul>
</section>

<section id="the-bootstrap-1" class="title-slide slide level1 center">
<h1>The bootstrap</h1>
<div class="small">
<ul>
<li class="fragment">The algorithm:</li>
<li class="fragment">Consider a statistic <span class="math inline">\(\theta(X)\)</span></li>
<li class="fragment">For i in <span class="math inline">\(1...b\)</span>
<ul>
<li class="fragment">Sample <span class="math inline">\(n\)</span> samples <em>with replacement</em>: <span class="math inline">\(X_b\)</span></li>
<li class="fragment">In the pseudo-sample, calculate <span class="math inline">\(\theta(X_b)\)</span> and store the value</li>
</ul></li>
<li class="fragment">Standard error is the sample standard deviation of <span class="math inline">\(\theta\)</span>:
<ul>
<li class="fragment"><span class="math inline">\(\sqrt{\frac{1}{n-1} \sum_i{(\theta - \bar{\theta})^2}}\)</span></li>
</ul></li>
<li class="fragment">Bias can be estimated as:
<ul>
<li class="fragment"><span class="math inline">\(\theta(X) - \bar{\theta}\)</span> (why?)</li>
</ul></li>
<li class="fragment">The 95% confidence interval is the interval between the 2.5th and 97.5th percentiles of the bootstrap distribution.</li>
</ul>
</div>
</section>
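The loop above can be sketched in Python (NumPy assumed; the choice of the median as the statistic, and the sample itself, are illustrative):

```python
import numpy as np

def bootstrap(x, theta, b=5000, rng=None):
    """Bootstrap SE and percentile 95% CI for statistic `theta` on data `x`."""
    rng = rng if rng is not None else np.random.default_rng()
    x = np.asarray(x)
    n = len(x)
    # b pseudo-samples of size n, drawn with replacement
    theta_b = np.array([theta(rng.choice(x, size=n, replace=True))
                        for _ in range(b)])
    se = theta_b.std(ddof=1)                  # sample SD of bootstrap estimates
    ci = np.percentile(theta_b, [2.5, 97.5])  # percentile 95% CI
    return se, ci

rng = np.random.default_rng(0)
x = rng.normal(size=200)
se, (lo, hi) = bootstrap(x, np.median, rng=rng)
print(se, lo, hi)
```

Note that the same function works unchanged for the mean, a trimmed mean, a correlation, or any other statistic — the flexibility the next slide emphasizes.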

<section id="why-is-the-bootstrap-so-effective" class="title-slide slide level1 center">
<h1>Why is the bootstrap so effective?</h1>
<div class="small">
<ul>
<li class="fragment">Alleviates distributional assumptions required with other methods.
<ul>
<li class="fragment">“non-parametric”</li>
</ul></li>
<li class="fragment">Flexible to the statistic that is being interrogated</li>
<li class="fragment">Allows interrogating sampling procedures
<ul>
@@ -526,6 +589,7 @@ <h1>Why is the bootstrap so effective?</h1>
<li class="fragment">And other complex procedures.</li>
<li class="fragment">Efron argues that this is the natural procedure Fisher et al.&nbsp;would have preferred in the 1920s if they had computers.</li>
</ul>
</div>
</section>

<section id="demo-1" class="title-slide slide level1 center">
@@ -540,6 +604,7 @@ <h1>A few pitfalls of the bootstrap</h1>

<section id="a-few-pitfalls" class="title-slide slide level1 center">
<h1>A few pitfalls</h1>
<div class="small">
<ul>
<li class="fragment">Estimates of the SE tend to be biased downward in small samples.
<ul>
@@ -559,6 +624,7 @@ <h1>A few pitfalls</h1>
<li class="fragment">Residuals are preferable when considering a designed experiment with fixed levels of an IV.</li>
</ul></li>
</ul>
</div>
</section>
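The “pairs” (case-resampling) flavor contrasted above can be sketched in Python/NumPy (the simulated regression, its coefficients, and the sample size are made up for illustration): each resample draws whole <span class="math inline">\((x_i, y_i)\)</span> cases, so the pairing between predictor and outcome is preserved.

```python
import numpy as np

rng = np.random.default_rng(0)
n, b = 100, 2000
x = rng.uniform(0, 1, size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)  # true slope = 3

def slope(x, y):
    # OLS slope from a design matrix with an intercept column
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Resample (x_i, y_i) pairs together, keeping the x-y pairing intact
slopes = np.empty(b)
for i in range(b):
    idx = rng.integers(0, n, size=n)
    slopes[i] = slope(x[idx], y[idx])

print(slopes.std(ddof=1))  # bootstrap SE of the slope
```

Resampling residuals instead would hold the design fixed and add resampled residuals back onto the fitted values — the choice the slide recommends for designed experiments.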

<section id="building-on-the-bootstrap" class="title-slide slide level1 center">
@@ -572,6 +638,14 @@ <h1>Building on the bootstrap</h1>
</ul>
</section>

<section id="further-reading" class="title-slide slide level1 center">
<h1>Further reading</h1>
<ul>
<li class="fragment"><p>John Fox &amp; Sanford Weisberg have an excellent <a href="https://socialsciences.mcmaster.ca/jfox/Books/Companion-2E/appendix/Appendix-Bootstrapping.pdf">chapter</a> on “bootstrapping regression models,” with clear explanations and R code.</p></li>
<li class="fragment"><p>Another set of explanations appears in the Kulesa et al.&nbsp;<a href="https://www.nature.com/articles/nmeth.3414">tutorial paper</a>.</p></li>
</ul>
</section>

<section id="the-curse-of-dimensionality" class="title-slide slide level1 center">
<h1>The curse of dimensionality</h1>
<p>What about large <span class="math inline">\(p\)</span>?</p>
@@ -583,22 +657,48 @@ <h1>The curse of dimensionality</h1>
<section id="data-is-sparser-in-higher-dimensions" class="title-slide slide level1 center">
<h1>Data is sparser in higher dimensions</h1>

</section>
<img data-src="./images/cod_sparse.png" class="r-stretch"></section>

<section id="the-distance-between-points-increases-rapidly" class="title-slide slide level1 center">
<h1>The distance between points increases rapidly</h1>

<img data-src="./images/cod_distance.png" class="r-stretch"></section>
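A small simulation illustrates the growth in distances (Python/NumPy sketch; uniform points in the unit hypercube and the specific dimensions are assumed for illustration — the mean distance grows roughly like <span class="math inline">\(\sqrt{d/6}\)</span> in this setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

def mean_pairwise_distance(d):
    """Mean Euclidean distance between n random points in the d-dim unit cube."""
    pts = rng.uniform(size=(n, d))
    diff = pts[:, None, :] - pts[None, :, :]     # all pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))     # Euclidean distance matrix
    iu = np.triu_indices(n, k=1)                 # each unordered pair once
    return dist[iu].mean()

for d in (2, 10, 100):
    print(d, mean_pairwise_distance(d))
```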

<section id="multi-co-linearity" class="title-slide slide level1 center">
<h1>Multicollinearity</h1>
<ul>
<li class="fragment">If the data is an <span class="math inline">\(n\)</span>-by-<span class="math inline">\(p\)</span> matrix:</li>
</ul>
<div class="fragment">
<p><span class="math display">\[
\begin{bmatrix}
X_{11} &amp; X_{12} &amp; \cdots &amp; X_{1p} \\
X_{21} &amp; X_{22} &amp; \cdots &amp; X_{2p} \\
\vdots &amp; \vdots &amp; \ddots &amp; \vdots \\
X_{n1} &amp; X_{n2} &amp; \cdots &amp; X_{np}
\end{bmatrix}
\]</span></p>
</div>
<div class="fragment">
<p>some column is a linear combination of the other columns.</p>
</div>
</section>

<section id="when-p-n-multi-colinearity-exists" class="title-slide slide level1 center">
<h1>When <span class="math inline">\(p\)</span> &gt; <span class="math inline">\(n\)</span>, multicollinearity exists</h1>

<p>That is, there exists <span class="math inline">\(\beta\)</span> such that</p>
<div class="fragment">
<p><span class="math inline">\(X_{j} = \sum_{k \neq j}{\beta_k X_{k}}\)</span></p>
</div>
<div class="fragment">
<p>But multicollinearity can exist even when <span class="math inline">\(p\)</span> &lt; <span class="math inline">\(n\)</span>!</p>
</div>
</section>
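This can be checked numerically (Python/NumPy sketch; the dimensions <span class="math inline">\(n = 50\)</span>, <span class="math inline">\(p = 80\)</span> are arbitrary): with <span class="math inline">\(p &gt; n\)</span> the design matrix cannot have full column rank, so any column is reproduced exactly by a combination of the others.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 80  # more predictors than observations
X = rng.normal(size=(n, p))

rank = np.linalg.matrix_rank(X)
print(rank)  # at most n = 50 < p, so the columns are linearly dependent

# Recover column j as a linear combination of the remaining columns
j = 0
beta, *_ = np.linalg.lstsq(np.delete(X, j, axis=1), X[:, j], rcond=None)
residual = X[:, j] - np.delete(X, j, axis=1) @ beta
print(np.abs(residual).max())  # ~0: the reconstruction is exact
```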

<section id="the-false-positive-rate-increases" class="title-slide slide level1 center">
<h1>The false positive rate increases</h1>

</section>
<img data-src="./images/code_false_positives.png" class="r-stretch"></section>
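The essence of the inflation can be computed directly (Python sketch; assuming <span class="math inline">\(p\)</span> independent tests each at <span class="math inline">\(\alpha = 0.05\)</span>, so the familywise false-positive probability is <span class="math inline">\(1 - 0.95^p\)</span>):

```python
# Familywise false-positive probability with p independent tests at alpha = 0.05
alpha = 0.05
for p in (1, 10, 20, 100):
    fwer = 1 - (1 - alpha) ** p
    print(p, round(fwer, 3))
```

Already at 20 predictors, the chance of at least one spurious “significant” result is well over one half.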

<section id="machine-learning-to-the-rescue" class="title-slide slide level1 center">
<h1>Machine learning to the rescue?</h1>
Binary file added slides/images/cod_distance.png
Binary file added slides/images/cod_sparse.png
Binary file added slides/images/code_false_positives.png
