More on the bootstrap
arokem committed Apr 20, 2024
1 parent bfb34e7 commit ca47763
Showing 1 changed file with 22 additions and 18 deletions.
slides/04-stats-with-big-data.qmd
Both of these present statistical challenges

# The jackknife

- The bias of the jackknife is smaller than the bias of $\theta$ (why?)
- Can also be used to estimate the bias of $\theta$:
- $\hat{B} = \hat{\theta} - \theta$
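As a sketch of how this works in practice (NumPy, with the plug-in variance as the statistic; note the standard leave-one-out jackknife bias formula carries an $(n-1)$ scale factor, which the slide's shorthand omits):

```python
import numpy as np

def jackknife_bias(x, stat):
    """Leave-one-out jackknife estimate of the bias of `stat`.

    Uses the standard formula (n - 1) * (theta_bar - theta_hat),
    where theta_bar averages the n leave-one-out estimates.
    """
    n = len(x)
    theta_hat = stat(x)
    loo = np.array([stat(np.delete(x, i)) for i in range(n)])
    return (n - 1) * (loo.mean() - theta_hat)

rng = np.random.default_rng(0)
x = rng.normal(size=100)

# np.var (ddof=0) is biased downward by sigma^2 / n, so the
# jackknife estimate should come out small and negative here
bias_hat = jackknife_bias(x, np.var)
print(bias_hat)
```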

Invented by Bradley Efron
- For $i$ in $1...b$:
  - Draw $n$ samples _with replacement_: $X_i$
  - Calculate $\theta(X_i)$ on this pseudo-sample and store the value
- Standard error is the sample standard deviation of the bootstrap $\theta$ values:
  - $\sqrt{\frac{1}{b-1} \sum_{i=1}^{b}{(\theta_i - \bar{\theta})^2}}$
- Bias can be estimated as:
  - $\bar{\theta} - \theta(X)$ (why?)
- The 95% confidence interval spans the 2.5th to the 97.5th percentile of the bootstrap distribution.
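The steps above can be sketched with NumPy (the normal sample and the choice of the median as the statistic are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=5.0, scale=2.0, size=200)  # observed sample
b = 2000                                      # number of bootstrap samples

# Resample with replacement, recomputing the statistic each time
thetas = np.array([np.median(rng.choice(x, size=len(x), replace=True))
                   for _ in range(b)])

se = thetas.std(ddof=1)                  # bootstrap standard error
bias = thetas.mean() - np.median(x)      # bootstrap bias estimate
ci = np.percentile(thetas, [2.5, 97.5])  # 95% percentile interval

print(f"SE={se:.3f}, bias={bias:+.3f}, CI=[{ci[0]:.3f}, {ci[1]:.3f}]")
```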

# Why is the bootstrap so effective?

- Alleviates the distributional assumptions required by other methods.
- Flexible with respect to the statistic being interrogated.
- Allows interrogating sampling procedures
- For example, sample with and without stratification and compare SE.
- Supports model fitting.
- And other complex procedures.
- Efron argues that this is the natural procedure Fisher et al. would have preferred in the 1920s, had computers been available.
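For example, the plain-vs-stratified comparison mentioned above might look like this (two artificial strata with different means; all names and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
grp_a = rng.normal(0.0, 1.0, size=150)  # stratum A (hypothetical)
grp_b = rng.normal(3.0, 1.0, size=50)   # stratum B (hypothetical)
x = np.concatenate([grp_a, grp_b])

def boot_se(sampler, n_boot=2000):
    """Bootstrap SE of the mean under a given resampling scheme."""
    return np.std([np.mean(sampler()) for _ in range(n_boot)], ddof=1)

# Plain bootstrap: resample all 200 observations together
se_plain = boot_se(lambda: rng.choice(x, size=len(x), replace=True))

# Stratified bootstrap: resample within each stratum, sizes held fixed
se_strat = boot_se(lambda: np.concatenate([
    rng.choice(grp_a, size=len(grp_a), replace=True),
    rng.choice(grp_b, size=len(grp_b), replace=True),
]))

# Stratifying removes the variability in stratum sizes, so its SE is smaller
print(se_plain, se_strat)
```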

::: {.fragment}

He's talking about this:

![](./images/computers_in_1983.png)

:::

# Demo


# A few pitfalls of the bootstrap

Based on ["What Teachers Should Know About the Bootstrap: Resampling in the Undergraduate Statistics Curriculum"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4784504/)
by Tim Hesterberg.

# A few pitfalls to know about

- Inaccurate confidence intervals
  - Particularly for small sample sizes
  - In samples with less than


# A few pitfalls

- Estimates of SE tend to bias downward in small samples.
  - By a factor of $\sqrt{\frac{n-1}{n}}$
- $b$ is a meta-parameter that needs to be determined
- Efron originally claimed that $b=1,000$ should suffice
  - Hesterberg says at least 15,000 replicates are required for a 95% chance of being within 10% of the ground-truth p-value.
- Comparing distributions by checking whether their 95% CIs overlap.
- Should compare the distribution of sampled differences instead!
- In modeling: bootstrapping observations rather than bootstrapping the residuals
- Residuals are preferable when considering a designed experiment with fixed levels of an IV.
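A sketch of the residual bootstrap in a fixed-design regression (hypothetical data; contrast with resampling $(x, y)$ pairs, which would randomize the design):

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.repeat([1.0, 2.0, 3.0, 4.0], 10)  # fixed levels of the IV
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3, size=len(x))

# Fit the model once and keep the fitted values and residuals
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
resid = y - fitted

# Residual bootstrap: hold the design fixed, resample only the noise
slopes = []
for _ in range(2000):
    y_star = fitted + rng.choice(resid, size=len(resid), replace=True)
    s, _ = np.polyfit(x, y_star, 1)
    slopes.append(s)

se_slope = np.std(slopes, ddof=1)  # bootstrap SE of the slope
print(se_slope)
```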

# Building on the bootstrap

- Ensemble methods:
- [Bagging (bootstrap aggregation)](https://link.springer.com/article/10.1007/BF00058655)
- [Random forests](https://link.springer.com/article/10.1023/A:1010933404324)
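A toy version of bagging with NumPy: fit the same base learner on many bootstrap samples and average the predictions (the data and the cubic-polynomial learner are illustrative; for real use, scikit-learn provides `BaggingRegressor` and `RandomForestRegressor`):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(-3.0, 3.0, size=80))
y = np.sin(x) + rng.normal(0.0, 0.3, size=len(x))

# Bagging: average base-learner predictions over bootstrap samples
# to reduce the variance of a single noisy fit
grid = np.linspace(-3.0, 3.0, 100)
preds = []
for _ in range(200):
    idx = rng.integers(0, len(x), size=len(x))  # rows, with replacement
    coefs = np.polyfit(x[idx], y[idx], deg=3)
    preds.append(np.polyval(coefs, grid))
bagged = np.mean(preds, axis=0)
```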
