diff --git a/.nojekyll b/.nojekyll
index 01d1fc7..17eff7e 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-dc5d3c6f
\ No newline at end of file
+a552ad32
\ No newline at end of file
diff --git a/sitemap.xml b/sitemap.xml
index eb8aa62..9902a5a 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,46 +2,46 @@
 https://uw-psych.github.io/psych532-slides/slides/07-fair.html
- 2024-04-21T04:35:37.958Z
+ 2024-04-22T11:07:40.303Z
 https://uw-psych.github.io/psych532-slides/slides/06-machine-learning.html
- 2024-04-21T04:35:37.958Z
+ 2024-04-22T11:07:40.303Z
 https://uw-psych.github.io/psych532-slides/slides/08-viz.html
- 2024-04-21T04:35:37.958Z
+ 2024-04-22T11:07:40.303Z
 https://uw-psych.github.io/psych532-slides/slides/05-hpc.html
- 2024-04-21T04:35:37.958Z
+ 2024-04-22T11:07:40.303Z
 https://uw-psych.github.io/psych532-slides/slides/02-reproducibility.html
- 2024-04-21T04:35:37.958Z
+ 2024-04-22T11:07:40.303Z
 https://uw-psych.github.io/psych532-slides/about.html
- 2024-04-21T04:35:37.958Z
+ 2024-04-22T11:07:40.303Z
 https://uw-psych.github.io/psych532-slides/index.html
- 2024-04-21T04:35:37.958Z
+ 2024-04-22T11:07:40.303Z
 https://uw-psych.github.io/psych532-slides/slides/01-introduction.html
- 2024-04-21T04:35:37.958Z
+ 2024-04-22T11:07:40.303Z
 https://uw-psych.github.io/psych532-slides/slides/04-stats-with-big-data.html
- 2024-04-21T04:35:37.958Z
+ 2024-04-22T11:07:40.303Z
 https://uw-psych.github.io/psych532-slides/slides/09-ethics.html
- 2024-04-21T04:35:37.958Z
+ 2024-04-22T11:07:40.303Z
 https://uw-psych.github.io/psych532-slides/slides/03-working-with-big-data.html
- 2024-04-21T04:35:37.958Z
+ 2024-04-22T11:07:40.303Z
diff --git a/slides/04-stats-with-big-data.html b/slides/04-stats-with-big-data.html
index 94707c9..d14f9db 100644
--- a/slides/04-stats-with-big-data.html
+++ b/slides/04-stats-with-big-data.html
@@ -334,15 +334,15 @@

Doing statistics with big data

Statistics with Big Data

@@ -385,7 +385,7 @@

The Bayesian objection

  • But inference is often.
  • Which may or may not be true depending on the prior of \(H_0\).
  • Making \(\alpha = 0.05\) even more arbitrary.
  • @@ -440,10 +440,29 @@

    Explicit models

    +
    +

    Some challenges

    +
    + +
    +
    +

    Computing to the rescue

    -

    +

    @@ -459,10 +478,19 @@

    Resampling methods

    The Jackknife

    +
    +
    +
    + +
    +

    The Jackknife

    +
    +
  • The estimate of the standard error \(SE(S)\) is:
      -
    • $SE_= $
    • +
    • \(SE_\theta = \sqrt{ \frac{n-1}{n} \sum_{i}{ (\hat{\theta} - \theta_i) ^2 }}\)
  • +
    -
    +
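
    The standard-error formula on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's demo code: the function name `jackknife_se` is hypothetical, and it assumes that \(\theta_i\) is the statistic recomputed with observation \(i\) left out and that \(\hat{\theta}\) is the mean of those leave-one-out estimates.

```python
import numpy as np


def jackknife_se(data, statistic):
    """Jackknife standard error of `statistic` on 1-D `data` (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    # theta_i: the statistic recomputed with the i-th observation left out
    theta_i = np.array([statistic(np.delete(data, i)) for i in range(n)])
    # theta_hat: assumed here to be the mean of the leave-one-out estimates
    theta_hat = theta_i.mean()
    return np.sqrt((n - 1) / n * np.sum((theta_hat - theta_i) ** 2))


rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=50)

se_jack = jackknife_se(x, np.mean)
se_classic = x.std(ddof=1) / np.sqrt(x.size)
print(se_jack, se_classic)
```

    For the sample mean, the jackknife standard error reduces algebraically to the familiar \(s / \sqrt{n}\), so the two printed values should agree up to floating-point error. That makes the mean a convenient sanity check before applying the same function to statistics with no closed-form standard error.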

    The jackknife

    +
    • The bias of the jackknife is smaller than the bias of \(\theta\) (why?)
    • Can also be used to estimate the bias of \(\theta\): @@ -490,6 +520,7 @@

      The jackknife

    • \(\hat{B} = \hat{\theta} - \theta\)
    +
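
    A small sketch of the bias estimate follows. It assumes that \(\hat{\theta}\) is the mean of the leave-one-out estimates and that \(\theta\) is the full-sample estimate. It also applies the \((n-1)\) scaling found in the standard jackknife bias formula, which does not appear in the slide text and is labeled as an assumption in the code. The name `jackknife_bias` is hypothetical.

```python
import numpy as np


def jackknife_bias(data, statistic):
    """Jackknife estimate of the bias of `statistic` (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    theta_full = statistic(data)  # estimate from the full sample
    # leave-one-out estimates
    theta_i = np.array([statistic(np.delete(data, i)) for i in range(n)])
    # ASSUMPTION: the standard (n - 1) scaling, not shown on the slide
    return (n - 1) * (theta_i.mean() - theta_full)


rng = np.random.default_rng(1)
x = rng.normal(size=30)

plug_in_var = np.var(x)  # divides by n, so it is biased downward
corrected_var = plug_in_var - jackknife_bias(x, np.var)
print(plug_in_var, corrected_var, np.var(x, ddof=1))
```

    The plug-in variance is a useful test case because subtracting its jackknife bias estimate recovers the usual unbiased variance (the one that divides by \(n-1\)) exactly.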
    @@ -501,7 +532,7 @@

    Demo

    Some limitations

    +
    +

    Further reading

    + +
    +

    The curse of dimensionality

    What about large \(p\)?

    @@ -583,22 +657,48 @@

    The curse of dimensionality

    Data is sparser in higher dimensions

    -
    +

    The distance between points increases rapidly

    +
    + +
    +
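
    The growth in distance can be checked numerically. The sketch below draws points uniformly from the unit hypercube and reports the mean pairwise Euclidean distance for several dimensions. The sample size, the choice of dimensions, and the helper name `mean_pairwise_distance` are illustrative assumptions; this is not necessarily how the slide's figure was produced.

```python
import numpy as np


def mean_pairwise_distance(n_points, n_dims, rng):
    """Mean Euclidean distance between uniform random points in [0, 1]^n_dims."""
    points = rng.uniform(size=(n_points, n_dims))
    # Broadcast to get all pairwise coordinate differences
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Keep each unordered pair once (upper triangle, excluding the diagonal)
    upper = np.triu_indices(n_points, k=1)
    return dists[upper].mean()


rng = np.random.default_rng(0)
dims = [1, 2, 10, 100]
results = {d: mean_pairwise_distance(200, d, rng) for d in dims}

for d, dist in results.items():
    print(f"p = {d:>3}: mean distance = {dist:.3f}")
```

    With the number of points held fixed, the mean distance grows with the number of dimensions, so a fixed-size neighborhood around any point contains fewer and fewer of the other points.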

    Multicollinearity

    + +
    +

    \[ +\begin{bmatrix} +X_{11} & X_{12} & \cdots & X_{1p} \\ +X_{21} & X_{22} & \cdots & X_{2p} \\ +\vdots & \vdots & \ddots & \vdots \\ +X_{n1} & X_{n2} & \cdots & X_{np} +\end{bmatrix} +\]

    +
    +
    +

    every column is a linear combination of other columns.

    +

    When \(p > n\), multicollinearity exists

    - +

    That is, there exists \(\beta\) such that

    +
    +

    \(X_{j} = \sum_{k \neq j}{\beta_k X_{k}}\)

    +
    +
    +

    But multicollinearity can exist even when \(p < n\)!
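
    Both cases can be illustrated by checking the rank of a design matrix. In this sketch the matrix sizes, the random data, and the specific linear combination are arbitrary choices made for illustration. A rank below the number of columns means that at least one column can be written as a linear combination of the others.

```python
import numpy as np

rng = np.random.default_rng(0)

# Case 1: p > n. The rank cannot exceed n, so the p columns must be dependent.
X_wide = rng.normal(size=(5, 8))  # n = 5 observations, p = 8 variables
rank_wide = np.linalg.matrix_rank(X_wide)

# Case 2: p < n, but one column is built from two others.
X_tall = rng.normal(size=(20, 4))  # n = 20 observations, p = 4 variables
X_tall[:, 3] = X_tall[:, 0] + 2.0 * X_tall[:, 1]
rank_tall = np.linalg.matrix_rank(X_tall)

print(f"wide: rank {rank_wide} of {X_wide.shape[1]} columns")
print(f"tall: rank {rank_tall} of {X_tall.shape[1]} columns")
```

    In either case \(X^\top X\) is singular, so ordinary least squares has no unique solution for the coefficients.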

    +

    The false positive rate increases

    -
    +
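
    One common way to see why false positives become more likely is to test \(p\) null variables, each at \(\alpha = 0.05\), and ask how often at least one comes out "significant". The sketch below is an illustration under stated assumptions, not a reproduction of the slide's figure. It assumes the tests are independent and that every null hypothesis is true, so each p-value is uniform on \([0, 1]\). The function names are hypothetical.

```python
import numpy as np


def fwer_analytic(p, alpha=0.05):
    """P(at least one false positive) across p independent null tests."""
    return 1.0 - (1.0 - alpha) ** p


def fwer_simulated(p, alpha=0.05, n_reps=2000, seed=0):
    """Monte Carlo estimate of the same probability.

    ASSUMPTION: under a true null with a continuous test statistic the
    p-value is Uniform(0, 1), so p-values are drawn directly.
    """
    rng = np.random.default_rng(seed)
    p_values = rng.uniform(size=(n_reps, p))
    return np.mean((p_values < alpha).any(axis=1))


for p in [1, 10, 100]:
    print(f"p = {p:>3}: analytic = {fwer_analytic(p):.3f}, "
          f"simulated = {fwer_simulated(p):.3f}")
```

    Under these assumptions, the chance of at least one spurious finding approaches 1 as \(p\) grows, even though each individual test keeps its nominal 5% error rate.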

    Machine learning to the rescue?

diff --git a/slides/images/cod_distance.png b/slides/images/cod_distance.png
new file mode 100644
index 0000000..58f8ab3
Binary files /dev/null and b/slides/images/cod_distance.png differ
diff --git a/slides/images/cod_sparse.png b/slides/images/cod_sparse.png
new file mode 100644
index 0000000..2d256a7
Binary files /dev/null and b/slides/images/cod_sparse.png differ
diff --git a/slides/images/code_false_positives.png b/slides/images/code_false_positives.png
new file mode 100644
index 0000000..a56fbf2
Binary files /dev/null and b/slides/images/code_false_positives.png differ