
Part 7 - The Expectation Maximization Algorithm & Linear Dimensionality Reduction


Expectation Maximization (EM)

EM solves a maximum likelihood problem of the form,

$$ \max_{\theta} \; \sum_{i = 1}^M \log p(x^{(i)}; \theta) = \max_{\theta} \; \sum_{i = 1}^M \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta) $$

where,

  • $\{x^{(i)}\}_{i = 1}^M$: Observed data points.
  • $\{z^{(i)}\}_{i = 1}^M$: Unobserved latent variables. (e.g., in GMM, $z^{(i)}$ indicates which one of the $K$ clusters $x^{(i)}$ belongs to, which is unobserved.)
  • Jensen’s Inequality


    Let’s revisit Jensen’s inequality for concave functions.

    Suppose $f : \mathbb{R} \to \mathbb{R}$ is concave. Then, for any probability distribution $p$, we have,

    $$ f(\mathbb{E}[\mathbf{x}]) \geq \mathbb{E}[f(\mathbf{x})] $$

    where the expectation is taken with respect to the random variable $\mathbf{x}$ drawn from the probability distribution $p$.

    The equality holds if and only if

    1. $\mathbf{x}$ is a constant, or,
    2. $f$ is linear (affine) over the support of $p$.
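As a quick numerical check of the inequality above (a minimal sketch, not part of the original notes; the distribution and the choice $f = \log$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw x from an arbitrary distribution with positive support.
x = rng.uniform(1.0, 5.0, size=100_000)

f = np.log  # log is concave on (0, inf)

lhs = f(x.mean())    # f(E[x])  ~= log(3) ~= 1.099
rhs = f(x).mean()    # E[f(x)]  ~= 1.012 for Uniform(1, 5)
assert lhs >= rhs    # Jensen's inequality for a concave f
```

Because $f = \log$ is strictly concave and $\mathbf{x}$ is not constant here, the inequality is strict.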

      Linear Dimensionality Reduction

      $$ \mathbf{x}^{(i)} = \sum_{k = 1}^K z_k^{(i)} \mathbf{b}_k $$

      where $\mathbf{b}_k = [b_{1k}, b_{2k}, \ldots, b_{Nk}]^T$ is a basis vector and $z_k^{(i)} \in \mathbb{R}$ is the corresponding weight.

      Connection to Linear Regression

      If we focus on the $j$-th entry of $\mathbf{x}^{(i)}$, we have,

      $$ x_j^{(i)} = \sum_{k = 1}^K z_k^{(i)} b_{jk} $$

      which has the same form as a linear regression: the weights $z^{(i)}$ act as the inputs and $b_{j1}, \ldots, b_{jK}$ act as the regression coefficients.
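A minimal NumPy sketch of the model (sizes are made up for illustration): stacking the data points as rows of $\mathbf{X}$, the weights as rows of $\mathbf{Z}$, and the basis vectors as rows of $\mathbf{B}$ gives $\mathbf{X} = \mathbf{Z} \mathbf{B}$, and the per-entry view above is one coordinate of that product.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 6, 4, 2                # M points, N ambient dims, K basis vectors (made up)

Z = rng.normal(size=(M, K))      # row i holds the weights z^{(i)}
B = rng.normal(size=(K, N))      # row k holds the basis vector b_k^T

X = Z @ B                        # x^{(i)} = sum_k z_k^{(i)} b_k, stacked as rows

# Per-entry view: x_j^{(i)} = sum_k z_k^{(i)} b_{jk}, a linear model in z^{(i)}.
i, j = 3, 1
assert np.isclose(X[i, j], sum(Z[i, k] * B[k, j] for k in range(K)))
```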

      Alternating Least Squares

    Lack of Uniqueness for Optimal Parameters

    Suppose we run the ALS algorithm to convergence and obtain optimal parameters $\mathbf{Z}^{\star}$ and $\mathbf{B}^{\star}$ such that,

    $$ \ell^{\star} = \Vert \mathbf{X} - \mathbf{Z}^{\star} \mathbf{B}^{\star} \Vert_F^2 $$


    Let $\mathbf{R} \in \mathbb{R}^{K \times K}$ be an arbitrary invertible matrix.

    A $K \times K$ matrix $\mathbf{R}$ is invertible if there exists a $K \times K$ matrix $\mathbf{S}$ such that $\mathbf{R} \mathbf{S} = \mathbf{S} \mathbf{R} = \mathbf{I}$, in which case we write $\mathbf{R}^{-1} = \mathbf{S}$.

    We obtain a different set of parameters $\mathbf{\tilde{Z}} = \mathbf{Z}^{\star} \mathbf{R}$ and $\mathbf{\tilde{B}} = \mathbf{R}^{-1} \mathbf{B}^{\star}$ with the same optimal value,

    $$ \Vert \mathbf{X} - \mathbf{\tilde{Z}} \mathbf{\tilde{B}} \Vert_F^2 = \Vert \mathbf{X} - \mathbf{Z}^{\star} \mathbf{R} \mathbf{R}^{-1} \mathbf{B}^{\star} \Vert_F^2 = \Vert \mathbf{X} - \mathbf{Z}^{\star} \mathbf{B}^{\star} \Vert_F^2 = \ell^{\star} $$

    so the optimal parameters of the factorization are not unique.
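A small NumPy check of this invariance (a sketch with made-up sizes; $\mathbf{Z}^{\star}$ and $\mathbf{B}^{\star}$ below are random stand-ins rather than factors actually fitted by ALS):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 8, 5, 3

X = rng.normal(size=(M, N))
Z_star = rng.normal(size=(M, K))   # stand-in for the converged factor Z*
B_star = rng.normal(size=(K, N))   # stand-in for the converged factor B*

R = rng.normal(size=(K, K))        # a generic random matrix is invertible
Z_tilde = Z_star @ R
B_tilde = np.linalg.inv(R) @ B_star

def loss(Z, B):
    # squared Frobenius norm of the reconstruction error
    return np.linalg.norm(X - Z @ B, ord="fro") ** 2

# Identical up to floating-point round-off.
assert np.isclose(loss(Z_star, B_star), loss(Z_tilde, B_tilde))
```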

    Two-sample test

    $$ H_0 : \mu_1 = \mu_2 $$


    We compute the sample means $\bar{x}_1$ and $\bar{x}_2$.
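A minimal sketch of carrying this out in Python (assuming a standard two-sample $t$-test via SciPy; the data here are made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x1 = rng.normal(loc=5.0, scale=1.0, size=30)   # sample 1 (made-up data)
x2 = rng.normal(loc=5.4, scale=1.0, size=35)   # sample 2 (made-up data)

print(x1.mean(), x2.mean())                    # the sample means

# Two-sample t-test of H0: mu_1 = mu_2 (equal variances assumed by default).
t_stat, p_value = stats.ttest_ind(x1, x2)
print(t_stat, p_value)                         # reject H0 when p_value is small
```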