Expectation Maximization (EM)
EM solves a maximum likelihood problem of the form,
@@ -236,12 +236,11 @@ Jensen’s Inequality
-Let’s revisit Jensen’s inequality, for concave functions.
Suppose $f : \mathbb{R} \mapsto \mathbb{R}$ is concave, then for all probability distributions $p$, we have,
$$
-f(\mathbb{E}_{\mathbf{x} \sim p}[x]) \geq \mathbb{E}_{\mathbf{x} \sim p}[f(x)]
+f(\mathbb{E} [\mathbf{x}]) \geq \mathbb{E} [f(\mathbf{x})].
$$
-The subscript $\mathbf{x} \sim p$ indicates that the expectation is taken with respect to the random variable $\mathbf{x}$ drawn from the probability distribution $p$.
+Here the expectation is taken with respect to the random variable $\mathbf{x}$ drawn from the probability distribution $p$.
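A quick numerical illustration of the inequality for a discrete distribution $p$, offered as a sketch. It takes $f = \log$, which is concave on the positive reals, and uses made-up values and probabilities. `jensen_gap` is a hypothetical helper that returns $f(\mathbb{E}[\mathbf{x}]) - \mathbb{E}[f(\mathbf{x})]$. The last check uses a constant random variable, which is one of the equality conditions listed next.

```python
import math

def jensen_gap(values, probs, f):
    """Return f(E[x]) - E[f(x)] for a discrete distribution p over `values` (hypothetical helper)."""
    assert abs(sum(probs) - 1.0) < 1e-12, "probs must sum to 1"
    mean = sum(p * x for p, x in zip(probs, values))         # E[x]
    mean_of_f = sum(p * f(x) for p, x in zip(probs, values))  # E[f(x)]
    return f(mean) - mean_of_f

# Concave example: f = log on positive values (illustrative numbers only).
values = [1.0, 2.0, 8.0]
probs = [0.5, 0.25, 0.25]
gap = jensen_gap(values, probs, math.log)
print(gap >= 0)  # True: f(E[x]) >= E[f(x)] for concave f

# Equality case: a constant random variable gives a zero gap (up to rounding).
print(abs(jensen_gap([3.0, 3.0], [0.4, 0.6], math.log)) < 1e-12)  # True
```

Here $\mathbb{E}[\mathbf{x}] = 3$ and $\mathbb{E}[\log \mathbf{x}] = \log 2$, so the gap is $\log 3 - \log 2 = \log 1.5 > 0$.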
The equality holds if and only if
- $\mathbf{x}$ is a constant, or,
@@ -397,7 +396,7 @@
Linear Dimensionality Reduction
$$ \mathbf{x}^{(i)} = \sum_{k = 1}^K z_k^{(i)} \mathbf{b}_k $$
-where $\mathbf{b}_k = [b_{1k}, b_{2k}, \ldots, b_{Nk}]^T$ is a basis vector and $z_k^{(i)} \in \mathbb{R}$ is the corresponding weight.
+where $\mathbf{b}_k$ is a basis vector and $z_k^{(i)} \in \mathbb{R}$ is the corresponding weight.
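To make the decomposition concrete, here is a minimal sketch in plain Python. It assumes each basis vector $\mathbf{b}_k$ is stored as a length-$N$ list, and the helper `combine` and the numbers are hypothetical. It forms $\mathbf{x}^{(i)} = \sum_k z_k^{(i)} \mathbf{b}_k$ for a single data point.

```python
def combine(z, basis):
    """Return sum_k z[k] * basis[k], where each basis[k] is a length-N list (hypothetical helper)."""
    n = len(basis[0])
    x = [0.0] * n
    for z_k, b_k in zip(z, basis):
        for j in range(n):
            x[j] += z_k * b_k[j]  # entry j accumulates z_k times the j-th entry of b_k
    return x

# Hypothetical example: K = 2 basis vectors in R^3.
basis = [[1.0, 0.0, 1.0],  # b_1
         [0.0, 2.0, 1.0]]  # b_2
z = [3.0, -1.0]            # weights z_1, z_2 for one data point
print(combine(z, basis))   # [3.0, -2.0, 2.0]
```

Entry $j$ of the result is $\sum_k z_k^{(i)} (\mathbf{b}_k)_j$. This is a weighted sum of the $j$-th entries of the basis vectors, and it is the view that the linear-regression connection builds on.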
Connection to Linear Regression
If we focus on the $j$-th entry of $\mathbf{x}^{(i)}$, we have,
$$
@@ -477,11 +476,11 @@
Alternating Least Squares
Lack of Uniqueness for Optimal Parameters
-Suppose we run the ALS algorithm to convergence and obtain optimal parameters $\mathbf{Z}^{\star}$ and $\mathbf{B}^{\star}$ such that,
-$$
+Suppose we run the ALS algorithm to convergence and obtain optimal parameters $\mathbf{Z}^{\star}$ and $\mathbf{B}^{\star}$ such that,
+$$
\ell^{\star} = \Vert \mathbf{X} - \mathbf{Z}^{\star} \mathbf{B}^{\star} \Vert_F^2
$$
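For reference, one way such an ALS run could look. This is a minimal NumPy sketch, not necessarily the notes' exact algorithm. It assumes the following:

* $\mathbf{X}$ is $M \times N$, $\mathbf{Z}$ is $M \times K$, and $\mathbf{B}$ is $K \times N$.
* $\mathbf{B}$ is initialised randomly.
* The iteration count is fixed.
* `np.linalg.lstsq` solves each subproblem.

Each half-step fixes one factor and solves an ordinary least-squares problem for the other, so the loss never increases. The function name `als` and the test data are hypothetical.

```python
import numpy as np

def als(X, K, n_iters=100, seed=0):
    """Minimize ||X - Z B||_F^2 by alternating least squares (sketch).

    X: (M, N) data, Z: (M, K) weights, B: (K, N) basis with rows b_k^T.
    """
    if n_iters < 1:
        raise ValueError("n_iters must be at least 1")
    rng = np.random.default_rng(seed)
    _, N = X.shape
    B = rng.standard_normal((K, N))  # assumed random initialization
    for _ in range(n_iters):
        # Fix B, solve for Z: X ≈ Z B  <=>  B^T Z^T ≈ X^T.
        Z = np.linalg.lstsq(B.T, X.T, rcond=None)[0].T
        # Fix Z, solve for B: X ≈ Z B.
        B = np.linalg.lstsq(Z, X, rcond=None)[0]
    loss = np.linalg.norm(X - Z @ B, "fro") ** 2
    return Z, B, loss

# Hypothetical exactly rank-2 data, so the optimal loss is 0.
rng = np.random.default_rng(42)
X = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 5))
Z_star, B_star, loss_star = als(X, K=2)
print(loss_star)  # close to 0 for exactly rank-2 data
```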
-Let $\mathbf{R} \in \mathbb{R}^{K \times K$ be an arbitrary invertible matrix.
+Let $\mathbf{R} \in \mathbb{R}^{K \times K}$ be an arbitrary invertible matrix.
A $K \times K$ matrix $\mathbf{R}$ is invertible if there exists a $K \times K$ matrix $\mathbf{S}$ such that $\mathbf{R} \mathbf{S} = \mathbf{S} \mathbf{R} = \mathbf{I}$; we then write $\mathbf{R}^{-1} = \mathbf{S}$.
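A minimal NumPy check of this definition, and of what it implies for the reparameterization that follows. The setup is illustrative only:

* $\mathbf{Z}^{\star}$, $\mathbf{B}^{\star}$ and $\mathbf{X}$ are random stand-ins, not actual ALS output.
* The particular $\mathbf{R}$ is a hypothetical choice with determinant $5.5$.
* `frob_loss` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, K = 6, 4, 2
Z_star = rng.standard_normal((M, K))  # stand-in for Z*, not real ALS output
B_star = rng.standard_normal((K, N))  # stand-in for B*
X = Z_star @ B_star + 0.1 * rng.standard_normal((M, N))  # hypothetical noisy data

R = np.array([[2.0, 1.0],
              [0.5, 3.0]])  # hypothetical invertible R (det = 5.5)
S = np.linalg.inv(R)        # S = R^{-1}
print(np.allclose(R @ S, np.eye(K)), np.allclose(S @ R, np.eye(K)))  # True True

def frob_loss(Z, B):
    """Squared Frobenius loss ||X - Z B||_F^2."""
    return np.linalg.norm(X - Z @ B, "fro") ** 2

Z_tilde = Z_star @ R  # (M, K) @ (K, K)
B_tilde = S @ B_star  # (K, K) @ (K, N)
print(np.isclose(frob_loss(Z_star, B_star), frob_loss(Z_tilde, B_tilde)))  # True
print(np.allclose(Z_star, Z_tilde))  # False: the factors themselves differ
```

The loss is unchanged because $\mathbf{Z}^{\star} \mathbf{R} \mathbf{R}^{-1} \mathbf{B}^{\star} = \mathbf{Z}^{\star} \mathbf{B}^{\star}$. Any invertible $\mathbf{R}$ works, so the optimal factors are not unique.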
We obtain a different set of parameters $\mathbf{\tilde{Z}} = \mathbf{Z}^{\star} \mathbf{R}$ and $\mathbf{\tilde{B}} = \mathbf{R}^{-1} \mathbf{B}^{\star}$, with the same optimal value,
diff --git a/cityu/ma3518/ma3518_3/index.html b/cityu/ma3518/ma3518_3/index.html
index 269f26c3..981b0e0c 100644
--- a/cityu/ma3518/ma3518_3/index.html
+++ b/cityu/ma3518/ma3518_3/index.html
@@ -569,7 +569,7 @@