PRML $(2.61)$

derivation

Posted by Kwan on June 5, 2020

We want to prove that

\[\frac{1}{(2 \pi)^{D / 2}} \frac{1}{|\mathbf{\Sigma}|^{1 / 2}} \int \exp \left\lbrace-\frac{1}{2} \mathbf{z}^{\mathrm{T}} \mathbf{\Sigma}^{-1} \mathbf{z}\right\rbrace \mathbf{z} \mathbf{z}^{\mathrm{T}} \mathrm{d} \mathbf{z} = \mathbf{\Sigma} \tag{1}\label{qwq}\]

First, according to (2.60), $\mathbf{z}=\sum_{j=1}^{D} y_{j} \mathbf{u}_ {j}=\begin{bmatrix}\mathbf{u}_ {1} & \mathbf{u}_ {2} & \cdots & \mathbf{u}_ {D} \end{bmatrix}\begin{bmatrix}{y}_ {1} \\ {y}_ {2} \\ \vdots \\ {y}_ {D} \end{bmatrix}=\mathbf{U}^{\mathrm{T}}\mathbf{y}$, where $\mathbf{U}$ is a matrix whose rows are given by $\mathbf{u}_ {i}^{\mathrm{T}}$ and $y_ {j}=\mathbf{u}_ {j}^{\mathrm{T}} \mathbf{z}$. This process is a coordinate transformation(and it coincides with PCA since $\mathbf{u}_ {i}$s are eigenvectors of the covariance matrix), and we can treat this as a change of variable, i.e., $\mathbf{z}=f(\mathbf{y})=\mathbf{U}^{\mathrm{T}}\mathbf{y}$. The Jacobian of such a transfomation is:

\[\mathbf{J}=\frac{\partial (z_{1}, z_{2}, \ldots,z_{D})}{\partial (y_{1}, y_{2}, \ldots,y_{D})}=\frac{\partial \mathbf{z}}{\partial \mathbf{y}}=\frac{\partial \mathbf{U}^{\mathrm{T}}\mathbf{y}}{\partial \mathbf{y}}=\mathbf{U}.\]

Hence, we have $\mathrm{d} \mathbf{z}=|\text{det}(\mathbf{J})|\mathrm{d} \mathbf{y}$. We’ve known from the previous part of this section that $\mathbf{U}$ is orthogonal, and hence $\left|\mathbf{U}\right|^{2}=\left|\mathbf{U}\right||\mathbf{U}|=\left|\mathbf{U}^{\mathrm{T}}\right||\mathbf{U}|=\left|\mathbf{U}^{\mathrm{T}} \mathbf{U}\right|=|\mathbf{I}|=1$, so $|\text{det}(\mathbf{J})|=|\text{det}(\mathbf{U})|=|\pm1|=1$, i.e., $\mathrm{d} \mathbf{z}=\mathrm{d} \mathbf{y}$.

We could now deal with the integral part of $\eqref{qwq}$. Insert $\mathbf{z}=\sum_{j=1}^{D} y_{j} \mathbf{u}_ {j}$, $\boldsymbol{\Sigma}^ {-1}=\sum_ {i=1}^ {D} \frac{1}{\lambda_ {i}} \mathbf{u}_ {i} \mathbf{u}_ {i}^ {\mathrm{T}}$(2.49) and $\mathrm{d} \mathbf{z}=\mathrm{d} \mathbf{y}$ into the integral part, we get:

\[\begin{aligned} &\hspace{1.4em} \int \exp \left\lbrace-\frac{1}{2} \mathbf{z}^{\mathrm{T}} \mathbf{\Sigma}^{-1} \mathbf{z}\right\rbrace \mathbf{z} \mathbf{z}^{\mathrm{T}} \mathrm{d} \mathbf{z} \\ & = \int \exp \left\lbrace-\frac{1}{2} \sum_{k=1}^{D} y_{k} \mathbf{u}_{k}^{\mathrm{T}} \sum_{a=1}^{D} \frac{1}{\lambda_{a}} \mathbf{u}_{a} \mathbf{u}_{a}^{\mathrm{T}} \sum_{b=1}^{D} y_{b} \mathbf{u}_{b}\right\rbrace \sum_{i=1}^{D} y_{i} \mathbf{u}_{i} \sum_{j=1}^{D} y_{j} \mathbf{u}_{j}^{\mathrm{T}} \mathrm{d} \mathbf{y} \\& = \int \sum_{i=1}^{D} \sum_{j=1}^{D}\mathbf{u}_{i}\mathbf{u}_{j}^{\mathrm{T}}y_{i}y_{j}\exp \left\lbrace-\frac{1}{2} \sum_{k=1}^{D}\sum_{a=1}^{D} \frac{y_{k}}{\lambda_{a}} \mathbf{u}_{k}^{\mathrm{T}} \mathbf{u}_{a} \mathbf{u}_{a}^{\mathrm{T}} \sum_{b=1}^{D} y_{b} \mathbf{u}_{b}\right\rbrace\mathrm{d} \mathbf{y} \\& = \sum_{i=1}^{D} \sum_{j=1}^{D}\mathbf{u}_{i}\mathbf{u}_{j}^{\mathrm{T}} \int \exp \left\lbrace-\frac{1}{2} \sum_{k=1}^{D}\sum_{a=1}^{D} \frac{y_{k}}{\lambda_{a}} \mathbf{u}_{k}^{\mathrm{T}} \mathbf{u}_{a} \mathbf{u}_{a}^{\mathrm{T}} \sum_{b=1}^{D} y_{b} \mathbf{u}_{b}\right\rbrace y_{i}y_{j} \mathrm{d} \mathbf{y}.\end{aligned}\]

Since $\mathbf{U}$ is orthogonal(so is $\mathbf{U}^{\mathrm{T}}$), we know that $\mathbf{u}_ {i}$s are perpendicular to each other, which means, $\mathbf{u}_ {i}^{\mathrm{T}}\mathbf{u}_ {j} = 0$ for all $i \neq j$ and $1$ otherwise. We could henceforth cancel many terms in exponent above using this property:

\[\begin{aligned} &\hspace{1.4em} \sum_{k=1}^{D}\sum_{a=1}^{D} \frac{y_{k}}{\lambda_{a}} \mathbf{u}_{k}^{\mathrm{T}} \mathbf{u}_{a} \mathbf{u}_{a}^{\mathrm{T}} \sum_{b=1}^{D} y_{b} \mathbf{u}_{b} \\& =\sum_{k=1}^{D}\left(\frac{y_{k}}{\lambda_{1}}\underbrace{\mathbf{u}_{k}^{\mathrm{T}} \mathbf{u}_{1}}_0\mathbf{u}_{1}^{\mathrm{T}}+\ldots+\frac{y_{k}}{\lambda_{k}}\underbrace{\mathbf{u}_{k}^{\mathrm{T}} \mathbf{u}_{k}}_1\mathbf{u}_{k}^{\mathrm{T}}+\ldots+\frac{y_{k}}{\lambda_{D}}\underbrace{\mathbf{u}_{k}^{\mathrm{T}} \mathbf{u}_{D}}_0\mathbf{u}_{D}^{\mathrm{T}}\right)\sum_{b=1}^{D} y_{b} \mathbf{u}_{b} \\& = \sum_{k=1}^{D}\frac{y_{k}}{\lambda_{k}}\mathbf{u}_{k}^{\mathrm{T}}\sum_{b=1}^{D} y_{b} \mathbf{u}_{b} \\& = \sum_{k=1}^{D}\sum_{b=1}^{D}\frac{y_{k}y_{b}}{\lambda_{k}}\mathbf{u}_{k}^{\mathrm{T}}\mathbf{u}_{b} \\& = \sum_{k=1}^{D}\frac{y_{k}^2}{\lambda_{k}}.\end{aligned}\]

We now apply the above to the integral part and get:

\[\begin{aligned} &\hspace{1.4em}\sum_{i=1}^{D} \sum_{j=1}^{D}\mathbf{u}_{i}\mathbf{u}_{j}^{\mathrm{T}} \int \exp \left\lbrace-\frac{1}{2} \sum_{k=1}^{D}\sum_{a=1}^{D} \frac{y_{k}}{\lambda_{a}} \mathbf{u}_{k}^{\mathrm{T}} \mathbf{u}_{a} \mathbf{u}_{a}^{\mathrm{T}} \sum_{b=1}^{D} y_{b} \mathbf{u}_{b}\right\rbrace y_{i}y_{j} \mathrm{d} \mathbf{y} \\& = \sum_{i=1}^{D} \sum_{j=1}^{D}\mathbf{u}_{i}\mathbf{u}_{j}^{\mathrm{T}} \int \exp\left\lbrace-\sum_{k=1}^{D}\frac{y_{k}^2}{2\lambda_{k}}\right\rbrace y_{i}y_{j} \mathrm{d} \mathbf{y} \\& = \sum_{i=1}^{D} \sum_{j=1}^{D}\mathbf{u}_{i}\mathbf{u}_{j}^{\mathrm{T}} \int \ldots \int \exp\left\lbrace-\frac{y_{1}^2}{2\lambda_{1}}\right\rbrace \cdots \exp\left\lbrace-\frac{y_{i}^2}{2\lambda_{i}}\right\rbrace \ldots \exp\left\lbrace-\frac{y_{j}^2}{2\lambda_{j}}\right\rbrace \ldots \exp\left\lbrace-\frac{y_{D}^2}{2\lambda_{D}}\right\rbrace y_{i}y_{j} \mathrm{d} y_1 \ldots \mathrm{d} y_D.\end{aligned}\]

Here, we encounter 2 cases as follows:

1. $\int \exp\left\lbrace-\frac{y^2}{2\lambda}\right\rbrace \mathrm{d} y$

This part could be solved easier using the property of Gaussian distribution than literally integral it by partial functions. Since $\frac{1}{\left(2\pi\right)^{\frac{1}{2}}\lambda^{\frac{1}{2}}}\exp\left\lbrace-\frac{y^2}{2\lambda}\right\rbrace \sim \mathcal{N}(0, \lambda)$, and the area under any p.d.f is $1$, we can figure out that $\int \frac{1}{\left(2\pi\right)^{\frac{1}{2}}\lambda^{\frac{1}{2}}}\exp\left\lbrace-\frac{y^2}{2\lambda}\right\rbrace \mathrm{d} y = 1$, so $\int \exp\left\lbrace-\frac{y^2}{2\lambda}\right\rbrace \mathrm{d} y = \left(2\pi\right)^{\frac{1}{2}}\lambda^{\frac{1}{2}}$.

2. $\int \exp\left\lbrace-\frac{y^2}{2\lambda}\right\rbrace y \mathrm{d} y$

This part is relatively simple, since $\exp\left\lbrace-\frac{y^2}{2\lambda}\right\rbrace y$ is an odd function, $\int \exp\left\lbrace-\frac{y^2}{2\lambda}\right\rbrace y \mathrm{d} y = 0$.

Armed with the above 2 properties, it is easy to see that:

\[\begin{aligned} &\hspace{1.4em} \sum_{i=1}^{D} \sum_{j=1}^{D}\mathbf{u}_{i}\mathbf{u}_{j}^{\mathrm{T}} \int \ldots \int \exp\left\lbrace-\frac{y_{1}^2}{2\lambda_{1}}\right\rbrace \cdots \exp\left\lbrace-\frac{y_{i}^2}{2\lambda_{i}}\right\rbrace \ldots \exp\left\lbrace-\frac{y_{j}^2}{2\lambda_{j}}\right\rbrace \ldots \exp\left\lbrace-\frac{y_{D}^2}{2\lambda_{D}}\right\rbrace y_{i}y_{j} \mathrm{d} y_1 \ldots \mathrm{d} y_D \\& = \sum_{i=1}^{D} \sum_{j=1}^{D}\mathbf{u}_{i}\mathbf{u}_{j}^{\mathrm{T}} \int \exp\left\lbrace-\frac{y_{1}^2}{2\lambda_{1}}\right\rbrace \mathrm{d} y_1 \ldots \int \exp\left\lbrace-\frac{y_{i}^2}{2\lambda_{i}}\right\rbrace y_i \mathrm{d} y_i \ldots \int \exp\left\lbrace-\frac{y_{j}^2}{2\lambda_{j}}\right\rbrace y_j \mathrm{d} y_j \ldots \int \exp\left\lbrace-\frac{y_{D}^2}{2\lambda_{D}}\right\rbrace \mathrm{d} y_D\\& = \sum_{i=1}^{D}\mathbf{u}_{i}\mathbf{u}_{1}^{\mathrm{T}}\underbrace{\int \exp\left\lbrace-\frac{y_{1}^2}{2\lambda_{1}}\right\rbrace y_1 \mathrm{d} y_1}_0 \ldots \int \exp\left\lbrace-\frac{y_{i}^2}{2\lambda_{i}}\right\rbrace y_i \mathrm{d} y_i \ldots \underbrace{\int \exp\left\lbrace-\frac{y_{D}^2}{2\lambda_{D}}\right\rbrace \mathrm{d} y_D}_{\left(2\pi\right)^{\frac{1}{2}}\lambda_D^{\frac{1}{2}}} \\& + \sum_{i=1}^{D}\mathbf{u}_{i}\mathbf{u}_{2}^{\mathrm{T}}\int \exp\left\lbrace-\frac{y_{1}^2}{2\lambda_{1}}\right\rbrace \mathrm{d} y_1 \int \exp\left\lbrace-\frac{y_{2}^2}{2\lambda_{2}}\right\rbrace y_2 \mathrm{d} y_2\ldots \int \exp\left\lbrace-\frac{y_{i}^2}{2\lambda_{i}}\right\rbrace y_i \mathrm{d} y_i \ldots \int \exp\left\lbrace-\frac{y_{D}^2}{2\lambda_{D}}\right\rbrace \mathrm{d} y_D \\& + \ldots \\& + \sum_{i=1}^{D}\mathbf{u}_{i}\mathbf{u}_{i}^{\mathrm{T}}\int \exp\left\lbrace-\frac{y_{1}^2}{2\lambda_{1}}\right\rbrace \mathrm{d} y_1 \ldots \int \exp\left\lbrace-\frac{y_{i}^2}{2\lambda_{i}}\right\rbrace y_i^2 \mathrm{d} y_i \ldots \int \exp\left\lbrace-\frac{y_{D}^2}{2\lambda_{D}}\right\rbrace \mathrm{d} y_D \\& + \ldots \\& + \sum_{i=1}^{D}\mathbf{u}_{i}\mathbf{u}_{D}^{\mathrm{T}}\int \exp\left\lbrace-\frac{y_{1}^2}{2\lambda_{1}}\right\rbrace \mathrm{d} y_1 \ldots \int \exp\left\lbrace-\frac{y_{i}^2}{2\lambda_{i}}\right\rbrace y_i \mathrm{d} y_i \ldots \int \exp\left\lbrace-\frac{y_{D}^2}{2\lambda_{D}}\right\rbrace y_D \mathrm{d} y_D \\&= 0+0+\ldots+ \sum_{i=1}^{D}\mathbf{u}_{i}\mathbf{u}_{i}^{\mathrm{T}}\int \exp\left\lbrace-\frac{y_{i}^2}{2\lambda_{i}}\right\rbrace y_i^2 \mathrm{d} y_i\left(\left(2\pi\right)^{\frac{1}{2}}\right)^{D-1}\prod_{s=1,s\neq i}^D\lambda_s^{\frac{1}{2}}+\ldots+0 \\&= \left(\left(2\pi\right)^{\frac{1}{2}}\right)^{D-1}\prod_{s=1,s\neq i}^D\lambda_s^{\frac{1}{2}}\sum_{i=1}^{D}\mathbf{u}_{i}\mathbf{u}_{i}^{\mathrm{T}}\int \exp\left\lbrace-\frac{y_{i}^2}{2\lambda_{i}}\right\rbrace y_i^2 \mathrm{d} y_i.\end{aligned}\]

Now, we need only to solve the integral term, and it is readily to obtain using the variance formula of Gaussian distribution. Since $\frac{1}{\left(2\pi\right)^{\frac{1}{2}}\lambda^{\frac{1}{2}}}\exp\left\lbrace-\frac{y^2}{2\lambda}\right\rbrace \sim \mathcal{N}(0, \lambda)$, we could conclude that $Var(y)=\mathbb{E}[y^2]-\mathbb{E}^2[y]=\lambda$. Since $\mathbb{E}[y] = 0 = \mathrm{mean}$, $\mathbb{E}[y^2]=\lambda=\int \frac{1}{\left(2\pi\right)^{\frac{1}{2}}\lambda^{\frac{1}{2}}}\exp\left\lbrace-\frac{y^2}{2\lambda}\right\rbrace y^2 \mathrm{d} y$, so $\int \exp\left\lbrace-\frac{y^2}{2\lambda}\right\rbrace y^2 \mathrm{d} y = {\left(2\pi\right)^{\frac{1}{2}}\lambda^{\frac{1}{2}}} \lambda$. Insert this into the fianl formula:

\[\begin{aligned} &\hspace{1.4em} \left(\left(2\pi\right)^{\frac{1}{2}}\right)^{D-1}\prod_{s=1,s\neq i}^D\lambda_s^{\frac{1}{2}}\sum_{i=1}^{D}\mathbf{u}_{i}\mathbf{u}_{i}^{\mathrm{T}}\int \exp\left\lbrace-\frac{y_{i}^2}{2\lambda_{i}}\right\rbrace y_i^2 \mathrm{d} y_i \\&= \left(\left(2\pi\right)^{\frac{1}{2}}\right)^{D-1}\prod_{s=1,s\neq i}^D\lambda_s^{\frac{1}{2}}\sum_{i=1}^{D}\mathbf{u}_{i}\mathbf{u}_{i}^{\mathrm{T}}{\left(2\pi\right)^{\frac{1}{2}}\lambda_i^{\frac{1}{2}}} \lambda_i \\&= \left(\left(2\pi\right)^{\frac{1}{2}}\right)^{D}\prod_{s=1}^D\lambda_s^{\frac{1}{2}}\sum_{i=1}^{D}\mathbf{u}_{i}\mathbf{u}_{i}^{\mathrm{T}} \lambda_i.\end{aligned}\]

According to (2.55), $|\mathbf{\Sigma}|^{1 / 2}=\prod_{s=1}^{D} \lambda_{s}^{1 / 2}$, we could eventually derive out that

\[\begin{aligned} &\hspace{1.4em}\frac{1}{(2 \pi)^{D / 2}} \frac{1}{|\mathbf{\Sigma}|^{1 / 2}} \int \exp \left\lbrace-\frac{1}{2} \mathbf{z}^{\mathrm{T}} \mathbf{\Sigma}^{-1} \mathbf{z}\right\rbrace \mathbf{z} \mathbf{z}^{\mathrm{T}} \mathrm{d} \mathbf{z} \\& = \frac{1}{(2 \pi)^{D / 2}} \frac{1}{|\mathbf{\Sigma}|^{1 / 2}}\left(\left(2\pi\right)^{\frac{1}{2}}\right)^{D}\prod_{s=1}^D\lambda_s^{\frac{1}{2}}\sum_{i=1}^{D}\mathbf{u}_{i}\mathbf{u}_{i}^{\mathrm{T}} \lambda_i \\&= \frac{1}{(2 \pi)^{D / 2}} \frac{1}{|\mathbf{\Sigma}|^{1 / 2}}\left(\left(2\pi\right)^{\frac{1}{2}}\right)^{D}|\mathbf{\Sigma}|^{1 / 2}\sum_{i=1}^{D}\mathbf{u}_{i}\mathbf{u}_{i}^{\mathrm{T}} \lambda_i \\&=\sum_{i=1}^{D}\mathbf{u}_{i}\mathbf{u}_{i}^{\mathrm{T}} \lambda_i=\Sigma,\end{aligned}\]

which completes the proof.