Covariance matrix
The covariance matrix generalizes the concept of variance from one to n dimensions, or in other words, from scalar-valued random variables to vector-valued random variables (tuples of scalar random variables). If X is a scalar-valued random variable with expected value μ, then its variance is

\[ \operatorname{var}(X) = \operatorname{E}\left[(X - \mu)^2\right]. \]
If X is an n-by-1 column vector-valued random variable whose expected value is an n-by-1 column vector μ, then its variance is the n-by-n nonnegative-definite matrix

\[ \operatorname{var}(X) = \operatorname{E}\left[(X - \mu)(X - \mu)^\top\right]. \]
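As a concrete sketch (assuming NumPy; the sample size, seed, and dimension are arbitrary choices for illustration), this matrix can be estimated from data by averaging the outer products of the centered sample vectors:

```python
import numpy as np

# Estimate the n-by-n covariance matrix E[(X - mu)(X - mu)^T]
# from sample vectors stored as the rows of a data matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))              # 1000 samples of a 3-dimensional variable

mu = X.mean(axis=0)                         # sample estimate of the mean vector
centered = X - mu
cov = centered.T @ centered / (len(X) - 1)  # unbiased sample covariance, 3-by-3

# np.cov expects one variable per row, hence the transpose.
assert np.allclose(cov, np.cov(X.T))
```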
Nomenclatures differ. Some statisticians, following the great probabilist William Feller, call this the variance of the random vector X, because it is the natural generalization to higher dimensions of the 1-dimensional variance. Others call it the covariance matrix, because it is the matrix of covariances between the scalar components of the vector X.
With scalar-valued random variables X we have the identity

\[ \operatorname{var}(aX) = a^2 \operatorname{var}(X) \]
if a is constant, i.e., not random. If X is an n-by-1 column vector-valued random variable and A is an m-by-n constant (i.e., non-random) matrix, then AX is an m-by-1 column vector-valued random variable, whose variance must therefore be an m-by-m matrix. It is

\[ \operatorname{var}(AX) = A \operatorname{var}(X) A^\top. \]
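This identity also holds exactly for sample covariance matrices, which makes it easy to check numerically; the following sketch assumes NumPy, with an arbitrary choice of A and sample size:

```python
import numpy as np

# Numerical check of var(AX) = A var(X) A^T, with sample covariances
# standing in for the exact variances. The identity holds exactly for
# sample covariances too, so both sides agree to floating-point precision.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))            # samples of an n-dimensional variable (n = 3)
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])          # constant m-by-n matrix (m = 2)

var_X = np.cov(X.T)                       # n-by-n
var_AX = np.cov((X @ A.T).T)              # m-by-m sample covariance of AX

assert np.allclose(var_AX, A @ var_X @ A.T)
```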
This covariance matrix, though simple, is a useful tool in many different areas. From it a transformation matrix can be derived that makes it possible to completely decorrelate the data or, from a different point of view, to find an optimal basis for representing the data in a compact way. This is called PCA (principal component analysis) in statistics and the KL-transform (Karhunen-Loève transform) in image processing.
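A minimal sketch of that decorrelation step, assuming NumPy and synthetic correlated data: projecting the centered data onto the eigenvectors of its covariance matrix yields components whose covariance matrix is diagonal.

```python
import numpy as np

# Decorrelating data with the eigenvectors of its covariance matrix,
# the core step of PCA / the Karhunen-Loève transform.
rng = np.random.default_rng(2)
# Synthetic correlated 2-D data: a fixed linear mix of independent components.
X = rng.normal(size=(5000, 2)) @ np.array([[2.0, 1.0],
                                           [0.0, 0.5]])

cov = np.cov(X.T)
eigvals, eigvecs = np.linalg.eigh(cov)    # eigh, since cov is symmetric

Y = (X - X.mean(axis=0)) @ eigvecs        # coordinates in the eigenvector basis
cov_Y = np.cov(Y.T)

# Off-diagonal entries vanish: the transformed components are uncorrelated,
# with the eigenvalues as their variances.
assert np.allclose(cov_Y, np.diag(eigvals))
```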