Covariance matrix
The covariance matrix generalizes the concept of variance from one to n dimensions, or in other words, from scalar-valued random variables to vector-valued random variables (tuples of scalar random variables). If X is a scalar-valued random variable with expected value μ, then its variance is

\[ \operatorname{var}(X) = \operatorname{E}\left[(X - \mu)^2\right]. \]
If X is an n-by-1 column vector-valued random variable whose expected value is an n-by-1 column vector μ, then its variance is the n-by-n nonnegative-definite matrix

\[ \operatorname{var}(X) = \operatorname{E}\left[(X - \mu)(X - \mu)^\top\right]. \]
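As a concrete sketch (assuming NumPy; the sample size, seed, and dimension are arbitrary choices for illustration), this matrix can be estimated from data by averaging the outer products of the centered sample vectors:

```python
import numpy as np

# Estimate the n-by-n covariance matrix E[(X - mu)(X - mu)^T]
# from sample vectors stored as the rows of a data matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))              # 1000 samples of a 3-dimensional variable

mu = X.mean(axis=0)                         # sample estimate of the mean vector
centered = X - mu
cov = centered.T @ centered / (len(X) - 1)  # unbiased sample covariance, 3-by-3

# np.cov expects one variable per row, hence the transpose.
assert np.allclose(cov, np.cov(X.T))
```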
Nomenclatures differ. Some statisticians, following the great probabilist William Feller, call this the variance of the random vector X, because it is the natural generalization to higher dimensions of the 1-dimensional variance. Others call it the covariance matrix, because it is the matrix of covariances between the scalar components of the vector X.
With scalar-valued random variables X we have the identity

\[ \operatorname{var}(aX) = a^2 \operatorname{var}(X) \]
if a is constant, i.e., not random. If X is an n-by-1 column vector-valued random variable and A is an m-by-n constant (i.e., non-random) matrix, then AX is an m-by-1 column vector-valued random variable, whose variance must therefore be an m-by-m matrix. It is

\[ \operatorname{var}(AX) = A \operatorname{var}(X) A^\top. \]
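This identity also holds exactly for sample covariance matrices, which makes it easy to check numerically; the following sketch assumes NumPy, with an arbitrary choice of A and sample size:

```python
import numpy as np

# Numerical check of var(AX) = A var(X) A^T, with sample covariances
# standing in for the exact variances. The identity holds exactly for
# sample covariances too, so both sides agree to floating-point precision.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))            # samples of an n-dimensional variable (n = 3)
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])          # constant m-by-n matrix (m = 2)

var_X = np.cov(X.T)                       # n-by-n
var_AX = np.cov((X @ A.T).T)              # m-by-m sample covariance of AX

assert np.allclose(var_AX, A @ var_X @ A.T)
```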
This covariance matrix, though simple, is a useful tool in many different areas. From it a transformation matrix can be derived that makes it possible to completely decorrelate the data or, from a different point of view, to find an optimal basis for representing the data in a compact way. This is called PCA (principal component analysis) in statistics and the KL-transform (Karhunen-Loève transform) in image processing.
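A minimal sketch of that decorrelation step, assuming NumPy and synthetic correlated data: projecting the centered data onto the eigenvectors of its covariance matrix yields components whose covariance matrix is diagonal.

```python
import numpy as np

# Decorrelating data with the eigenvectors of its covariance matrix,
# the core step of PCA / the Karhunen-Loève transform.
rng = np.random.default_rng(2)
# Synthetic correlated 2-D data: a fixed linear mix of independent components.
X = rng.normal(size=(5000, 2)) @ np.array([[2.0, 1.0],
                                           [0.0, 0.5]])

cov = np.cov(X.T)
eigvals, eigvecs = np.linalg.eigh(cov)    # eigh, since cov is symmetric

Y = (X - X.mean(axis=0)) @ eigvecs        # coordinates in the eigenvector basis
cov_Y = np.cov(Y.T)

# Off-diagonal entries vanish: the transformed components are uncorrelated,
# with the eigenvalues as their variances.
assert np.allclose(cov_Y, np.diag(eigvals))
```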