Correlation and Covariance

Background Information

The cross-correlation sequence for two jointly wide-sense stationary random processes, x(n) and y(n), is

$R_{xy}(m) = E\{x(n+m)\, y^{*}(n)\},$

where the asterisk denotes the complex conjugate and the expectation is over the ensemble of realizations that constitute the random processes.

Note that cross-correlation is not commutative, but a Hermitian (conjugate) symmetry property holds such that:

$R_{x y} (m) = R_{y x}^{*} (- m) .$

The cross-covariance between x(n) and y(n) is:

$C_{xy}(m) = E\{(x(n+m) - \mu_{x})(y(n) - \mu_{y})^{*}\} = R_{xy}(m) - \mu_{x}\mu_{y}^{*}.$
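
As a brief check of the second equality, expand the product and use the stationarity assumptions $E\{x(n+m)\} = \mu_{x}$ and $E\{y^{*}(n)\} = \mu_{y}^{*}$:

$E\{(x(n+m) - \mu_{x})(y(n) - \mu_{y})^{*}\} = E\{x(n+m)\, y^{*}(n)\} - \mu_{y}^{*} E\{x(n+m)\} - \mu_{x} E\{y^{*}(n)\} + \mu_{x}\mu_{y}^{*}$

$= R_{xy}(m) - \mu_{x}\mu_{y}^{*} - \mu_{x}\mu_{y}^{*} + \mu_{x}\mu_{y}^{*} = R_{xy}(m) - \mu_{x}\mu_{y}^{*}.$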

For zero-mean wide-sense stationary random processes, the cross-correlation and cross-covariance sequences are identical.

In practice, you must estimate these sequences, because only a finite segment of the infinite-length random processes is accessible. Further, ensemble moments usually must be estimated from time averages, because only a single realization of the random processes is available. A common estimate based on N samples of x(n) and y(n) is the deterministic cross-correlation sequence (also called the time-ambiguity function)

$\hat{R}_{xy}(m) = \begin{cases} \sum_{n=0}^{N-m-1} x(n+m)\, y^{*}(n), & m \geq 0, \\ \hat{R}_{yx}^{*}(-m), & m < 0, \end{cases}$

where we assume for this discussion that x(n) and y(n) are indexed from 0 to N – 1, and ${\hat{R}}_{x y} (m)$ from –(N – 1) to N – 1.
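
The sum can also be evaluated directly, which may help make the indexing concrete. The following is a minimal sketch, not how xcorr is implemented: the test data, the fixed seed, and the names Rhat and errDirect are assumptions made for illustration. It shifts the zero-based indices of the formula to MATLAB's one-based indexing and compares the result with xcorr, which requires Signal Processing Toolbox.

```matlab
% Direct evaluation of the deterministic cross-correlation sum.
rng(0);                                   % fixed seed, assumed only for repeatability
N = 8;
x = randn(N,1) + 1j*randn(N,1);           % hypothetical complex test data
y = randn(N,1) + 1j*randn(N,1);

Rhat = zeros(2*N-1,1);                    % element N+m holds lag m, m = -(N-1) ... N-1
for m = 0:N-1
    n = 0:N-m-1;                          % zero-based sample index from the formula
    % Nonnegative lags: x(n+m) in the formula becomes x(n+m+1) in MATLAB
    Rhat(N+m) = sum(x(n+m+1) .* conj(y(n+1)));
    % Negative lags: Rhat_xy(-m) = conj(Rhat_yx(m))
    Rhat(N-m) = conj(sum(y(n+m+1) .* conj(x(n+1))));
end

% Largest deviation from the toolbox result, expected to be at round-off level
errDirect = max(abs(Rhat - xcorr(x,y)));
```

The loop costs on the order of N² operations, so it is useful only as a reference for small N.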

Using `xcorr` and `xcov` Functions

The functions xcorr and xcov estimate the cross-correlation and cross-covariance sequences of random processes. They also handle autocorrelation and autocovariance as special cases. The xcorr function evaluates the sum shown above with an efficient FFT-based algorithm, given inputs x(n) and y(n) stored in length-N vectors x and y. Its operation is equivalent to convolving x with a time-reversed, complex-conjugated copy of y.

For example:

x = [1 1 1 1 1]';
y = x;
xyc = xcorr(x,y)

Notice that the length of the resulting sequence is one less than twice the length of the input sequence. Thus, the Nth element is the correlation at lag 0. Also notice the triangular shape of the output, which is what results from convolving two rectangular pulses.
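
To locate lag 0 without counting elements, you can request the lag vector as a second output of xcorr. The sketch below repeats the rectangular-pulse example and also checks the convolution view of the operation. The names zeroLagIndex, viaConv, and errConv are assumptions introduced for illustration.

```matlab
% Revisit the rectangular-pulse example and request the lag vector as well.
x = [1 1 1 1 1]';
y = x;
[xyc, lags] = xcorr(x, y);           % lags runs from -(N-1) to N-1
N = numel(x);
zeroLagIndex = find(lags == 0);      % position of the lag-0 sample

% Same result via convolution with a time-reversed, conjugated copy of y
viaConv = conv(x, conj(flipud(y)));
errConv = max(abs(xyc - viaConv));   % expected to be at round-off level
```

Because xcorr works through the FFT, the two results generally agree to within floating-point round-off rather than bit for bit.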

The xcov function estimates autocovariance and cross-covariance sequences. This function has the same options and evaluates the same sum as xcorr, but first removes the means of x and y.
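
One way to see the relationship is to remove the sample means yourself and call xcorr. This is a minimal sketch with hypothetical test data. The offsets 3 and -1, the seed, and the name errCov are assumptions chosen only to give the signals nonzero means.

```matlab
% xcov compared with xcorr applied to mean-removed data.
rng(2);                               % fixed seed, assumed only for repeatability
x = 3 + randn(32,1);                  % hypothetical signal with mean near 3
y = -1 + randn(32,1);                 % hypothetical signal with mean near -1

c1 = xcov(x, y);                      % removes the means internally
c2 = xcorr(x - mean(x), y - mean(y)); % means removed by hand
errCov = max(abs(c1 - c2));           % expected to be at round-off level
```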

Bias and Normalization

An estimate of a quantity is biased if its expected value is not equal to the quantity it estimates. The expected value of the output of xcorr is

$E\{\hat{R}_{xy}(m)\} = (N - |m|)\, R_{xy}(m).$
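
For m ≥ 0, this follows from taking the expectation inside the finite sum, assuming the processes are jointly wide-sense stationary so that every term has the same expected value:

$E\{\hat{R}_{xy}(m)\} = \sum_{n=0}^{N-m-1} E\{x(n+m)\, y^{*}(n)\} = \sum_{n=0}^{N-m-1} R_{xy}(m) = (N - m)\, R_{xy}(m), \quad m \geq 0.$

For m < 0, the definition $\hat{R}_{xy}(m) = \hat{R}_{yx}^{*}(-m)$ together with the symmetry $R_{xy}(m) = R_{yx}^{*}(-m)$ gives the same factor with |m| in place of m.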

xcorr provides the unbiased estimate, dividing by N – |m| when you specify an 'unbiased' flag after the input sequences.

xcorr(x,y,'unbiased')

Although this estimate is unbiased, the end points (near –(N – 1) and N – 1) suffer from large variance because xcorr computes them using only a few data points. A possible trade-off is to simply divide by N using the 'biased' flag:

xcorr(x,y,'biased')

With this scheme, only the sample of the correlation at zero lag (the Nth output element) is unbiased. This estimate is often more desirable than the unbiased one because it avoids random large variations at the end points of the correlation sequence.
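
The two scalings differ from the unscaled output only by a per-lag factor, which the following sketch checks numerically. The test data, the seed, and the names raw, unb, bia, and w are assumptions made for illustration.

```matlab
% Relate the 'unbiased' and 'biased' outputs to the unscaled sum.
rng(3);                              % fixed seed, assumed only for repeatability
N = 16;
x = randn(N,1);                      % hypothetical test data
y = randn(N,1);

[raw, lags] = xcorr(x, y);           % default: no scaling
unb = xcorr(x, y, 'unbiased');       % divides each lag by N - |m|
bia = xcorr(x, y, 'biased');         % divides every lag by N

w = N - abs(lags(:));                % number of products summed at each lag
errUnb = max(abs(unb - raw ./ w));   % expected to be at round-off level
errBia = max(abs(bia - raw / N));    % expected to be at round-off level
```

At the extreme lags w equals 1, so the unbiased estimate there is a single product of two samples, which is why its variance is large.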

xcorr provides one other normalization scheme. The syntax

xcorr(x,y,'coeff')

divides the output by norm(x)*norm(y) so that, for autocorrelations, the sample at zero lag is 1.
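
A short sketch of this normalization, using hypothetical random data: the first call computes an autocorrelation and reads off the zero-lag sample, and the second compares a normalized cross-correlation with the unscaled output divided by norm(x)*norm(y). The names r0 and errCoeff are assumptions.

```matlab
% 'coeff' normalization for an autocorrelation and a cross-correlation.
rng(4);                               % fixed seed, assumed only for repeatability
x = randn(20,1);                      % hypothetical test data
y = randn(20,1);

[r, lags] = xcorr(x, 'coeff');        % autocorrelation of x
r0 = r(lags == 0);                    % zero-lag sample, expected to be 1

rxy = xcorr(x, y, 'coeff');
errCoeff = max(abs(rxy - xcorr(x, y) / (norm(x)*norm(y))));
```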

Multiple Channels

For a multichannel signal, xcorr and xcov estimate the auto- and cross-correlation and the auto- and cross-covariance sequences for all of the channels at once. If S is an M-by-N signal matrix representing N channels in its columns, xcorr(S) returns a (2M – 1)-by-N² matrix with the autocorrelations and cross-correlations of the channels of S in its N² columns. If S is a three-channel signal

S = [s1 s2 s3]

then the result of xcorr(S) is organized as

R = [Rs1s1 Rs1s2 Rs1s3 Rs2s1 Rs2s2 Rs2s3 Rs3s1 Rs3s2 Rs3s3]
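
Under the column ordering shown above, the sequence for channels i and j sits in column (i – 1)N + j. The sketch below picks one pair and compares that column with a two-vector call to xcorr. The data, the seed, the chosen pair, and the names col and errCol are assumptions for illustration.

```matlab
% Pick one channel pair out of the multichannel output.
rng(5);                               % fixed seed, assumed only for repeatability
M = 10;                               % samples per channel
N = 3;                                % number of channels
S = randn(M, N);                      % hypothetical three-channel signal

R = xcorr(S);                         % (2M-1)-by-N^2 result
i = 2;
j = 3;
col = (i-1)*N + j;                    % column assumed to hold R_{s_i s_j}
errCol = max(abs(R(:,col) - xcorr(S(:,i), S(:,j))));
```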

Two related functions, cov and corrcoef, are available in the standard MATLAB® environment. They estimate the covariance and the normalized covariance (correlation coefficients), respectively, between the different channels at lag 0 and arrange them in a square matrix.
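
The link to xcov can be sketched as follows for real-valued data. The lag-0 row of the unscaled xcov output, reshaped into a square matrix and divided by M – 1 (the default normalization of cov), should match cov(S), and scaling by the standard deviations should match corrcoef(S). The data, the seed, and the names C0, d, errCov0, and errCorr0 are assumptions for illustration.

```matlab
% Compare the lag-0 row of xcov with cov and corrcoef (real-valued data).
rng(6);                                 % fixed seed, assumed only for repeatability
M = 50;                                 % samples per channel
N = 3;                                  % number of channels
S = randn(M, N);                        % hypothetical three-channel signal

c = xcov(S);                            % unscaled sums; row M is lag 0
C0 = reshape(c(M,:), N, N) / (M - 1);   % same normalization as cov
errCov0 = max(max(abs(C0 - cov(S))));

d = sqrt(diag(C0));                     % per-channel standard deviations
errCorr0 = max(max(abs(C0 ./ (d*d') - corrcoef(S))));
```

For real data the lag-0 matrix is symmetric, so the order in which reshape fills the matrix does not matter here.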