Mean excluding outliers
m = trimmean(X,percent)
m = trimmean(X,percent,flag)
m = trimmean(X,percent,flag,dim)
m = trimmean(X,percent) calculates the trimmed mean of the values in X. For a vector input, m is the mean of X, excluding the highest and lowest k data values, where k=n*(percent/100)/2 and where n is the number of values in X. For a matrix input, m is a row vector containing the trimmed mean of each column of X. For n-D arrays, trimmean operates along the first non-singleton dimension. percent is a scalar between 0 and 100.
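The definition above can be checked by hand on a small vector. The following is a minimal sketch that uses only base MATLAB functions and does not call trimmean itself. The data and the names mManual and mPlain are made up for illustration, and percent is chosen so that k is an integer and no rounding is needed.

```matlab
% Hypothetical data: nine ordinary values and one outlier.
x = [1 2 3 4 5 6 7 8 9 100];
percent = 20;

n  = numel(x);
k  = n*(percent/100)/2;        % values dropped from each end; here k = 1
xs = sort(x);                  % order the data so the extremes sit at the ends

mManual = mean(xs(k+1:n-k));   % mean of 2,...,9, which is 5.5
mPlain  = mean(x);             % ordinary mean, pulled up to 14.5 by the outlier
```

Under these assumptions, trimmean(x,20) should agree with mManual, because k is a whole number and the rounding flag does not come into play.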
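m = trimmean(X,percent,flag) controls how to trim when k is not an integer. flag is one of the following:
|'round'|Round k to the nearest integer (round to a smaller integer if k is a half integer). This is the default.|
|'floor'|Round k down to the next smaller integer.|
|'weight'|If k=i+f where i is the integer part and f is the fraction, compute a weighted mean with weight (1-f) for the (i+1)th and (n-i)th values, and full weight for the values between them.|
m = trimmean(X,percent,flag,dim) takes the trimmed mean along dimension dim of X.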
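The three flag options differ only when k is not an integer. The sketch below works through each rule by hand in base MATLAB for a case where k = 1.5. It illustrates the rules as described here and is not the function's actual implementation. The data, the variable names, and the use of ceil(k - 0.5) to send half integers to the smaller integer are all assumptions made for this example.

```matlab
xs = sort([1 2 3 4 5 6 7 8 20 100]);   % hypothetical data, sorted
n  = numel(xs);
percent = 30;
k  = n*(percent/100)/2;                 % k = 1.5, not an integer

% 'round': nearest integer, with half integers going to the smaller one.
kRound = ceil(k - 0.5);                 % 1
mRound = mean(xs(kRound+1:n-kRound));   % mean of xs(2:9) = 6.875

% 'floor': next smaller integer.
kFloor = floor(k);                      % 1
mFloor = mean(xs(kFloor+1:n-kFloor));   % also 6.875 for this k

% 'weight': split k into integer part ip and fraction fr.
ip = floor(k);
fr = k - ip;                            % ip = 1, fr = 0.5
w  = [1-fr, ones(1, n-2*ip-2), 1-fr];   % weights for xs(ip+1) through xs(n-ip)
mWeight = sum(w .* xs(ip+1:n-ip)) / sum(w);   % 44/7, about 6.2857
```

Here 'round' and 'floor' coincide because k is exactly a half integer. With 'weight', the two boundary values (2 and 20) count at half weight, so the result falls below the other two.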
Generate a 100-by-100 matrix of random numbers from the standard normal distribution. This represents 100 samples, each containing 100 data points.
rng('default'); % For reproducibility
x = normrnd(0,1,100,100);
Compute the sample mean and the 10% trimmed mean for each column of the data matrix.
m = mean(x); trim = trimmean(x,10);
Compute the efficiency of the 10% trimmed mean relative to the sample mean for the data.
sm = std(m); strim = std(trim); efficiency = (sm/strim).^2
efficiency = 0.9663
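Written as a formula, the quantity computed above is the ratio of the sample variances of the two estimators across the 100 columns. The symbols $s_m$ and $s_{\text{trim}}$ below stand for the variables sm and strim in the code and are introduced here only for illustration:

```latex
\text{efficiency}
  = \frac{s_m^{2}}{s_{\text{trim}}^{2}}
  = \left( \frac{s_m}{s_{\text{trim}}} \right)^{2}
```

A value below 1, such as the 0.9663 obtained here, means the trimmed means vary slightly more from sample to sample than the ordinary means do for this normally distributed data.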
Generate random data from the Student's t distribution with one degree of freedom, which is heavy-tailed and tends to produce outliers.
rng('default') % For reproducibility
x = trnd(1,40,1);
Visualize the distribution using a normal probability plot.
Although the distribution is symmetric around zero, the sample contains several outliers that pull the mean away from zero. The trimmed mean is closer to zero, so it is more representative of the bulk of the data.
smean = mean(x) % Named smean so that the variable does not shadow the mean function
tmean = trimmean(x,25)
smean = 2.7991
tmean = 0.8797
The trimmed mean is a robust estimate of the location of a sample. If there are outliers in the data, the trimmed mean is a more representative estimate of the center of the body of the data than the mean. However, if the data is all from the same probability distribution, then the trimmed mean is less efficient than the sample mean as an estimator of the location of the data.