Outlier Detection and Removal [hampel]

Detect and replace outliers with appropriate local values in a non-linear time series.
6.1K Downloads
Updated 9 Feb 2012

View License

HAMPEL(X,Y,DX,T,varargin) returns the Hampel filtered values of the
elements in Y. It was developed to detect outliers in a time series,
but it can also be used as an alternative to the standard median
filter.

References
Chapters 1.4.2, 3.2.2 and 4.3.4 in Mining Imperfect Data: Dealing with
Contamination and Incomplete Records by Ronald K. Pearson.

Acknowledgements
I would like to thank Ronald K. Pearson for the introduction to moving
window filters. Please visit his blog at:
http://exploringdatablog.blogspot.com/2012/01/moving-window-filters-and-pracma.html

X,Y are row or column vectors with an equal number of elements.
The elements in Y should be Gaussian distributed.

Input DX,T,varargin must not contain NaN values!

DX,T are optional scalar values.
DX is a scalar which defines the half width of the filter window.
It is required that DX > 0 and DX should be dimensionally equivalent to
the values in X.
T is a scalar which defines the threshold value used in the equation
|Y - Y0| > T*S0.

Standard Parameters for DX and T:
DX = 3*median(X(2:end)-X(1:end-1));
T = 3;

varargin covers addtional optional input. The optional input must be in
the form of 'PropertyName', PropertyValue.
Supported PropertyNames:
'standard': Use the standard Hampel filter.
'adaptive': Use an experimental adaptive Hampel filter. Explained under
Revision 1 details below.

Supported PropertyValues: Scalar value which defines the tolerance of
the adaptive filter. In the case of standard Hampel filter this value
is ignored.

Output YY,I,Y0,LB,UB,ADX are column vectors containing Hampel filtered
values of Y, a logical index of the replaced values, nominal data,
lower and upper bounds on the Hampel filter and the relative half size
of the local window, respectively.

NO is a scalar that specifies the Number of Outliers detected.

Examples
1. Hampel filter removal of outliers
X = 1:1000; % Pseudo Time
Y = 5000 + randn(1000, 1); % Pseudo Data
Outliers = randi(1000, 10, 1); % Index of Outliers
Y(Outliers) = Y(Outliers) + randi(1000, 10, 1); % Pseudo Outliers
[YY,I,Y0,LB,UB] = hampel(X,Y);

plot(X, Y, 'b.'); hold on; % Original Data
plot(X, YY, 'r'); % Hampel Filtered Data
plot(X, Y0, 'b--'); % Nominal Data
plot(X, LB, 'r--'); % Lower Bounds on Hampel Filter
plot(X, UB, 'r--'); % Upper Bounds on Hampel Filter
plot(X(I), Y(I), 'ks'); % Identified Outliers

2. Adaptive Hampel filter removal of outliers
DX = 1; % Window Half size
T = 3; % Threshold
Threshold = 0.1; % AdaptiveThreshold
X = 1:DX:1000; % Pseudo Time
Y = 5000 + randn(1000, 1); % Pseudo Data
Outliers = randi(1000, 10, 1); % Index of Outliers
Y(Outliers) = Y(Outliers) + randi(1000, 10, 1); % Pseudo Outliers
[YY,I,Y0,LB,UB] = hampel(X,Y,DX,T,'Adaptive',Threshold);

plot(X, Y, 'b.'); hold on; % Original Data
plot(X, YY, 'r'); % Hampel Filtered Data
plot(X, Y0, 'b--'); % Nominal Data
plot(X, LB, 'r--'); % Lower Bounds on Hampel Filter
plot(X, UB, 'r--'); % Upper Bounds on Hampel Filter
plot(X(I), Y(I), 'ks'); % Identified Outliers

3. Median Filter Based on Filter Window
DX = 3; % Filter Half Size
T = 0; % Threshold
X = 1:1000; % Pseudo Time
Y = 5000 + randn(1000, 1); % Pseudo Data
[YY,I,Y0] = hampel(X,Y,DX,T);

plot(X, Y, 'b.'); hold on; % Original Data
plot(X, Y0, 'r'); % Median Filtered Data

Cite As

Michael Lindholm Nielsen (2024). Outlier Detection and Removal [hampel] (https://www.mathworks.com/matlabcentral/fileexchange/34795-outlier-detection-and-removal-hampel), MATLAB Central File Exchange. Retrieved .

MATLAB Release Compatibility
Created with R2010a
Compatible with any release
Platform Compatibility
Windows macOS Linux
Acknowledgements

Inspired: Hampel filter in C++

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!
Version Published Release Notes
1.6.0.0

(1) Corrected potential error in internal median function.
(2) Removed internal "keyboard" command.
(3) Optimized internal Gauss filter.

1.5.0.0

(1) The elements in X and Y are now temporarily sorted for internal computations.
(2) Performance optimization.
(3) Added Example 3.

1.4.0.0

(1) If the number of elements (X,Y) are below 2 the output YY will be a copy of Y. No outliers will be detected. No error will be issued.

1.3.0.0

(1) Changed a calculation in the adaptive Hampel filter. See details in file.
(2) Checks if DX,T or varargin contains NaN values.
(3) Now capable of ignoring NaN values in X and Y.
(4) Added output Y0 - Nominal Data.

1.1.0.0

(1) Replaced output S with lower and upper bounds on the Hampel filter.
(2) Added option to use an experimental adaptive Hampel filter.
The Principle behind this filter is described in hampel.m.

1.0.0.0