
# Statistics Toolbox

## Multivariate Statistics

Statistics Toolbox provides multivariate algorithms and functions for analyzing several variables jointly. Typical applications include:

• Transforming correlated data into a set of uncorrelated components using rotation and centering (principal component analysis)
• Exploring relationships between variables using visualization techniques, such as scatter plot matrices and classical multidimensional scaling
• Segmenting data with cluster analysis

Fitting an Orthogonal Regression Using Principal Component Analysis (Example)
Implement Deming regression (total least squares).
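
The idea behind this example can be sketched in a few lines for the two-variable case. The snippet below is an illustrative Python sketch, not the toolbox's own code: the function name `orthogonal_regression` is hypothetical, and it assumes equal error variance in both variables, which is the case where Deming regression coincides with orthogonal regression. The fitted line passes through the centroid and follows the first principal component of the centered data.

```python
import math

def orthogonal_regression(xs, ys):
    """Fit y = intercept + slope * x by minimizing perpendicular distances.

    Hypothetical helper for illustration; assumes equal error variance
    in x and y and a fitted line that is not vertical.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Entries of the 2x2 scatter matrix of the centered data.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Angle of the leading eigenvector (first principal component)
    # of the matrix [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    slope = math.tan(theta)
    # The line passes through the centroid of the data.
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = orthogonal_regression([0, 1, 2, 3], [1, 3, 5, 7])
```

Unlike ordinary least squares, this fit treats both variables symmetrically, so swapping `xs` and `ys` describes the same line rather than a different regression.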

### Feature Transformation

Feature transformation techniques reduce dimensionality by mapping the original features to a new set of features that can be ranked by importance, so the least informative ones can be dropped. Statistics Toolbox offers three classes of feature transformation algorithms:

• Principal component analysis for summarizing data in fewer dimensions
• Nonnegative matrix factorization when model terms must represent nonnegative quantities
• Factor analysis for building explanatory models of data correlation

Partial Least Squares Regression and Principal Component Regression (Example)
Model a response variable in the presence of highly correlated predictors.

### Multivariate Visualization

Statistics Toolbox provides graphs and charts to explore multivariate data visually, including:

• Scatter plot matrices
• Dendrograms
• Biplots
• Parallel coordinate charts
• Andrews plots
• Glyph plots
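
As one concrete case, an Andrews plot draws each observation as a curve whose coefficients are the observation's variable values. The Python sketch below uses the textbook definition of the curve; the function name `andrews_curve` is hypothetical, and the toolbox's own plotting function may scale the parameter `t` differently.

```python
import math

def andrews_curve(x, t):
    """Evaluate the Andrews curve of observation x at parameter t.

    Textbook form:
    f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + x5*cos(2t) + ...
    """
    value = x[0] / math.sqrt(2.0)
    for i, coeff in enumerate(x[1:], start=1):
        freq = (i + 1) // 2  # frequencies run 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            value += coeff * math.sin(freq * t)
        else:
            value += coeff * math.cos(freq * t)
    return value

# Sample one observation's curve on a grid over [-pi, pi].
grid = [-math.pi + k * (2.0 * math.pi / 100) for k in range(101)]
curve = [andrews_curve([1.0, 2.0, 3.0], t) for t in grid]
```

Observations with similar variable values produce similar curves, which is why groups tend to show up as bundles of curves in the plot.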

Group scatter plot matrix showing how model year impacts different variables.

Andrews plot showing how country of origin impacts the variables.

### Cluster Analysis

Statistics Toolbox offers multiple algorithms for cluster analysis, including:

• Hierarchical clustering, which builds a hierarchy of clusters by successively merging the closest groups (agglomerative clustering), typically represented as a tree.
• K-means clustering, which assigns data points to the cluster with the closest mean.
• Gaussian mixtures, which are formed by combining multivariate normal density components. Clusters are assigned by selecting the component that maximizes posterior probability.
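
The k-means rule in the list above (assign each point to the cluster with the closest mean, then recompute the means) can be sketched as follows. This is a minimal illustration in plain Python of Lloyd's algorithm, not the toolbox's implementation; the function names, the random initialization, and the fixed seed are assumptions made for the example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=100, seed=0):
    """Cluster a list of tuples with Lloyd's algorithm (illustrative sketch)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the closest mean.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each mean becomes the average of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Keep the old mean if a cluster ends up empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # the means stopped moving, so the labels are stable
        centroids = new_centroids
    return centroids, labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

The result depends on the starting means, so practical use often repeats the procedure from several initializations and keeps the solution with the smallest total within-cluster distance.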

Two-component Gaussian mixture model fit to a mixture of bivariate Gaussians.
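
Once a mixture has been fit, cluster assignment by maximum posterior probability can be sketched as below. For brevity the Python example uses univariate components rather than the bivariate ones in the figure, and the mixture parameters are made-up values, not results from the toolbox; the function names are hypothetical.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def assign_component(x, weights, means, stds):
    """Return the index of the most probable component and all posteriors."""
    # Joint probability of each component and x: mixing weight times density.
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    # Normalize to posterior probabilities (assumes x is not so far from
    # every component that all densities underflow to zero).
    total = sum(joint)
    posteriors = [j / total for j in joint]
    best = max(range(len(posteriors)), key=posteriors.__getitem__)
    return best, posteriors

# Hypothetical two-component mixture used only for illustration.
weights, means, stds = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
cluster, posteriors = assign_component(1.0, weights, means, stds)
```

Unlike k-means, the posteriors also indicate how confident each assignment is: a point midway between two equally weighted components gets a probability near 0.5 for each.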

Output from applying a clustering algorithm to the same example.

Dendrogram plot showing a model with four clusters.