Multivariate statistics provide algorithms and functions to analyze multiple variables. Typical applications include:
- Transforming correlated data into a set of uncorrelated components using rotation and centering (principal component analysis)
- Exploring relationships between variables using visualization techniques, such as scatter plot matrices and classical multidimensional scaling
- Segmenting data with cluster analysis
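For two variables, the centering-and-rotation step behind principal component analysis can be written in closed form. The following is an illustrative pure-Python sketch, not the toolbox's implementation. The function name `pca_2d` is hypothetical, and the closed-form rotation angle only works for two variables. The general case uses an eigendecomposition or singular value decomposition of the covariance matrix.

```python
import math


def pca_2d(points):
    """Center 2-D points and rotate them onto their principal axes.

    Hypothetical helper for illustration. Returns the component scores
    and the variance along each new axis, largest first.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Centering: subtract the mean of each variable.
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Angle that diagonalizes a symmetric 2x2 matrix; (cos t, sin t)
    # is then the direction of largest variance.
    t = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(t), math.sin(t)
    # Rotation: project onto the first (c, s) and second (-s, c) axes.
    scores = [(c * x + s * y, -s * x + c * y) for x, y in centered]
    # Eigenvalues of the covariance matrix = variances along the axes.
    half_sum = (sxx + syy) / 2
    half_gap = math.hypot((sxx - syy) / 2, sxy)
    return scores, (half_sum + half_gap, half_sum - half_gap)
```

After `scores, variances = pca_2d(data)`, the two score columns have zero sample covariance, and `variances[0]` is at least `variances[1]`. Keeping only the first column is the dimensionality reduction.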
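Feature transformation techniques reduce dimensionality by transforming the original features into new ones. This is useful when the transformed features can be ordered (for example, by how much of the data's variation they capture) more easily than the original features. Statistics Toolbox offers three classes of feature transformation algorithms: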
- Principal component analysis for summarizing data in fewer dimensions
- Nonnegative matrix factorization when model terms must represent nonnegative quantities
- Factor analysis for building explanatory models of data correlation
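Nonnegative matrix factorization approximates a nonnegative matrix V by the product W H of two smaller nonnegative matrices. One well-known way to compute it is the multiplicative update rule of Lee and Seung for the squared-error objective. The pure-Python sketch below uses that rule, which is not necessarily the algorithm the toolbox uses. The names `nmf`, `matmul`, and `transpose` are hypothetical helpers. Each update multiplies the current factor by a ratio of nonnegative quantities, so factors that start positive stay nonnegative.

```python
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]


def nmf(v, k, iters=300, seed=0, eps=1e-12):
    """Factor nonnegative v (m x n) as w (m x k) times h (k x n).

    Lee and Seung multiplicative updates for the squared Frobenius
    error; eps guards against division by zero.
    """
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    # Strictly positive random starting factors.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # h <- h * (w^T v) / (w^T w h), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        # w <- w * (v h^T) / (w h h^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return w, h
```

For example, `w, h = nmf([[1, 2, 0], [0, 1, 3], [1, 3, 3]], k=2)` returns nonnegative factors whose product `matmul(w, h)` approximates the input. Because every entry stays nonnegative, each column of `w` can be read as an additive part of the data.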
Statistics Toolbox provides graphs and charts to explore multivariate data visually, including:
- Scatter plot matrices
- Parallel coordinate charts
- Andrews plots
- Glyph plots
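In an Andrews plot, each observation x = (x1, x2, x3, ...) is drawn as the curve f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ... over -pi <= t <= pi, so similar observations produce similar curves. The following is a minimal sketch of evaluating that curve. The function names are hypothetical, and a plotting tool may standardize the variables before computing the curves, which this sketch does not do.

```python
import math


def andrews_curve(x, t):
    """Evaluate the Andrews function of observation x at angle t."""
    total = x[0] / math.sqrt(2)
    for i, value in enumerate(x[1:], start=1):
        harmonic = (i + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        # x[1], x[3], ... use sine; x[2], x[4], ... use cosine.
        total += value * (math.sin(harmonic * t) if i % 2 == 1 else math.cos(harmonic * t))
    return total


def andrews_points(x, num=101):
    """Sample the curve at num evenly spaced angles from -pi to pi."""
    step = 2 * math.pi / (num - 1)
    return [(-math.pi + j * step, andrews_curve(x, -math.pi + j * step)) for j in range(num)]
```

Plotting the sampled points for every observation (one curve per row, one color per group) produces an Andrews plot.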
Group scatter plot matrix showing how model year impacts different variables.
Biplot showing the first three loadings from a principal component analysis.
Andrews plot showing how country of origin impacts the variables.
Statistics Toolbox offers multiple algorithms for cluster analysis, including:
- Hierarchical clustering, which builds a hierarchy of clusters by repeatedly merging (agglomerating) the closest clusters; the result is typically represented as a tree called a dendrogram.
- K-means clustering, which assigns data points to the cluster with the closest mean.
- Gaussian mixtures, which are formed by combining multivariate normal density components. Clusters are assigned by selecting the component that maximizes posterior probability.
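The k-means rule described above ("assign each point to the cluster with the closest mean") is usually computed with Lloyd's iteration. This alternates an assignment step with a step that moves each mean to the centroid of its points. The following is a minimal pure-Python sketch, not the toolbox's implementation. Here the caller supplies the initial centroids; practical implementations usually choose them automatically and may run several restarts. The name `kmeans` is hypothetical.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm for k-means (hypothetical helper).

    points: list of coordinate tuples; centroids: initial cluster means.
    Returns (labels, centroids).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the closest mean.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # means did not move, so assignments cannot change
        centroids = updated
    return labels, centroids
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], [(0, 0), (10, 10)])` separates the two groups of three points and places the means at their centroids.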
Two-component Gaussian mixture model fit to a mixture of bivariate Gaussians.
Output from applying a clustering algorithm to the same example.
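Once a Gaussian mixture has been fitted, assigning a point to a cluster means picking the component with the highest posterior probability, which is proportional to the component weight times its density at that point. The following sketch covers only this assignment step for bivariate components. Fitting the weights, means, and covariances (commonly done by expectation-maximization) is omitted, and the function names are hypothetical.

```python
import math


def bivariate_normal_pdf(x, mean, cov):
    """Density of a bivariate normal distribution at point x."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^-1 (x - mean), using the 2x2 inverse
    # cov^-1 = [[d, -b], [-c, a]] / det.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


def assign_to_component(x, weights, means, covs):
    """Return (index of the most probable component, posterior probabilities)."""
    # Bayes' rule: posterior_k is proportional to weight_k * density_k(x).
    joint = [w * bivariate_normal_pdf(x, m, s) for w, m, s in zip(weights, means, covs)]
    total = sum(joint)
    posterior = [p / total for p in joint]
    best = max(range(len(posterior)), key=lambda k: posterior[k])
    return best, posterior
```

With illustrative parameters such as equal weights, means (0, 0) and (5, 5), and identity covariances, a point near (0, 0) goes to component 0. A point exactly halfway between the means gets posterior 0.5 for each component.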
Dendrogram plot showing a model with four clusters.
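Agglomerative hierarchical clustering starts with every point in its own cluster and repeatedly merges the two closest clusters. The sequence of merges and their distances is what a dendrogram draws. The following naive pure-Python sketch measures cluster distance with single linkage (the distance between the closest members). This is only one of several possible linkage rules, and the sketch is not the toolbox's implementation. The function name is hypothetical, and the sketch stops once the requested number of clusters remains.

```python
import math
from itertools import combinations


def single_linkage(points, num_clusters):
    """Agglomerative clustering with single linkage, stopped at num_clusters.

    Returns the clusters (lists of point indices) and the merge history
    as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > num_clusters:
        # Find the pair of clusters whose closest members are nearest.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(math.dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        # Merge cluster b into cluster a (b > a, so deleting b keeps a's index).
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges
```

Stopping at four clusters corresponds to cutting the dendrogram where four branches remain. Running the loop down to one cluster records the full merge history.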