Note: fit will be removed in a future release. Use fitgmdist instead. 
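The following is a hedged migration sketch, not an official recipe. It assumes a release in which fitgmdist is available, and the name-value pairs 'CovarianceType' and 'RegularizationValue' are taken from the fitgmdist interface rather than from this page. The data in X is made up for illustration.

```matlab
% Illustrative data (assumption): two well-separated bivariate clusters.
rng(1);                                  % fix the seed so the run is repeatable
X = [randn(500,2); randn(500,2) + 6];

% Legacy call documented on this page:
%   obj = gmdistribution.fit(X,2,'CovType','diagonal','Regularize',1e-5);

% Replacement call. Assumed name mapping in fitgmdist:
%   'CovType'    -> 'CovarianceType'
%   'SharedCov'  -> 'SharedCovariance'
%   'Regularize' -> 'RegularizationValue'
obj = fitgmdist(X,2,'CovarianceType','diagonal','RegularizationValue',1e-5);

disp(obj.mu)                             % estimated component means, one row each
```

Both calls return a gmdistribution object, so code that reads obj.mu, obj.Sigma, or obj.AIC should not need to change.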
obj = gmdistribution.fit(X,k)
obj = gmdistribution.fit(...,param1,val1,param2,val2,...)
obj = gmdistribution.fit(X,k) uses an Expectation Maximization (EM) algorithm to construct an object obj of the gmdistribution class containing maximum likelihood estimates of the parameters in a Gaussian mixture model with k components for data in the n-by-d matrix X, where n is the number of observations and d is the dimension of the data.
gmdistribution treats NaN values as missing data. Rows of X with NaN values are excluded from the fit.
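To see in advance which observations will be dropped, the rows containing NaN can be identified by hand. The sketch below only mirrors the documented behavior; it is not necessarily how gmdistribution implements the exclusion, and the data and the variable names hasNaN and Xclean are illustrative.

```matlab
% Illustrative data (assumption) with two missing entries.
rng(2);
X = [randn(200,2); randn(200,2) + 5];
X(3,1)  = NaN;
X(10,2) = NaN;

hasNaN = any(isnan(X),2);      % true for each row with at least one NaN
Xclean = X(~hasNaN,:);         % rows that remain available for the fit

fprintf('%d of %d rows contain NaN and would be excluded.\n', ...
    nnz(hasNaN), size(X,1));
```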
obj = gmdistribution.fit(...,param1,val1,param2,val2,...) provides control over the iterative EM algorithm. Parameters and values are listed below.
Parameter  Value 

'Start'  Method used to choose initial component parameters. One of the following:

'Replicates'  A positive integer giving the number of times to repeat the EM algorithm, each time with a new set of initial parameters. The solution with the largest likelihood is returned. A value larger than 1 requires the 'randSample' start method. The default is 1. 
'CovType'  'diagonal' if the covariance matrices are restricted to be diagonal; 'full' otherwise. The default is 'full'. 
'SharedCov'  Logical true if all the covariance matrices are restricted to be the same (pooled estimate); logical false otherwise. 
'Regularize'  A nonnegative regularization number added to the diagonal of covariance matrices to make them positive-definite. The default is 0. 
'Options'  Options structure for the iterative EM algorithm, as created by statset. gmdistribution.fit uses the parameters 'Display' with a default value of 'off', 'MaxIter' with a default value of 100, and 'TolFun' with a default value of 1e-6. 
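As a sketch of how several of these parameters combine in one call, the example below fits a two-component model with diagonal covariances, five replicates, and a small regularization value. The data and the specific values chosen (5 replicates, 200 iterations, 1e-6) are illustrative assumptions, not recommendations.

```matlab
% Illustrative data (assumption): two bivariate clusters of different spread.
rng(3);
X = [randn(300,2); 0.5*randn(300,2) + 4];

% Options for the EM iterations, created with statset.
options = statset('Display','off','MaxIter',200,'TolFun',1e-6);

obj = gmdistribution.fit(X,2, ...
    'Replicates',5, ...        % five EM runs; uses the default 'randSample' start
    'CovType','diagonal', ...  % restrict each covariance matrix to be diagonal
    'SharedCov',false, ...     % each component keeps its own covariance
    'Regularize',1e-6, ...     % small value added to covariance diagonals
    'Options',options);

disp(obj.mu)                   % estimated component means, one row each
```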
In some cases, gmdistribution may converge to a solution where one or more of the components has an ill-conditioned or singular covariance matrix.
The following issues may result in an ill-conditioned covariance matrix:
The number of dimensions of your data is relatively high and there are not enough observations.
Some of the features (variables) of your data are highly correlated.
Some or all the features are discrete.
You tried to fit the data to too many components.
In general, you can avoid getting ill-conditioned covariance matrices by using one of the following precautions:
Preprocess your data to remove correlated features.
Set 'SharedCov' to true to use an equal covariance matrix for every component.
Set 'CovType' to 'diagonal'.
Use 'Regularize' to add a very small positive number to the diagonal of every covariance matrix.
Try another set of initial values.
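One way to check whether a precaution helped is to inspect the condition number of each fitted covariance matrix. The sketch below builds deliberately correlated features, fits with 'Regularize', and prints cond for each component. The data, the value 1e-4, and the use of cond as a diagnostic are assumptions made for illustration.

```matlab
% Illustrative data (assumption): the second feature is almost an exact
% linear function of the first, so the covariance is nearly singular.
rng(4);
x1 = [randn(200,1); randn(200,1) + 5];
X  = [x1, 2*x1 + 1e-8*randn(400,1)];

% Add a small positive number to the diagonal of every covariance matrix.
obj = gmdistribution.fit(X,2,'Regularize',1e-4);

% Report how well conditioned each component covariance is.
for j = 1:size(obj.Sigma,3)
    fprintf('Component %d: cond(Sigma) = %.3g\n', j, cond(obj.Sigma(:,:,j)));
end
```

A very large condition number suggests trying a stronger precaution, such as 'CovType','diagonal' or 'SharedCov',true.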
In other cases, gmdistribution may pass through an intermediate step where one or more of the components has an ill-conditioned covariance matrix. Trying another set of initial values may avoid this issue without altering your data or model.
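Retrying with fresh initial values can be sketched as a loop that calls the fit several times and keeps the best result that succeeds. The data, the five attempts, and the variable names best and candidate are illustrative assumptions. Because every attempt uses the same number of components, comparing AIC values is equivalent to comparing likelihoods.

```matlab
% Illustrative data (assumption): two clusters, fitted with three components
% so that some random starts are more likely to run into trouble.
rng(5);
X = [randn(300,2); randn(300,2) + 4];

best = [];
for attempt = 1:5
    try
        % Each call draws a new random 'randSample' start.
        candidate = gmdistribution.fit(X,3);
    catch err
        fprintf('Attempt %d failed: %s\n', attempt, err.message);
        continue
    end
    % Same model size in every attempt, so lower AIC means higher likelihood.
    if isempty(best) || candidate.AIC < best.AIC
        best = candidate;
    end
end
```

When no attempt is expected to fail, the 'Replicates' parameter gives a similar effect in a single call.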
Generate data from a mixture of two bivariate Gaussian distributions using the mvnrnd function:
    MU1 = [1 2];
    SIGMA1 = [2 0; 0 .5];
    MU2 = [-3 -5];
    SIGMA2 = [1 0; 0 1];
    X = [mvnrnd(MU1,SIGMA1,1000); mvnrnd(MU2,SIGMA2,1000)];

    scatter(X(:,1),X(:,2),10,'.')
    hold on
Next, fit a two-component Gaussian mixture model:
    options = statset('Display','final');
    obj = gmdistribution.fit(X,2,'Options',options);

    10 iterations, log-likelihood = -7046.78

    h = ezcontour(@(x,y)pdf(obj,[x y]),[-8 6],[-8 6]);
Among the properties of the fit are the parameter estimates:
    ComponentMeans = obj.mu

    ComponentMeans =
        0.9391    2.0322
       -2.9823   -4.9737

    ComponentCovariances = obj.Sigma

    ComponentCovariances(:,:,1) =
        1.7786    0.0528
        0.0528    0.5312
    ComponentCovariances(:,:,2) =
        1.0491    0.0150
        0.0150    0.9816

    MixtureProportions = obj.PComponents

    MixtureProportions =
        0.5000    0.5000
Among models with one to four components, the Akaike information criterion (AIC) is minimized by the two-component model:
    AIC = zeros(1,4);
    obj = cell(1,4);
    for k = 1:4
        obj{k} = gmdistribution.fit(X,k);
        AIC(k) = obj{k}.AIC;
    end

    [minAIC,numComponents] = min(AIC);
    numComponents

    numComponents =
         2

    model = obj{2}

    model =
    Gaussian mixture distribution with 2 components in 2 dimensions
    Component 1:
    Mixing proportion: 0.500000
    Mean:     0.9391    2.0322

    Component 2:
    Mixing proportion: 0.500000
    Mean:    -2.9823   -4.9737
Both the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are negative log-likelihoods for the data with penalty terms for the number of estimated parameters. They are often used to determine an appropriate number of components for a model when the number of components is unspecified.
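The penalty structure can be made concrete by recomputing both criteria from the fitted density. The sketch below assumes the usual definitions AIC = 2*NlogL + 2*m and BIC = 2*NlogL + m*log(n), and it assumes that m counts k-1 mixing proportions, k*d mean entries, and k*d*(d+1)/2 covariance entries for a model with full, unshared covariances. The data and variable names are illustrative.

```matlab
% Illustrative data (assumption): two bivariate clusters.
rng(6);
X = [randn(500,2); randn(500,2) + 5];
k = 2;
obj = gmdistribution.fit(X,k);

[n,d] = size(X);

% Assumed count of free parameters for full, unshared covariances.
m = (k-1) + k*d + k*d*(d+1)/2;

% Negative log-likelihood of the data under the fitted mixture.
nll = -sum(log(pdf(obj,X)));

aicManual = 2*nll + 2*m;         % penalty of 2 per estimated parameter
bicManual = 2*nll + m*log(n);    % penalty of log(n) per estimated parameter

fprintf('AIC: manual %.4f, object %.4f\n', aicManual, obj.AIC);
fprintf('BIC: manual %.4f, object %.4f\n', bicManual, obj.BIC);
```

Because log(n) exceeds 2 once n is larger than about 7, BIC penalizes additional components more heavily than AIC for all but the smallest samples.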