ClassificationTree.template will be removed
in a future release. Use templateTree instead.

Syntax

t = ClassificationTree.template
t = ClassificationTree.template(Name,Value)

Description

t = ClassificationTree.template returns
a learner template suitable to use in the fitensemble function.

t = ClassificationTree.template(Name,Value) creates
a template with additional options specified by one or more Name,Value pair
arguments. You can specify several name-value pair arguments in any
order as Name1,Value1,…,NameN,ValueN.
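For example, a minimal sketch of both calling forms, assuming the ionosphere sample data set (predictors X, labels Y) that ships with Statistics Toolbox; the MinLeaf value is illustrative only:

load ionosphere
t = ClassificationTree.template;                 % default tree learner
t = ClassificationTree.template('MinLeaf',5);    % template with one Name,Value pair
ens = fitensemble(X,Y,'AdaBoostM1',100,t);       % use the template in an ensemble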

Input Arguments

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments.
Name is the argument
name and Value is the corresponding
value. Name must appear
inside single quotes (' ').
You can specify several name and value pair
arguments in any order as Name1,Value1,...,NameN,ValueN.

'AlgorithmForCategorical'

Algorithm to find the best split on a categorical predictor
for data with K = 3 or more classes. The available
algorithms are:

'Exact'

For a categorical predictor with C categories,
consider all 2^(C-1) - 1 combinations.

'PullLeft'

Start with all C categories on the right
branch. Consider moving each category to the left branch as it achieves
the minimum impurity for the K classes among the
remaining categories. Out of this sequence, choose the split that
has the lowest impurity.

'PCA'

Compute a score for each category using the inner product between
the first principal component of a weighted covariance matrix (of
the centered class probability matrix) and the vector of class probabilities
for that category. Sort the scores in ascending order, and consider
all C - 1 splits.

'OVAbyClass'

Start with all C categories on the right
branch. For each class, order the categories based on their probability
for that class. For the first class, consider moving each category
to the left branch in order, recording the impurity criterion at each
move. Repeat for the remaining classes. Out of this sequence, choose
the split that has the minimum impurity.

Default: ClassificationTree selects the optimal subset
of algorithms for each split using the known number of classes and
levels of a categorical predictor. For two classes, ClassificationTree always
performs the exact search.

'MaxCat'

ClassificationTree splits a categorical predictor
using the exact search algorithm if the predictor has at most MaxCat levels
in the split node. Otherwise, ClassificationTree finds
the best categorical split using one of the inexact algorithms.

Specify MaxCat as a numeric nonnegative scalar
value. Passing a small value can lead to a loss of accuracy, and passing
a large value can lead to long computation time and memory overload.

Default: 10
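For example, a sketch combining the two categorical-split options above (the values shown are illustrative, not recommendations):

% Use the 'PullLeft' heuristic for categorical predictors, and allow the
% exact search only when a predictor has at most 20 levels in the split node.
t = ClassificationTree.template('AlgorithmForCategorical','PullLeft','MaxCat',20);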

'MergeLeaves'

String that specifies whether to merge leaves after the tree
is grown. Values are 'on' or 'off'.

When 'on', ClassificationTree merges
leaves that originate from the same parent node, and that give a sum
of risk values greater than or equal to the risk associated with the parent
node. When 'off', ClassificationTree does
not merge leaves.

Default: 'off'

'MinLeaf'

Each leaf has at least MinLeaf observations.
If you supply both MinParent and MinLeaf, ClassificationTree uses
the setting that gives larger leaves: MinParent=max(MinParent,2*MinLeaf).

Default: Half the number of training observations for boosting, 1 for
bagging

'MinParent'

Each branch node in the tree has at least MinParent observations.
If you supply both MinParent and MinLeaf, ClassificationTree uses
the setting that gives larger leaves: MinParent=max(MinParent,2*MinLeaf).

Default: Number of training observations for boosting, 2 for
bagging
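For example, a sketch showing how the two settings interact (the values are illustrative):

% With MinParent = 10 and MinLeaf = 8, the effective setting is
% MinParent = max(10,2*8) = 16, so every branch node has at least 16
% observations and every leaf has at least 8.
t = ClassificationTree.template('MinParent',10,'MinLeaf',8);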

'NVarToSample'

Number of predictors to select at random for each split. Can
be a positive integer or 'all', which means use
all available predictors.

Default: 'all' for boosting, square root of number
of predictors for bagging
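For example, a sketch of a bagged ensemble that samples two of the four Fisher iris predictors at random for each split (illustrative settings):

load fisheriris
t = ClassificationTree.template('NVarToSample',2);
ens = fitensemble(meas,species,'Bag',100,t,'Type','classification');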

'Prune'

When 'on', ClassificationTree grows
the classification tree and computes the optimal sequence of pruned
subtrees. When 'off', ClassificationTree grows
the tree without pruning.

Default: 'off'

'PruneCriterion'

String with the pruning criterion, either 'error' or 'impurity'.

Default: 'error'
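For example, a sketch that grows trees with the optimal pruning sequence computed, using impurity as the pruning criterion (illustrative settings):

t = ClassificationTree.template('Prune','on','PruneCriterion','impurity');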

'SplitCriterion'

Criterion for choosing a split. One of 'gdi' (Gini's
diversity index), 'twoing' for the twoing rule,
or 'deviance' for maximum deviance reduction (also
known as cross entropy).

Default: 'gdi'

'Surrogate'

String describing whether to find surrogate decision splits
at each branch node. Specify as 'on', 'off', 'all',
or a positive integer value.

When 'on', ClassificationTree finds
at most 10 surrogate splits at each branch node.

When set to a positive integer value, ClassificationTree finds
at most the specified number of surrogate splits at each branch node.

When set to 'all', ClassificationTree finds
all surrogate splits at each branch node. The 'all' setting
can use much time and memory.

Use surrogate splits to improve the accuracy of predictions
for data with missing values. The setting also enables you to compute
measures of predictive association between predictors.

Default: 'off'
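For example, a sketch that limits the search to at most five surrogate splits at each branch node (an illustrative value); compare with the 'surrogate','on' setting in the example under Output Arguments.

t = ClassificationTree.template('Surrogate',5);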

Output Arguments

t

Classification tree template suitable to use in the fitensemble function. In an ensemble, t specifies
how to grow the classification trees.

Create a classification tree template with surrogate splits,
and train an ensemble for the Fisher iris data with the template.

t = ClassificationTree.template('surrogate','on');
load fisheriris
ens = fitensemble(meas,species,'AdaBoostM2',100,t);
