treeClust.control {treeClust} | R Documentation |
Parameters describing the output from a treeClust fit
Description
This function produces a list that is used as input to treeClust
to determine which items are preserved in the output.
Usage
treeClust.control(return.trees = FALSE, return.mat = TRUE,
return.dists = FALSE, return.newdata = FALSE, cluster.only = FALSE,
serule = 0, DevRatThreshold = 1, parallelnodes = 1, ...)
Arguments
return.trees |
If TRUE, all the trees that go into the object are returned. This can make the treeClust object very large. Default FALSE. |
return.mat |
If TRUE, return a matrix describing leaf membership. Default TRUE. |
return.dists |
If TRUE, return an object of class 'dissimilarity' giving all pairwise distances between observations. This can be very large for large datasets. Default FALSE. |
cluster.only |
If TRUE, return only the clustering vector, which names the cluster into which each observation is placed. Default FALSE. |
return.newdata |
If TRUE, return a numeric matrix describing leaf membership and/or inter-point distance (see "Details"). Default FALSE. |
serule |
Describes how to prune the rpart trees. By default, each tree is pruned to the minimum error size. With serule > 0, each tree is pruned to the smallest size for which the cross-validated error is less than (min error) + (serule * sds). |
DevRatThreshold |
Trees whose deviance ratio is greater than this number are presumed to have arisen from redundant variables. The predictor at the tree's root is dropped, a new tree is built, and the new deviance ratio is computed. This process is repeated until the resulting tree has a deviance ratio less than or equal to the threshold. Default: 1 (do not drop any such trees). |
parallelnodes |
Describes whether to use parallel processing by creating a "computing cluster" containing "parallelnodes" nodes. If that number is 1, no cluster is created. Here "cluster" refers to a set of nodes operating in parallel, not to the clustering of the data. Default 1. |
... |
Other arguments, passed on to the output. |
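The pruning rule described under "serule" can be sketched with rpart directly. This is a minimal illustration, not the code treeClust itself runs: the helper name prune_by_serule is hypothetical, it assumes "error" means the cross-validated error (xerror) and "sds" means its standard error (xstd) in the rpart cptable, and it uses "less than or equal to" so that serule = 0 selects the minimum-error tree.

```r
library(rpart)

# Hypothetical helper (not part of treeClust): prune an rpart fit to the
# smallest tree whose cross-validated error is within serule standard
# errors of the minimum.
prune_by_serule <- function(fit, serule = 0) {
  cp <- fit$cptable
  # A root-only tree has nothing to prune and may lack the xerror column.
  if (nrow(cp) < 2 || !("xerror" %in% colnames(cp))) return(fit)
  best <- which.min(cp[, "xerror"])
  cutoff <- cp[best, "xerror"] + serule * cp[best, "xstd"]
  # cptable rows run from the smallest tree to the largest, so the first
  # row that meets the cutoff is the smallest qualifying tree.
  keep <- which(cp[, "xerror"] <= cutoff)[1]
  prune(fit, cp = cp[keep, "CP"])
}

set.seed(1)  # cross-validation folds are random
fit <- rpart(Sepal.Length ~ ., data = iris,
             control = rpart.control(cp = 0.001))

pruned0 <- prune_by_serule(fit, serule = 0)  # minimum-error tree
pruned1 <- prune_by_serule(fit, serule = 1)  # "one standard error" tree
```

A larger serule can only raise the cutoff, so it never yields a bigger tree than a smaller serule applied to the same fit.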
Details
The "newdata" item is a numeric matrix that encodes the inter-point distances and whose form depends on the "d.num" argument to treeClust(). When d.num = 1, each tree contributes a set of 0-1 dummy variables that serve as leaf membership indicators; with d.num = 2, each tree's indicators are multiplied by that tree's "strength." With d.num = 3, a tree with k leaves contributes k-choose-2 columns, with the distances between distinct rows matching the d3 distances. Likewise, with d.num = 4, a tree with k leaves contributes k-choose-2 columns that have been weighted by tree strength.
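The d.num = 1 and d.num = 2 layouts can be illustrated in base R. The sketch below does not call treeClust: the leaf labels and strengths are made up, the helper leaf_indicators is hypothetical, and the exact scaling that treeClust applies to its own columns is not assumed.

```r
# Toy leaf-membership matrix: 4 observations (rows), 2 trees (columns).
# Entries are leaf labels; the values are invented for illustration.
leaves <- cbind(tree1 = c(2, 2, 3, 3),
                tree2 = c(4, 5, 5, 4))

# Hypothetical helper: one 0-1 indicator column per (tree, leaf) pair.
leaf_indicators <- function(leaves) {
  blocks <- lapply(seq_len(ncol(leaves)), function(j) {
    f <- factor(leaves[, j])
    m <- outer(as.integer(f), seq_along(levels(f)), "==") * 1
    colnames(m) <- paste0(colnames(leaves)[j], ".leaf", levels(f))
    m
  })
  do.call(cbind, blocks)
}

# d.num = 1 style: plain leaf indicators.
ind <- leaf_indicators(leaves)

# d.num = 2 style: each tree's block of columns is multiplied by that
# tree's strength (made-up values here).
strength <- c(tree1 = 0.8, tree2 = 0.3)
n_leaves <- apply(leaves, 2, function(x) length(unique(x)))
weighted <- sweep(ind, 2, rep(strength, times = n_leaves), "*")

# Squared Euclidean distances between rows of the indicator matrix.
sq_dist <- as.matrix(dist(ind))^2
```

In the unweighted matrix, each tree in which two observations fall into different leaves adds 2 to their squared Euclidean distance, so ordinary distance functions applied to these columns reflect how often the observations are separated.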
Value
A list containing all the input arguments and their supplied or default values.
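A short usage sketch follows. It assumes the treeClust package is installed and that treeClust() accepts this list through an argument named "control"; the choice of the iris data and of the settings is arbitrary.

```r
# Runs only when the treeClust package is available.
if (requireNamespace("treeClust", quietly = TRUE)) {
  # Ask for the dissimilarity object and prune each tree with serule = 1.
  ctrl <- treeClust::treeClust.control(return.dists = TRUE, serule = 1)

  # The result is a plain list of settings; unspecified ones keep defaults.
  str(ctrl)

  # Pass the settings to treeClust() (argument name "control" is assumed).
  fit <- treeClust::treeClust(iris[, 1:4], d.num = 2, control = ctrl)
}
```

Because return.dists stores all pairwise distances, it is best enabled only for datasets small enough to hold that object in memory.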
Author(s)
Sam Buttrey, buttrey@nps.edu