treeClust.control {treeClust}

R Documentation

Parameters describing the output from a treeClust fit

Description

This function produces a list that is used as input to treeClust to determine which items are preserved in the output.

Usage

treeClust.control(return.trees = FALSE, return.mat = TRUE, 
 return.dists = FALSE, return.newdata = FALSE, cluster.only = FALSE, 
 serule = 0, DevRatThreshold = 1, parallelnodes = 1, ...)

Arguments

`return.trees`	If TRUE, all the trees that go into the object are returned. This can make the treeClust object very large. Default FALSE.
`return.mat`	If TRUE, return a matrix describing leaf membership. Default TRUE.
`return.dists`	If TRUE, return an object of class 'dissimilarity' giving all pairwise distances between observations. This can be very large for large datasets. Default FALSE.
`cluster.only`	If TRUE, return only the clustering vector, which names the cluster into which each observation is placed. Default FALSE.
`return.newdata`	If TRUE, return a numeric matrix describing leaf membership and/or inter-point distance (see "Details"). Default FALSE.
`serule`	Describes how to prune the rpart trees. By default (serule = 0), each tree is pruned to the minimum error size. With serule > 0, each tree is pruned to the smallest size for which the cross-validated error is less than (min error) + (serule * sds).
`DevRatThreshold`	Trees whose deviance ratio is greater than this number are presumed to have arisen from redundant variables. The predictor at the tree's root is dropped, a new tree is built, and the new deviance ratio is computed. This process is repeated until the resulting tree has a deviance ratio less than or equal to the threshold. Default: 1 (do not drop any such trees).
`parallelnodes`	Describes whether to use parallel processing by creating a "computing cluster" containing "parallelnodes" nodes. If that number is 1, no cluster is created. Here "cluster" refers to a set of nodes operating in parallel, not to the clustering of the data. Default 1.
`...`	Other arguments, passed on to the output.
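
The pruning rule described under `serule` can be sketched with rpart alone. This is one reading of the rule, not treeClust's internal code: it assumes that "sds" corresponds to the `xstd` column of rpart's `cptable` at the minimum-error row, and it uses `<=` rather than a strict `<` so that the minimum-error tree itself always qualifies. The data set and formula are chosen only for illustration.

```r
library(rpart)

set.seed(1)  # cross-validation folds are random
fit <- rpart(Sepal.Length ~ ., data = iris,
             control = rpart.control(cp = 0.0001))

# cptable columns: CP, nsplit, rel error, xerror, xstd
# Rows are ordered from the smallest tree to the largest.
cpt <- fit$cptable

serule <- 1  # illustrative value; serule = 0 selects the minimum-error tree

best      <- which.min(cpt[, "xerror"])
threshold <- cpt[best, "xerror"] + serule * cpt[best, "xstd"]

# Smallest tree whose cross-validated error is within the threshold
pick   <- min(which(cpt[, "xerror"] <= threshold))
pruned <- prune(fit, cp = cpt[pick, "CP"])
```

Larger values of `serule` widen the threshold, so the selected tree can only stay the same size or get smaller.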

Details

The "newdata" item is a numeric matrix, used to give inter-point distances, whose form depends on the "d.num" argument to treeClust(). When d.num = 1, each tree contributes a set of 0-1 dummy variables that serve as leaf membership indicators; with d.num = 2, each tree's indicators are multiplied by that tree's "strength." With d.num = 3, a tree with k leaves contributes k-choose-2 columns, constructed so that the distances between distinct rows match the d3 distances. Likewise, with d.num = 4, a tree with k leaves contributes k-choose-2 columns that have been weighted by tree strength.
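
As a rough illustration of the leaf membership indicators described for d.num = 1 and d.num = 2, the sketch below builds them by hand for a single rpart tree. It is not the code treeClust uses: the data set, the formula, and the `strength` value are hypothetical, and whether treeClust keeps all k indicator columns per tree is an assumption.

```r
library(rpart)

fit <- rpart(Sepal.Length ~ ., data = iris)

# fit$where records the leaf into which each observation falls
leaf <- factor(fit$where)
k    <- nlevels(leaf)          # number of leaves in this tree

# d.num = 1 style: one 0-1 indicator column per leaf
ind <- model.matrix(~ leaf - 1)

# d.num = 2 style: the same indicators scaled by the tree's strength
strength <- 0.5                # hypothetical value, for illustration only
ind.weighted <- ind * strength

# d.num = 3 or 4: the number of columns a k-leaf tree would contribute
n.pair.cols <- choose(k, 2)
```

Each row of `ind` contains exactly one 1, marking that observation's leaf.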

Value

A list containing all the input arguments with their supplied or default values.
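
A minimal sketch of building and inspecting such a list, assuming the treeClust package is installed. Only arguments shown under "Usage" are used, and the chosen values are illustrative.

```r
library(treeClust)

# Keep the fitted trees and the pairwise dissimilarities,
# and prune each tree with a one-standard-error rule
ctl <- treeClust.control(return.trees = TRUE,
                         return.dists = TRUE,
                         serule = 1)

# Inspect the arguments recorded in the list
names(ctl)
ctl$return.trees
ctl$serule
```

Arguments that are not supplied, such as return.mat, keep the defaults shown under "Usage". The resulting list is then given to treeClust as its control input.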

Author(s)

Sam Buttrey, buttrey@nps.edu