rfTrain {Rborist}    R Documentation
Rapid Decision Tree Training
Description
Accelerated training using the Random Forest (trademarked name) algorithm. Tuned for multicore and GPU hardware. Bindable with most numerical front-end languages in addition to R.
Usage
## Default S3 method:
rfTrain(preFormat,
        sampler,
        y,
        autoCompress = 0.25,
        ctgCensus = "votes",
        classWeight = NULL,
        maxLeaf = 0,
        minInfo = 0.01,
        minNode = if (is.factor(y)) 2 else 3,
        nLevel = 0,
        nThread = 0,
        predFixed = 0,
        predProb = 0.0,
        predWeight = NULL,
        regMono = NULL,
        splitQuant = NULL,
        thinLeaves = FALSE,
        treeBlock = 1,
        verbose = FALSE,
        ...)
Arguments
y
    the response (outcome) vector, either numerical or categorical.

preFormat
    compressed, presorted representation of the predictor values.
    Row count must conform with that of the response.

sampler
    compressed representation of the sampled response.

autoCompress
    plurality above which to compress predictor values.

ctgCensus
    report categorical validation by vote or by probability.

classWeight
    proportional weighting of classification categories.

maxLeaf
    maximum number of leaves in a tree.  Zero denotes no limit.

minInfo
    information ratio with parent below which a node does not split.

minNode
    minimum number of distinct row references to split a node.

nLevel
    maximum number of tree levels to train, including terminals
    (leaves).  Zero denotes no limit.

nThread
    suggests an OpenMP-style thread count.  Zero denotes the default
    processor setting.

predFixed
    number of trial predictors for a split (mtry).

predProb
    probability of selecting an individual predictor as trial splitter.

predWeight
    relative weighting of individual predictors as trial splitters.

regMono
    signed probability constraint for monotonic regression.

splitQuant
    (sub)quantile at which to place the cut point for numerical splits.

thinLeaves
    bypasses creation of leaf state in order to reduce memory footprint.

treeBlock
    maximum number of trees to train during a single level (e.g.,
    coprocessor computing).

verbose
    indicates whether to output progress of training.

...
    not currently used.
Value
an object of class arbTrain, containing:

version
    the version of the Rborist package used to train.

samplerHash
    hash value of the Sampler object used to train.  Recorded for
    consistency of subsequent commands.

predInfo
    a vector of forest-wide Gini (classification) or weighted variance
    (regression), by predictor.

predMap
    a vector of integers mapping internal to front-end predictor indices.

forest
    an object of class Forest containing:

    nTree
        the number of trees trained.

    node
        an object of class Node consisting of:

        treeNode
            forest-wide vector of packed node representations.

        extent
            per-tree node counts.

        scores
            numeric vector of scores, for all terminals and nonterminals.

    factor
        an object of class Factor consisting of:

        facSplit
            forest-wide vector of packed factor bits.

        extent
            per-tree extent of factor bits.

        observed
            forest-wide vector of observed factor bits.

Leaf
    an object of class Leaf containing:

    extent
        forest-wide vector of leaf populations, i.e., counts of unique
        samples.

    index
        forest-wide vector of sample indices.

diag
    diagnostics accumulated over the training task.
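For example, assuming a trained object rt as returned above, its components can be read off as ordinary list members. This is a minimal sketch relying only on the component names documented here; list-style access via $ is assumed.

rt <- rfTrain(preFormat, sampler, y)
rt$version           # version of the package used to train
rt$forest$nTree      # number of trees trained
head(rt$predInfo)    # forest-wide information contribution, by predictor
rt$predMap           # mapping from internal to front-end predictor indices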
Author(s)
Mark Seligman at Suiji.
See Also
Examples
## Not run:
# Regression example:
nRow <- 5000
x <- data.frame(replicate(6, rnorm(nRow)))
y <- with(x, X1^2 + sin(X2) + X3 * X4) # courtesy of S. Welling.
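# The invocations below refer to 'preFormat' and 'sampler' objects that
# this page does not itself construct.  A minimal sketch of how they might
# be obtained, assuming the package's companion helpers preformat() and
# presample():
preFormat <- preformat(x)   # compressed, presorted predictor representation
sampler <- presample(y)     # compressed representation of the sampled response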
# Classification example:
data(iris)
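# Hypothetical analogues for the iris classification calls below, again
# assuming preformat() and presample():
irisPF <- preformat(iris[, -5])       # predictors only
irisSampler <- presample(iris[, 5])   # categorical response (Species)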
# Generic invocation:
rt <- rfTrain(preFormat, sampler, y)
# Causes 300 trees to be trained:
rt <- rfTrain(preFormat, sampler, y, nTree = 300)
# Causes validation census to report class probabilities:
rt <- rfTrain(irisPF, irisSampler, iris[, 5], ctgCensus = "prob")
# Applies table-weighting to classification categories:
rt <- rfTrain(irisPF, irisSampler, iris[, 5], classWeight = "balance")
# Weights first category twice as heavily as remaining two:
rt <- rfTrain(irisPF, irisSampler, iris[, 5], classWeight = c(2.0, 1.0, 1.0))
# Does not split nodes when doing so yields less than a 2% gain in
# information over the parent node:
rt <- rfTrain(preFormat, sampler, y, minInfo = 0.02)
# Does not split nodes representing fewer than 10 unique samples:
rt <- rfTrain(preFormat, sampler, y, minNode = 10)
# Trains a maximum of 20 levels:
rt <- rfTrain(preFormat, sampler, y, nLevel = 20)
# Trains, but does not perform subsequent validation:
rt <- rfTrain(preFormat, sampler, y, noValidate = TRUE)
# Chooses 500 rows (with replacement) to root each tree.
rt <- rfTrain(preFormat, sampler, y, nSamp = 500)
# Chooses 2 predictors as splitting candidates at each node (or
# fewer, when choices exhausted):
rt <- rfTrain(preFormat, sampler, y, predFixed = 2)
# Causes each predictor to be selected as a splitting candidate with
# distribution Bernoulli(0.3):
rt <- rfTrain(preFormat, sampler, y, predProb = 0.3)
# Causes first three predictors to be selected as splitting candidates
# twice as often as the remaining three:
rt <- rfTrain(preFormat, sampler, y, predWeight = c(2.0, 2.0, 2.0, 1.0, 1.0, 1.0))
# Constrains modelled response to be increasing with respect to X1
# and decreasing with respect to X5.
rt <- rfTrain(preFormat, sampler, y, regMono = c(1.0, 0, 0, 0, -1.0, 0))
# Suppresses creation of detailed leaf information needed for
# quantile prediction and external tools.
rt <- rfTrain(preFormat, sampler, y, thinLeaves = TRUE)
# Cuts numerical splits midway between the bracketing values by default;
# places the cut at the lower bracketing value for the first predictor
# and at the upper value for the second:
spq <- rep(0.5, ncol(x))
spq[1] <- 0.0
spq[2] <- 1.0
rt <- rfTrain(preFormat, sampler, y, splitQuant = spq)
## End(Not run)