rfArb {Rborist} | R Documentation |
Rapid Decision Tree Construction and Evaluation
Description
Accelerated implementation of the Random Forest (trademarked name) algorithm. Tuned for multicore and GPU hardware. Bindable with most numerical front-end languages in addtion to R. Invocation is similar to that provided by randomForest package.
Usage
## Default S3 method:
rfArb(x,
y,
autoCompress = 0.25,
ctgCensus = "votes",
classWeight = NULL,
impPermute = 0,
indexing = FALSE,
maxLeaf = 0,
minInfo = 0.01,
minNode = if (is.factor(y)) 2 else 3,
nHoldout = 0,
nLevel = 0,
nSamp = 0,
nThread = 0,
nTree = 500,
noValidate = FALSE,
predFixed = 0,
predProb = 0.0,
predWeight = NULL,
quantVec = NULL,
quantiles = !is.null(quantVec),
regMono = NULL,
rowWeight = numeric(0),
samplingWeight = numeric(0),
splitQuant = NULL,
streamline = FALSE,
thinLeaves = streamline || (is.factor(y) && !indexing),
trapUnobserved = FALSE,
treeBlock = 1,
verbose = FALSE,
withRepl = TRUE,
...)
Arguments
x |
the design matrix expressed as a |
y |
the response (outcome) vector, either numerical or
categorical. Row count must conform with |
autoCompress |
plurality above which to compress predictor values. |
ctgCensus |
report categorical validation by vote or by probability. |
classWeight |
proportional weighting of classification categories. |
impPermute |
number of importance permutations: 0 or 1. |
indexing |
whether to report final index, typically terminal, of tree traversal. |
maxLeaf |
maximum number of leaves in a tree. Zero denotes no limit. |
minInfo |
information ratio with parent below which node does not split. |
minNode |
minimum number of distinct row references to split a node. |
nHoldout |
number of observations to omit from sampling. Augmented by missing response values. |
nLevel |
maximum number of tree levels to train, including terminals (leaves). Zero denotes no limit. |
nSamp |
number of rows to sample, per tree. |
nThread |
suggests an OpenMP-style thread count. Zero denotes the default processor setting. |
nTree |
the number of trees to train. |
noValidate |
whether to train without validation. |
predFixed |
number of trial predictors for a split ( |
predProb |
probability of selecting individual predictor as trial splitter. |
predWeight |
relative weighting of individual predictors as trial splitters. |
quantVec |
quantile levels to validate. |
quantiles |
whether to report quantiles at validation. |
regMono |
signed probability constraint for monotonic regression. |
rowWeight |
row weighting for initial sampling of tree. Deprecated |
samplingWeight |
row weighting for initial sampling of tree. |
splitQuant |
(sub)quantile at which to place cut point for numerical splits |
.
streamline |
whether to streamline sampler contents to save space. |
thinLeaves |
bypasses creation of leaf state in order to reduce memory footprint. |
trapUnobserved |
reports score for nonterminal upon encountering values not observed during training, such as missing data. |
treeBlock |
maximum number of trees to train during a single level (e.g., coprocessor computing). |
verbose |
indicates whether to output progress of training. |
withRepl |
whether row sampling is by replacement. |
... |
not currently used. |
Value
an object sharing classes arbTrain
, documented with the
command rfTrain
, and rfArb
, a supplementary collection
consisting of the following items:
-
sampler
an object of classSampler
, as described in the documentation for thepresample
command, that summarizes the bagging structure. -
training
a list summarizing the training task, consisting of the following fields:-
call
the calling invocation. -
info
a vector of forest-wide Gini (classification) or weighted variance (regression), by predictor. -
version
the version of theRborist
package used to train. -
diag
diagnostics accumulated over the training task. -
samplerHash
hash value of theSampler
object used to train. Recorded for consistency of subsequent commands.
-
-
prediction
an object of classPredictReg
orPredictCtg
, as described by the documention for commandpredict
. -
validation
an object of classValidReg
orValidCtg
, as described by the documention for commandvalidate
, if validation is requested. -
importance
an object of classImportanceReg
orImportanceCtg
, as described by the documention for commandpredict
, if permutation performance has been requested.
Author(s)
Mark Seligman at Suiji.
References
Breiman, L. (2001) Random Forests, Machine Learning 45(1), 5-32.
See Also
Examples
## Not run:
# Regression example:
nRow <- 5000
x <- data.frame(replicate(6, rnorm(nRow)))
y <- with(x, X1^2 + sin(X2) + X3 * X4) # courtesy of S. Welling.
# Classification example:
data(iris)
# Generic invocation:
rb <- rfArb(x, y)
# Causes 300 trees to be trained:
rb <- rfArb(x, y, nTree = 300)
# Causes rows to be sampled without replacement:
rb <- rfArb(x, y, withRepl=FALSE)
# Causes validation census to report class probabilities:
rb <- rfArb(iris[-5], iris[5], ctgCensus="prob")
# Applies table-weighting to classification categories:
rb <- rfArb(iris[-5], iris[5], classWeight = "balance")
# Weights first category twice as heavily as remaining two:
rb <- rfArb(iris[-5], iris[5], classWeight = c(2.0, 1.0, 1.0))
# Does not split nodes when doing so yields less than a 2% gain in
# information over the parent node:
rb <- rfArb(x, y, minInfo=0.02)
# Does not split nodes representing fewer than 10 unique samples:
rb <- rfArb(x, y, minNode=10)
# Trains a maximum of 20 levels:
rb <- rfArb(x, y, nLevel = 20)
# Trains, but does not perform subsequent validation:
rb <- rfArb(x, y, noValidate=TRUE)
# Chooses 500 rows (with replacement) to root each tree.
rb <- rfArb(x, y, nSamp=500)
# Chooses 2 predictors as splitting candidates at each node (or
# fewer, when choices exhausted):
rb <- rfArb(x, y, predFixed = 2)
# Causes each predictor to be selected as a splitting candidate with
# distribution Bernoulli(0.3):
rb <- rfArb(x, y, predProb = 0.3)
# Causes first three predictors to be selected as splitting candidates
# twice as often as the other two:
rb <- rfArb(x, y, predWeight=c(2.0, 2.0, 2.0, 1.0, 1.0))
# Causes (default) quantiles to be computed at validation:
rb <- rfArb(x, y, quantiles=TRUE)
qPred <- rb$validation$qPred
# Causes specfied quantiles (deciles) to be computed at validation:
rb <- rfArb(x, y, quantVec = seq(0.1, 1.0, by = 0.10))
qPred <- rb$validation$qPred
# Constrains modelled response to be increasing with respect to X1
# and decreasing with respect to X5.
rb <- rfArb(x, y, regMono=c(1.0, 0, 0, 0, -1.0, 0))
# Causes rows to be sampled with random weighting:
rb <- rfArb(x, y, samplingWeight=runif(nRow))
# Suppresses creation of detailed leaf information needed for
# quantile prediction and external tools.
rb <- rfArb(x, y, thinLeaves = TRUE)
# Directs prediction to take a random branch on encountering
# values not observed during training, such as NA or an
# unrecognized category.
predict(rb, trapUnobserved = FALSE)
# Directs prediction to silently trap unobserved values, reporting a
# score associated with the current nonterminal tree node.
predict(rb, trapUnobserved = TRUE)
# Sets splitting position for predictor 0 to far left and predictor
# 1 to far right, others to default (median) position.
spq <- rep(0.5, ncol(x))
spq[0] <- 0.0
spq[1] <- 1.0
rb <- rfArb(x, y, splitQuant = spq)
## End(Not run)