quantregForest {quantregForest} | R Documentation |
Quantile Regression Forests
Description
Quantile Regression Forests infer conditional quantile functions from data
Usage
quantregForest(x,y, nthreads=1, keep.inbag=FALSE, ...)
Arguments
x |
A matrix or data.frame containing the predictor variables. |
y |
The response variable. |
nthreads |
The number of threads to use (for parallel computation). |
keep.inbag |
Keep information which observations are in and out-of-bag? For out-of-bag predictions, this argument needs to be set to |
... |
Other arguments passed to |
Details
The object can be converted back into a standard randomForest
object and all the functions of the randomForest
package can then be used (see example below).
The response y
should in general be numeric. However, some use cases exists if y
is a factor (such as sampling from conditional distribution when using for example what=function(x) sample(x,10)
). Trying to generate quantiles will generate an error if y
is a factor, though.
Parallel computation is invoked by setting the value of nthreads
to values larger than 1 (for example to the number of available CPUs).
The argument only has an effect under Linux and Mac OSX and is without effect on Windows due to restrictions on forking.
Value
A value of class quantregForest
, for which print
and predict
methods are available.
Class quantregForest
is a list of the following components additional to the ones given by class randomForest
:
call |
the original call to |
valuesNodes |
a matrix that contains per tree and node one subsampled observation |
Author(s)
Nicolai Meinshausen, Christina Heinze
References
N. Meinshausen (2006) "Quantile Regression Forests", Journal of Machine Learning Research 7, 983-999 http://jmlr.csail.mit.edu/papers/v7/
See Also
Examples
################################################
## Load air-quality data (and preprocessing) ##
################################################
data(airquality)
set.seed(1)
## remove observations with mising values
airquality <- airquality[ !apply(is.na(airquality), 1,any), ]
## number of remining samples
n <- nrow(airquality)
## divide into training and test data
indextrain <- sample(1:n,round(0.6*n),replace=FALSE)
Xtrain <- airquality[ indextrain,2:6]
Xtest <- airquality[-indextrain,2:6]
Ytrain <- airquality[ indextrain,1]
Ytest <- airquality[-indextrain,1]
################################################
## compute Quantile Regression Forests ##
################################################
qrf <- quantregForest(x=Xtrain, y=Ytrain)
qrf <- quantregForest(x=Xtrain, y=Ytrain, nodesize=10,sampsize=30)
## for parallel computation use the nthread option
## qrf <- quantregForest(x=Xtrain, y=Ytrain, nthread=8)
## predict 0.1, 0.5 and 0.9 quantiles for test data
conditionalQuantiles <- predict(qrf, Xtest)
print(conditionalQuantiles[1:4,])
## predict 0.1, 0.2,..., 0.9 quantiles for test data
conditionalQuantiles <- predict(qrf, Xtest, what=0.1*(1:9))
print(conditionalQuantiles[1:4,])
## estimate conditional standard deviation
conditionalSd <- predict(qrf, Xtest, what=sd)
print(conditionalSd[1:4])
## estimate conditional mean (as in original RF)
conditionalMean <- predict(qrf, Xtest, what=mean)
print(conditionalMean[1:4])
## sample 10 new observations from conditional distribution at each new sample
newSamples <- predict(qrf, Xtest,what = function(x) sample(x,10,replace=TRUE))
print(newSamples[1:4,])
## get ecdf-function for each new test data point
## (output will be a list with one element per sample)
condEcdf <- predict(qrf, Xtest, what=ecdf)
condEcdf[[10]](30) ## get the conditional distribution at value 30 for i=10
## or, directly, for all samples at value 30 (returns a vector)
condEcdf30 <- predict(qrf, Xtest, what=function(x) ecdf(x)(30))
print(condEcdf30[1:4])
## to use other functions of the package randomForest, convert class back
class(qrf) <- "randomForest"
importance(qrf) ## importance measure from the standard RF
#####################################
## out-of-bag predictions and sampling
##################################
## for with option keep.inbag=TRUE
qrf <- quantregForest(x=Xtrain, y=Ytrain, keep.inbag=TRUE)
## or use parallel version
## qrf <- quantregForest(x=Xtrain, y=Ytrain, nthread=8)
## get quantiles
oobQuantiles <- predict( qrf, what= c(0.2,0.5,0.8))
## sample from oob-distribution
oobSample <- predict( qrf, what= function(x) sample(x,1))