R: predict method for arbTrain result

predict.arbTrain {Rborist}

R Documentation

predict method for arbTrain result

Description

Prediction and test using Rborist.

Usage

## S3 method for class 'arbTrain'
predict(object, newdata, sampler, yTest=NULL,
keyedFrame = FALSE, quantVec=NULL, quantiles = !is.null(quantVec),
ctgCensus = "votes", indexing = FALSE, trapUnobserved = FALSE,
bagging = FALSE, nThread = 0, verbose = FALSE, ...)

Arguments

`object`	an object of class `arbTrain`, created from a previous invocation of the command `rfArb`, `Rborist` or `rfTrain` to train.
`newdata`	a design frame or matrix containing new data, with the same signature of predictors as in the training command.
`sampler`	an object of class `Sampler` used in the command.
`yTest`	a response vector against which to test the new predictions.
`keyedFrame`	whether the columns of `newdata` may appear in arbitrary order or as a superset of the predictors used to train.
`quantVec`	a vector of quantiles to predict.
`quantiles`	whether to predict quantiles.
`ctgCensus`	whether/how to summarize per-category predictions. "votes" specifies the number of trees predicting a given class. "prob" specifies a normalized, probabilistic summary. "probSample" specifies sample-weighted probabilities, similar to quantile histogramming.
`indexing`	whether to record the final node index, typically terminal, of tree traversal.
`trapUnobserved`	reports score for nonterminal upon encountering values not observed during training, such as missing data.
`bagging`	whether prediction is restricted to out-of-bag samples.
`nThread`	suggests ans OpenMP-style thread count. Zero denotes default processor setting.
`verbose`	whether to output progress of prediction.
`...`	not currently used.

Value

an object of one of two classes:

SummaryReg summarizing regression, consisting of:
- prediction an object of class PredictReg consisting of:
  - yPred the estimated numerical response.
  - qPred quantiles of prediction, if requested.
  - qEst quantile of the estimate, if quantiles requested.
  - indices final index of prediction, if requested.
- validation if validation requested, an object of class ValidReg consisting of:
  - mse the mean-squared error of the estimate.
  - rsq the r-squared statistic of the estimate.
  - mae the mean absolute error of the estimate.
- importance if permution importance requested, an object of class importanceReg, containing multiple instances of:
  - names the predictor names.
  - mse the per-predictor mean-squared error, under permutation.
SummaryCtg summarizing classification, consisting of:
- PredictCtg consisting of:
  - yPred estimated categorical response.
  - census factor-valued matrix of the estimate, by category, if requested.
  - prob matrix of estimate probabilities, by category, if requested.
  - indices final index of prediction, if requested.
- validation if validation requested, an object of class ValidCtg consisting of:
  - confusion the confusion matrix.
  - misprediction the misprediction rate.
  - oobError the out-of-bag error.
- importance if permution importance requested, an object of class importanceCtg, consisting of:
  - mispred the misprediction rate, by predictor.
  - oobErr the out-of-bag error, by predictor.

Author(s)

Mark Seligman at Suiji.

Examples

## Not run: 
  # Regression example:
  nRow <- 5000
  x <- data.frame(replicate(6, rnorm(nRow)))
  y <- with(x, X1^2 + sin(X2) + X3 * X4) # courtesy of S. Welling.

  pf <- preformat(x)
  sp <- presample(y)
  rb <- arbTrain(pf, sp, y)


  # Performs separate prediction on new data:
  xx <- data.frame(replace(6, rnorm(nRow)))
  pred <- predict(rb, xx)
  yPred <- pred$yPred

  rb <- Rborist(x,y)

  # Performs separate prediction on new data:
  xx <- data.frame(replacate(6, rnorm(nRow)))
  pred <- predict(rb, xx)
  yPred <- pred$yPred

  # As above, but also records final indices of each tree walk:
  #
  pred <- predict(rb, xx, indexing=TRUE)
  print(pred$indices[c(1:2), ])


  # As above, but predicts over \code{newdata} with unobserved values.
  # In the case of numerical data, only missing values are considered
  # unobserved.  Missing values are encoded as \code{NaN}, which are
  # incomparable, precipitating \code{false} on every test.  Prediction
  # therefore takes the \code{false} branch when encountering missing
  # values:
  #
  xxMissing <- xx
  xxMissing[6, c(15, 32, 87, 101)] <- NA
  pred <- predict(rb, xxMissing)
  

  # As above, but returns a nonterminal score upon encountering
  # unobserved values. Neither the true nor the false branch from the
  # testing node is taken.  Instead, the score returned is derived
  # from all leaf nodes (terminals) reached by the testing
  # (nonterminal) node.
  #
  pred <- predict(rb, xxMissing, trapUnobserved = TRUE)


  # Performs separate prediction, using original response as test
  # vector:
  pred <- predict(rb, xx, y)
  mse <- pred$mse
  rsq <- pred$rsq


  # Performs separate prediction with (default) quantiles:
  pred <- predict(rb, xx, quantiles="TRUE")
  qPred <- pred$qPred


  # Performs separate prediction with deciles:
  pred <- predict(rb, xx, quantVec = seq(0.1, 1.0, by = 0.10))
  qPred <- pred$qPred


  # Classification examples:
  data(iris)
  rb <- Rborist(iris[-5], iris[5])


  # Generic prediction using training set.
  # Census as (default) votes:
  pred <- predict(rb, iris[-5])
  yPred <- pred$yPred
  census <- pred$census

  # Using the \code{keyedFrame} option allows the columns of
  # \code{newdata} to appear in arbitrary order, so long as the
  # columns present during training appear as a subset:
  #
  pred <- predict(rb, iris[c(2, 4, 3, 1)], keyedFrame=TRUE)


  # As above, but validation census to report class probabilities:
  pred <- predict(rb, iris[-5], ctgCensus="prob")
  prob <- pred$prob


  # As above, but with training reponse as test vector:
  pred <- predict(rb, iris[-5], iris[5], ctgCensus = "prob")
  prob <- pred$prob
  conf <- pred$confusion
  misPred <- pred$misPred

  # As above, but predicts nonterminal when encountering categories
  # not observed during training.  That is, prediction returns a score
  # derived from all terminal nodes (leaves) reached from the
  # (nonterminal) testing node.
  #
  # In this case, "unobserved" refers to categories not present in
  # the subpartition over which a splitting is performed.  As training
  # partitions the data into smaller and smaller regions, a given
  # category becomes less likely to appear in a region.
  #
  # More generally, unobserved data can include missing predictors as
  # well as categories appearing in \code{newdata} which were not
  # present during training.
  #
  pred <- predict(rb, trapUnobserved=TRUE)

## End(Not run)

[Package Rborist version 0.3-7 Index]

predict method for arbTrain result

Description

Usage

Arguments

Value

Author(s)

See Also

Examples