ocf {ocf}

R Documentation

Ordered Correlation Forest

Description

Nonparametric estimator for ordered non-numeric outcomes. The estimator modifies a standard random forest splitting criterion to build a collection of forests, each estimating the conditional probability of a single class.
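
As a rough illustration of the "one forest per class" idea, the base R sketch below builds one binary target per class for an ordered outcome and rescales a set of per-class probability estimates so that each row sums to one. It is only a sketch under stated assumptions: the toy data, the helper names `class_indicators` and `normalize_rows`, and the normalization step are illustrative and are not taken from the package's internals.

```r
# Toy ordered outcome coded 1 < 2 < 3 (illustrative data, not from the package).
y <- c(1, 3, 2, 2, 1, 3)

# Hypothetical helper: one binary target per class, 1(y == m).
class_indicators <- function(y) {
  classes <- sort(unique(y))
  ind <- sapply(classes, function(m) as.numeric(y == m))
  colnames(ind) <- classes
  ind
}

# Hypothetical helper: rescale each row of estimated probabilities to sum to one.
normalize_rows <- function(p) {
  p / rowSums(p)
}

ind <- class_indicators(y)

# Each class indicator is a difference of two cumulative indicators:
# 1(y == m) = 1(y <= m) - 1(y <= m - 1).
m <- 2
stopifnot(all(ind[, "2"] == (y <= m) - (y <= m - 1)))

# Class probabilities estimated separately need not sum to one (made-up numbers).
p_raw <- matrix(c(0.5, 0.3, 0.3,
                  0.1, 0.2, 0.6), nrow = 2, byrow = TRUE)
p_norm <- normalize_rows(p_raw)
rowSums(p_norm)
```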

Usage

ocf(
  y = NULL,
  X = NULL,
  honesty = FALSE,
  honesty.fraction = 0.5,
  inference = FALSE,
  alpha = 0,
  n.trees = 2000,
  mtry = ceiling(sqrt(ncol(X))),
  min.node.size = 5,
  max.depth = 0,
  replace = FALSE,
  sample.fraction = ifelse(replace, 1, 0.5),
  n.threads = 1
)

Arguments

`y`	Outcome vector.
`X`	Covariate matrix (no intercept).
`honesty`	Whether to grow honest forests.
`honesty.fraction`	Fraction of observations assigned to the honest sample. Ignored if `honesty = FALSE`.
`inference`	Whether to extract weights and compute standard errors. Extracting the weights slows the routine down considerably. Valid inference requires `honesty = TRUE`.
`alpha`	Controls the balance of each split. Each side of a split must keep at least a fraction `alpha` of the observations in the parent node.
`n.trees`	Number of trees.
`mtry`	Number of covariates considered as split candidates at each node. Default is the square root of the number of covariates, rounded up.
`min.node.size`	Minimal node size.
`max.depth`	Maximal tree depth. A value of 0 corresponds to unlimited depth, 1 to "stumps" (one split per tree).
`replace`	If `TRUE`, grow trees on bootstrap samples drawn with replacement. Otherwise, grow trees on random subsamples drawn without replacement.
`sample.fraction`	Fraction of observations to sample. Default is 1 if `replace = TRUE` and 0.5 otherwise.
`n.threads`	Number of threads. Zero corresponds to the number of CPUs available.
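
The defaults for `mtry` and `sample.fraction` and the balance constraint behind `alpha` can be worked out by hand, which helps when checking what a call will actually use. The sketch below is base R only. The matrix size and the values of `alpha` and `n_parent` are illustrative, and rounding the `alpha` bound up to a whole observation is an assumption, not something this page states.

```r
# Illustrative covariate matrix (not from the package): 100 rows, 10 columns.
X <- matrix(0, nrow = 100, ncol = 10)

# Defaults exactly as written in the usage section.
replace <- FALSE
mtry <- ceiling(sqrt(ncol(X)))              # sqrt(10) is about 3.16, rounded up to 4
sample.fraction <- ifelse(replace, 1, 0.5)  # 0.5 here; 1 if replace were TRUE

# Balance constraint described for `alpha`: each side of a split keeps at
# least a fraction alpha of the parent node's observations.
# Rounding up to a whole observation is an assumption of this sketch.
alpha <- 0.25
n_parent <- 40
min_child <- ceiling(alpha * n_parent)

c(mtry = mtry, sample.fraction = sample.fraction, min_child = min_child)
```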

Value

Object of class `ocf`.

Author(s)

Riccardo Di Francesco

Examples

## Load data from orf package.
set.seed(1986)

library(orf)
data(odata)
odata <- odata[1:100, ] # Subset to reduce elapsed time.

y <- as.numeric(odata[, 1])
X <- as.matrix(odata[, -1])

## Training-test split.
train_idx <- sample(seq_along(y), floor(length(y) * 0.5))

y_tr <- y[train_idx]
X_tr <- X[train_idx, ]

y_test <- y[-train_idx]
X_test <- X[-train_idx, ]

## Fit ocf on training sample.
forests <- ocf(y_tr, X_tr)

## The fitted object works with the generic S3 methods.
print(forests)
summary(forests)
predictions <- predict(forests, X_test)
head(predictions$probabilities)
table(y_test, predictions$classification)

## Compute standard errors. This requires honest forests.
honest_forests <- ocf(y_tr, X_tr, honesty = TRUE, inference = TRUE)
head(honest_forests$predictions$standard.errors)
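
Predicted probabilities and classes like those produced above can be summarized with a couple of simple metrics. The sketch below is base R and self-contained. The helpers `toy_accuracy` and `toy_brier` and the toy inputs are hypothetical and are not part of the package, and the code assumes classes are coded 1, ..., M with one probability column per class.

```r
# Hypothetical helper: share of observations whose predicted class
# matches the observed class.
toy_accuracy <- function(y, predicted_class) {
  mean(y == predicted_class)
}

# Hypothetical helper: multi-class Brier score, i.e. the average over
# observations of the squared distance between the predicted probabilities
# and the observed class indicators.
toy_brier <- function(y, probabilities) {
  classes <- seq_len(ncol(probabilities))
  indicators <- outer(y, classes, FUN = "==") * 1
  mean(rowSums((probabilities - indicators)^2))
}

# Made-up values for illustration (classes coded 1, 2, 3).
y_toy <- c(1, 2, 3, 2)
p_toy <- matrix(c(0.7, 0.2, 0.1,
                  0.2, 0.5, 0.3,
                  0.1, 0.3, 0.6,
                  0.6, 0.3, 0.1), nrow = 4, byrow = TRUE)
class_toy <- apply(p_toy, 1, which.max)  # most likely class per row

toy_accuracy(y_toy, class_toy)
toy_brier(y_toy, p_toy)
```

Under the same coding assumption, the helpers could be applied to the objects created earlier, for example `toy_accuracy(y_test, predictions$classification)`.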

Package ocf version 1.0.0