ocf {ocf}    R Documentation

Ordered Correlation Forest

Description

Nonparametric estimator for ordered non-numeric outcomes. The estimator modifies a standard random forest splitting criterion to build a collection of forests, each estimating the conditional probability of a single class.

Usage

ocf(
  y = NULL,
  X = NULL,
  honesty = FALSE,
  honesty.fraction = 0.5,
  inference = FALSE,
  alpha = 0,
  n.trees = 2000,
  mtry = ceiling(sqrt(ncol(X))),
  min.node.size = 5,
  max.depth = 0,
  replace = FALSE,
  sample.fraction = ifelse(replace, 1, 0.5),
  n.threads = 1
)

Arguments

y

Outcome vector.

X

Covariate matrix (no intercept).

honesty

Whether to grow honest forests.

honesty.fraction

Fraction of honest sample. Ignored if honesty = FALSE.
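
As a rough illustration of an honest split, the sketch below partitions row indices into one part for placing the splits and a held-out honest part. It uses base R only and is not the package's internal code. The names n, honest_idx and split_idx are hypothetical, and it assumes honesty.fraction is the share of observations assigned to the honest sample.

```r
set.seed(1986)
n <- 100                  # hypothetical sample size
honesty.fraction <- 0.5   # default from the Usage section

# Assumption: this share of the rows is held out as the honest sample.
n_honest <- floor(n * honesty.fraction)
honest_idx <- sample(seq_len(n), n_honest)

# The remaining rows would be used to decide where to split.
split_idx <- setdiff(seq_len(n), honest_idx)

length(honest_idx)
length(split_idx)
```

The two index sets are disjoint, so no observation is used for both tasks.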

inference

Whether to extract weights and compute standard errors. Extracting the weights slows the routine down considerably. honesty = TRUE is required for valid inference.

alpha

Controls the balance of each split. Each split leaves at least a fraction alpha of observations in the parent node on each side of the split.
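
The balance condition can be read as a check on the two child-node sizes. The helper below is a hypothetical base-R sketch of that reading, not a function exported by the package.

```r
# Hypothetical helper: TRUE if each child holds at least a share
# alpha of the parent node's observations.
is_balanced_split <- function(n_left, n_right, alpha) {
  n_parent <- n_left + n_right
  min(n_left, n_right) >= alpha * n_parent
}

# With the default alpha = 0, any split is admissible.
is_balanced_split(n_left = 3, n_right = 47, alpha = 0)    # TRUE

# With alpha = 0.1 and 50 observations, each side needs at least 5.
is_balanced_split(n_left = 3, n_right = 47, alpha = 0.1)  # FALSE
is_balanced_split(n_left = 6, n_right = 44, alpha = 0.1)  # TRUE
```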

n.trees

Number of trees.

mtry

Number of covariates randomly selected as split candidates at each node. Default is the square root of the number of covariates, rounded up.
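
For instance, the default expression from the Usage section can be evaluated on a placeholder matrix; the matrices below are hypothetical and serve only to supply a column count.

```r
# Hypothetical covariate matrices with 10 and 9 columns.
X10 <- matrix(0, nrow = 5, ncol = 10)
X9 <- matrix(0, nrow = 5, ncol = 9)

mtry10 <- ceiling(sqrt(ncol(X10)))  # sqrt(10) is about 3.16, so 4
mtry9 <- ceiling(sqrt(ncol(X9)))    # sqrt(9) is exactly 3, so 3

mtry10
mtry9
```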

min.node.size

Minimal node size.

max.depth

Maximal tree depth. A value of 0 corresponds to unlimited depth, 1 to "stumps" (one split per tree).

replace

If TRUE, grow trees on bootstrap samples drawn with replacement. Otherwise, grow trees on random subsamples drawn without replacement.

sample.fraction

Fraction of observations to sample. Default is 1 if replace = TRUE and 0.5 otherwise.
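
The interaction between replace and sample.fraction can be sketched in base R as follows. draw_rows is a hypothetical helper that mimics the defaults in the Usage section; rounding the subsample size with floor is an assumption, and the package may draw its subsamples differently.

```r
set.seed(1986)
n <- 100  # hypothetical sample size

# Hypothetical helper reproducing the documented defaults.
draw_rows <- function(n, replace = FALSE,
                      sample.fraction = ifelse(replace, 1, 0.5)) {
  sample(seq_len(n), size = floor(n * sample.fraction), replace = replace)
}

sub <- draw_rows(n)                   # half the rows, no duplicates
boot <- draw_rows(n, replace = TRUE)  # n rows, duplicates possible

length(sub)
length(boot)
anyDuplicated(sub) == 0
```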

n.threads

Number of threads. Zero corresponds to the number of CPUs available.

Value

Object of class ocf.

Author(s)

Riccardo Di Francesco

See Also

marginal_effects

Examples

## Load data from orf package.
set.seed(1986)

library(orf)
data(odata)
odata <- odata[1:100, ] # Subset to reduce elapsed time.

y <- as.numeric(odata[, 1])
X <- as.matrix(odata[, -1])

## Training-test split.
train_idx <- sample(seq_along(y), floor(length(y) * 0.5))

y_tr <- y[train_idx]
X_tr <- X[train_idx, ]

y_test <- y[-train_idx]
X_test <- X[-train_idx, ]

## Fit ocf on training sample.
forests <- ocf(y_tr, X_tr)

## ocf objects are compatible with generic S3 methods.
print(forests)
summary(forests)
predictions <- predict(forests, X_test)
head(predictions$probabilities)
table(y_test, predictions$classification)

## Compute standard errors. This requires honest forests.
honest_forests <- ocf(y_tr, X_tr, honesty = TRUE, inference = TRUE)
head(honest_forests$predictions$standard.errors)


[Package ocf version 1.0.0 Index]