ordinalNetCV {ordinalNet} R Documentation

Uses K-fold cross validation to obtain out-of-sample log-likelihood and misclassification rates. Lambda is tuned within each cross validation fold.

Description

The data is divided into K folds. ordinalNet is fit K times, each time leaving out one fold as a test set. For each of the K model fits, lambda can be tuned by AIC, BIC, or cross validation. If cross validation is used, the user can choose whether to select lambda by the best average out-of-sample log-likelihood, misclassification rate, Brier score, or percentage of deviance explained. The user can also choose the number of cross validation folds to use for tuning. Once the model is tuned, the out-of-sample log-likelihood, misclassification rate, Brier score, and percentage of deviance explained are calculated on the held-out test set.
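The nested structure described above (outer folds for evaluation, inner folds for tuning) can be sketched in base R. This is an illustrative skeleton only, not the ordinalNet implementation: the "model" is a hypothetical stand-in that predicts a shrunken training mean, the loss is squared error, and the helper names makeFolds, fitPredict, and mse are invented for the sketch.

```r
# Nested cross validation skeleton (stand-in model, NOT ordinalNet internals).
set.seed(1)
n <- 60
y <- rnorm(n, mean = 2)            # toy numeric response
lambdaVals <- c(0, 0.1, 0.5, 1)    # candidate tuning values
nFolds <- 5                        # outer folds: estimate out-of-sample performance
nFoldsCV <- 5                      # inner folds: tune lambda within each training set

# Randomly partition a vector of row indices into k folds (hypothetical helper)
makeFolds <- function(idx, k) {
  idx <- idx[sample.int(length(idx))]
  unname(split(idx, rep(seq_len(k), length.out = length(idx))))
}

# Stand-in "model": predict every response by a shrunken training mean
fitPredict <- function(yTrain, lambda) mean(yTrain) / (1 + lambda)
mse <- function(yTest, pred) mean((yTest - pred)^2)

outerFolds <- makeFolds(seq_len(n), nFolds)
testMse <- numeric(nFolds)
bestLambdaIndex <- integer(nFolds)

for (i in seq_len(nFolds)) {
  testIdx <- outerFolds[[i]]
  trainIdx <- setdiff(seq_len(n), testIdx)

  # Inner loop: tune lambda using the training set only
  innerFolds <- makeFolds(trainIdx, nFoldsCV)
  innerLoss <- matrix(NA_real_, nrow = nFoldsCV, ncol = length(lambdaVals))
  for (j in seq_len(nFoldsCV)) {
    valIdx <- innerFolds[[j]]
    fitIdx <- setdiff(trainIdx, valIdx)
    for (l in seq_along(lambdaVals)) {
      innerLoss[j, l] <- mse(y[valIdx], fitPredict(y[fitIdx], lambdaVals[l]))
    }
  }
  bestLambdaIndex[i] <- which.min(colMeans(innerLoss))

  # Refit on the full training set with the tuned lambda, then score the held-out fold
  pred <- fitPredict(y[trainIdx], lambdaVals[bestLambdaIndex[i]])
  testMse[i] <- mse(y[testIdx], pred)
}
```

Because the held-out fold is never seen during tuning, testMse estimates the performance of the whole "fit plus tune" procedure rather than of a single fixed lambda.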

Usage

ordinalNetCV(
  x,
  y,
  lambdaVals = NULL,
  folds = NULL,
  nFolds = 5,
  nFoldsCV = 5,
  tuneMethod = c("cvLoglik", "cvMisclass", "cvBrier", "cvDevPct", "aic", "bic"),
  printProgress = TRUE,
  warn = TRUE,
  ...
)

Arguments

x

Covariate matrix.

y

Response variable. Can be a factor, ordered factor, or a matrix where each row is a multinomial vector of counts. A weighted fit can be obtained using the matrix option, since the row sums are essentially observation weights. Non-integer matrix entries are allowed.
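As a sketch of the matrix option, a factor response can be expanded into an indicator matrix and each row scaled by an observation weight, so that the row sums carry the weights. The object names and the weights below are made up for illustration.

```r
# Build a weighted count matrix from a factor response (illustrative values)
y <- factor(c("low", "mid", "high", "mid", "low"),
            levels = c("low", "mid", "high"))

# One row per observation, one column per response category
yMat <- matrix(0, nrow = length(y), ncol = nlevels(y),
               dimnames = list(NULL, levels(y)))
yMat[cbind(seq_along(y), as.integer(y))] <- 1

# Hypothetical observation weights; non-integer values are allowed
w <- c(1, 2, 0.5, 1, 3)
yWeighted <- yMat * w   # row i is multiplied by w[i]

rowSums(yWeighted)      # equals w
```

Passing yWeighted in place of the factor y would then request a weighted fit, per the description above.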

lambdaVals

An optional user-specified lambda sequence (vector). If NULL, a sequence will be generated using the model fit to the full training data. This default sequence is based on nLambda and lambdaMinRatio, which can be passed as additional arguments (otherwise ordinalNet default values are used). The maximum lambda is the smallest value that sets all penalized coefficients to zero, and the minimum lambda is the maximum value multiplied by the factor lambdaMinRatio.
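The relationship between the endpoints of the default sequence can be sketched as follows. The value of lambdaMax here is hypothetical (in practice it depends on the data), and the even spacing on the log scale is an assumption of this sketch; the text above only specifies the maximum and minimum values.

```r
# Sketch of a lambda sequence running from lambdaMax down to lambdaMax * lambdaMinRatio
lambdaMax <- 0.8         # hypothetical; the real value is data dependent
lambdaMinRatio <- 0.01
nLambda <- 20

lambdaMin <- lambdaMax * lambdaMinRatio
# Assumption: values are equally spaced on the log scale
lambdaVals <- exp(seq(log(lambdaMax), log(lambdaMin), length.out = nLambda))
```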

folds

An optional list, where each element is a vector of row indices corresponding to a different cross validation fold. Indices correspond to rows of the x matrix. Each index number should be used in exactly one fold. If NULL, the data will be randomly divided into equally-sized partitions. It is recommended to call set.seed before calling ordinalNetCV for reproducibility.
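A folds list of the required shape can be built in base R along these lines. The sample size and seed are arbitrary, and the final check simply confirms that every row index appears in exactly one fold.

```r
# Build a list of row-index vectors, one per fold
set.seed(123)
n <- 50        # number of rows in x (illustrative)
nFolds <- 5

folds <- unname(split(sample.int(n), rep(seq_len(nFolds), length.out = n)))

# Each index should be used in exactly one fold
stopifnot(identical(sort(unlist(folds)), seq_len(n)))

# The list could then be supplied as, e.g.:
# cv <- ordinalNetCV(x, y, folds = folds)
```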

nFolds

Number of cross validation folds. Only used if folds=NULL.

nFoldsCV

Number of cross validation folds used to tune lambda for each training set (i.e. within each training fold). Only used if tuneMethod is "cvLoglik", "cvMisclass", "cvBrier", or "cvDevPct".

tuneMethod

Method used to tune lambda for each training set (i.e. within each training fold). The "cvLoglik", "cvMisclass", "cvBrier", and "cvDevPct" methods use cross validation with nFoldsCV folds and select the lambda value with the best average out-of-sample performance. The "aic" and "bic" methods are less computationally intensive because they do not require the model to be fit multiple times. Note that for the methods that require cross validation, the fold splits are determined randomly and cannot be specified by the user. The set.seed() function should be called prior to ordinalNetCV for reproducibility.
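To illustrate why the information-criterion methods need only a single fit per training set, the sketch below selects a lambda index from one solution path using the textbook AIC and BIC formulas. The log-likelihoods and nonzero-coefficient counts are invented, and using the number of nonzero coefficients as the degrees of freedom is an assumption of this sketch rather than a statement about how ordinalNet computes its criteria.

```r
# Hypothetical in-sample results along one lambda path (largest lambda first)
loglik   <- c(-60.0, -52.0, -48.5, -46.0, -45.9)
nNonzero <- c(2, 3, 4, 6, 7)   # assumed degrees of freedom
nObs     <- 50

aic <- -2 * loglik + 2 * nNonzero
bic <- -2 * loglik + log(nObs) * nNonzero

aicIndex <- which.min(aic)   # index of lambda selected by AIC
bicIndex <- which.min(bic)   # index of lambda selected by BIC
```

With these made-up numbers BIC selects a sparser model than AIC, since its penalty per parameter, log(50), exceeds 2.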

printProgress

Logical. If TRUE, the fitting progress is printed to the terminal.

warn

Logical. If TRUE, the following warning message is displayed when fitting a cumulative probability model with nonparallelTerms=TRUE (i.e. nonparallel or semi-parallel model). "Warning message: For out-of-sample data, the cumulative probability model with nonparallelTerms=TRUE may predict cumulative probabilities that are not monotone increasing." The warning is displayed by default, but the user may wish to disable it.

...

Other arguments (besides x, y, lambdaVals, and warn) passed to ordinalNet.

Details

Value

An S3 object of class "ordinalNetCV", which contains the following:

loglik

Vector of out-of-sample log-likelihood values. Each value corresponds to a different fold.

misclass

Vector of out-of-sample misclassification rates. Each value corresponds to a different fold.

brier

Vector of out-of-sample Brier scores. Each value corresponds to a different fold.

devPct

Vector of out-of-sample percentages of deviance explained. Each value corresponds to a different fold.

bestLambdaIndex

The index of the value within the lambda sequence selected for each fold by the tuning method.

lambdaVals

The sequence of lambda values used for all cross validation folds.

folds

A list containing the index numbers of each fold.

fit

An object of class "ordinalNet", resulting from fitting ordinalNet to the entire dataset.
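For orientation, the sketch below computes three of the per-fold metrics from a small, made-up matrix of predicted class probabilities. The definitions are the conventional ones, and the Brier score normalization (squared error summed over categories, averaged over observations) is an assumption; the package may scale these quantities differently. Percentage of deviance explained is omitted because it also requires a null-model log-likelihood.

```r
# Made-up predicted probabilities: rows = test observations, columns = categories
pHat <- rbind(c(0.7, 0.2, 0.1),
              c(0.2, 0.5, 0.3),
              c(0.1, 0.3, 0.6),
              c(0.5, 0.3, 0.2))
yTest <- c(1, 2, 3, 2)   # observed categories as integer codes

# Log-likelihood: sum of log predicted probabilities of the observed categories
pObs <- pHat[cbind(seq_along(yTest), yTest)]
loglik <- sum(log(pObs))

# Misclassification rate: most probable category differs from the observed one
predClass <- max.col(pHat, ties.method = "first")
misclass <- mean(predClass != yTest)

# Brier score (assumed normalization): squared error against the indicator matrix
yInd <- diag(ncol(pHat))[yTest, ]
brier <- mean(rowSums((yInd - pHat)^2))
```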

Examples

## Not run: 
# Simulate x as independent standard normal
# Simulate y|x from a parallel cumulative logit (proportional odds) model
set.seed(1)
n <- 50
intercepts <- c(-1, 1)
beta <- c(1, 1, 0, 0, 0)
ncat <- length(intercepts) + 1  # number of response categories
p <- length(beta)  # number of covariates
x <- matrix(rnorm(n*p), ncol=p)  # n x p covariate matrix
eta <- c(x %*% beta) + matrix(intercepts, nrow=n, ncol=ncat-1, byrow=TRUE)
invlogit <- function(x) 1 / (1+exp(-x))
cumprob <- t(apply(eta, 1, invlogit))
prob <- cbind(cumprob, 1) - cbind(0, cumprob)
yint <- apply(prob, 1, function(p) sample(1:ncat, size=1, prob=p))
y <- as.factor(yint)

# Evaluate out-of-sample performance of the cumulative logit model
# when lambda is tuned by cross validation (best average out-of-sample log-likelihood)
cv <- ordinalNetCV(x, y, tuneMethod="cvLoglik")
summary(cv)

## End(Not run)


[Package ordinalNet version 2.12 Index]