
ordinalNetTune {ordinalNet}

R Documentation

Uses K-fold cross validation to obtain out-of-sample log-likelihood and misclassification rates for a sequence of lambda values.

Description

The data are divided into K folds. ordinalNet is fit K times (K = nFolds), each time leaving out one fold as a test set. The same sequence of lambda values is used for every fit. For each lambda value, the out-of-sample log-likelihood, misclassification rate, Brier score, and percentage of deviance explained are computed on the held-out test set. It is up to the user to decide how to tune the model using this information.
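For one lambda value and one held-out fold, metrics like those listed above can be computed from a matrix of predicted class probabilities. The following is a minimal base-R sketch, not ordinalNet's internal code. The helper `foldMetrics` is hypothetical. Two conventions are assumptions that may differ from the package: the log-likelihood is summed over test observations, and the Brier score is averaged over them.

```r
# Hypothetical helper (not part of ordinalNet): out-of-sample metrics for one
# lambda value on one held-out fold.
#   prob: n x K matrix of predicted class probabilities (rows sum to 1)
#   yint: observed categories coded as integers 1..K
foldMetrics <- function(prob, yint) {
  n <- nrow(prob)
  idx <- cbind(seq_len(n), yint)
  # Log-likelihood: sum of log predicted probabilities of the observed classes
  loglik <- sum(log(prob[idx]))
  # Predicted class = category with the largest predicted probability
  predClass <- max.col(prob, ties.method = "first")
  misclass <- mean(predClass != yint)
  # One-hot encoding of the observed categories
  yMat <- matrix(0, n, ncol(prob))
  yMat[idx] <- 1
  # Brier score (assumed normalization: mean over observations of the
  # squared distance between predicted and observed probability vectors)
  brier <- mean(rowSums((prob - yMat)^2))
  c(loglik = loglik, misclass = misclass, brier = brier)
}

prob <- rbind(c(0.7, 0.2, 0.1), c(0.1, 0.3, 0.6))
m <- foldMetrics(prob, yint = c(1, 2))
```

Repeating this for every lambda value and every fold fills one lambda-by-fold matrix per metric, which matches the shape of the matrices described under Value.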

Usage

ordinalNetTune(
  x,
  y,
  lambdaVals = NULL,
  folds = NULL,
  nFolds = 5,
  printProgress = TRUE,
  warn = TRUE,
  ...
)

Arguments

`x`	Covariate matrix.
`y`	Response variable. Can be a factor, ordered factor, or a matrix where each row is a multinomial vector of counts. A weighted fit can be obtained using the matrix option, since the row sums are essentially observation weights. Non-integer matrix entries are allowed.
`lambdaVals`	An optional user-specified lambda sequence (vector). If `NULL`, a sequence will be generated using the model fit to the full training data. This default sequence is based on `nLambda` and `lambdaMinRatio`, which can be passed as additional arguments (otherwise `ordinalNet` default values are used). The maximum lambda is the smallest value that sets all penalized coefficients to zero, and the minimum lambda is the maximum value multiplied by the factor `lambdaMinRatio`.
`folds`	An optional list, where each element is a vector of row indices corresponding to a different cross validation fold. Indices correspond to rows of the `x` matrix. Each index number should be used in exactly one fold. If `NULL`, the data will be randomly divided into equal-sized partitions. It is recommended to use `set.seed` before calling this function to make results reproducible.
`nFolds`	Number of cross validation folds. Only used if `folds=NULL`.
`printProgress`	Logical. If `TRUE`, the fitting progress is printed to the terminal.
`warn`	Logical. If `TRUE`, the following warning message is displayed when fitting a cumulative probability model with `nonparallelTerms=TRUE` (i.e., a nonparallel or semi-parallel model): "Warning message: For out-of-sample data, the cumulative probability model with nonparallelTerms=TRUE may predict cumulative probabilities that are not monotone increasing." The warning is displayed by default, but the user may wish to disable it.
`...`	Other arguments (besides `x`, `y`, `lambdaVals`, and `warn`) passed to `ordinalNet`.

Details

The fold partition can be passed by the user via the folds argument. By default, the data are randomly divided into equal-sized partitions. The set.seed function should be called prior to ordinalNetTune for reproducibility.
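A custom partition can be built in base R and supplied through the folds argument. The sketch below is one way to do it, assuming a data set with 50 rows. It is not necessarily how ordinalNetTune generates its default folds.

```r
# Build a random fold partition by hand: a list of row-index vectors in
# which every row of x appears in exactly one fold.
set.seed(1)
n <- 50          # number of rows in x (illustrative)
nFolds <- 5
foldId <- sample(rep(seq_len(nFolds), length.out = n))
folds <- unname(split(seq_len(n), foldId))  # plain list, one element per fold
# Sanity check: each row index is used in exactly one fold
stopifnot(identical(sort(unlist(folds)), seq_len(n)))
# The list could then be passed as ordinalNetTune(x, y, folds = folds)
```

Building the folds explicitly makes the partition reusable across calls, so that different model settings can be compared on identical splits.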
A sequence of lambda values can be passed by the user via the lambdaVals argument. By default, the sequence is generated by first fitting the model to the full data set (this sequence is determined by the nLambda and lambdaMinRatio arguments of ordinalNet).
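As the lambdaVals argument describes, the default sequence runs from a maximum lambda down to that maximum multiplied by lambdaMinRatio. The sketch below builds such a grid with log-scale (geometric) spacing. That spacing rule is an assumption, a common convention for penalized regression paths, and is not confirmed here for ordinalNet. The helper `makeLambdaSeq` and the value `lambdaMax = 0.5` are hypothetical. In practice the maximum lambda is derived from the data.

```r
# Hypothetical helper: log-spaced lambda grid from lambdaMax down to
# lambdaMax * lambdaMinRatio (geometric spacing is an assumption).
makeLambdaSeq <- function(lambdaMax, nLambda, lambdaMinRatio) {
  lambdaMin <- lambdaMax * lambdaMinRatio
  exp(seq(log(lambdaMax), log(lambdaMin), length.out = nLambda))
}

lambdaVals <- makeLambdaSeq(lambdaMax = 0.5, nLambda = 20, lambdaMinRatio = 0.01)
# A fixed grid like this could be passed as ordinalNetTune(x, y, lambdaVals = lambdaVals)
```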
The standardize argument of ordinalNet can be modified through the additional arguments (...). If standardize=TRUE, then the data are scaled within each cross validation fold. This is done because scaling is part of the statistical procedure and should be repeated each time the procedure is applied.
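The principle behind scaling within each fold is that the centering and scaling constants come from the training rows only and are then applied unchanged to the held-out rows. The following base-R sketch illustrates that idea. ordinalNet performs its own standardization when standardize=TRUE, so users do not need to do this themselves. The helper `standardizeFold` is hypothetical, and the use of `sd` (n - 1 divisor) is an assumption that may differ from the package's internal choice.

```r
# Hypothetical helper: standardize with training-fold statistics only, then
# apply the same centering and scaling to the held-out fold.
standardizeFold <- function(x, testIdx) {
  xTrain <- x[-testIdx, , drop = FALSE]
  ctr <- colMeans(xTrain)
  scl <- apply(xTrain, 2, sd)
  list(
    train = scale(xTrain, center = ctr, scale = scl),
    test  = scale(x[testIdx, , drop = FALSE], center = ctr, scale = scl)
  )
}

set.seed(1)
x <- matrix(rnorm(50 * 3), ncol = 3)
parts <- standardizeFold(x, testIdx = 1:10)
```

The held-out rows are generally not exactly mean zero and unit variance afterwards, which is expected: the test fold plays the role of new data that the procedure has not seen.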

Value

An S3 object of class "ordinalNetTune", which contains the following:

loglik: Matrix of out-of-sample log-likelihood values. Each row corresponds to a lambda value, and each column corresponds to a fold.
misclass: Matrix of out-of-sample misclassification rates. Each row corresponds to a lambda value, and each column corresponds to a fold.
brier: Matrix of out-of-sample Brier scores. Each row corresponds to a lambda value, and each column corresponds to a fold.
devPct: Matrix of out-of-sample percentages of deviance explained. Each row corresponds to a lambda value, and each column corresponds to a fold.
lambdaVals: The sequence of lambda values used for all cross validation folds.
folds: A list containing the index numbers of each fold.
fit: An object of class "ordinalNet", resulting from fitting ordinalNet to the entire dataset.
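Because each metric is returned as a lambda-by-fold matrix, a lambda index can be chosen by averaging over folds (rows) and optimizing. Higher log-likelihood and deviance explained are better, and lower misclassification rate and Brier score are better. The sketch below uses small made-up matrices standing in for tunefit$loglik and tunefit$misclass. It shows that different criteria can select different lambda values, which is one reason the choice is left to the user.

```r
# Toy lambda-by-fold matrices (3 lambda values x 2 folds; made-up numbers)
# standing in for tunefit$loglik and tunefit$misclass.
loglik   <- rbind(c(-40, -42), c(-35, -36), c(-37, -38))
misclass <- rbind(c(0.40, 0.45), c(0.30, 0.35), c(0.30, 0.30))

# Average each metric over folds; rows correspond to lambda values
bestByLoglik   <- which.max(rowMeans(loglik))    # higher is better
bestByMisclass <- which.min(rowMeans(misclass))  # lower is better
```

With real output, the selected index can be passed as whichLambda to coef or predict on tunefit$fit, as in the Examples.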

Examples

## Not run: 
# Simulate x as independent standard normal
# Simulate y|x from a parallel cumulative logit (proportional odds) model
set.seed(1)
n <- 50
intercepts <- c(-1, 1)
beta <- c(1, 1, 0, 0, 0)
ncat <- length(intercepts) + 1  # number of response categories
p <- length(beta)  # number of covariates
x <- matrix(rnorm(n*p), ncol=p)  # n x p covariate matrix
eta <- c(x %*% beta) + matrix(intercepts, nrow=n, ncol=ncat-1, byrow=TRUE)
invlogit <- function(x) 1 / (1+exp(-x))
cumprob <- t(apply(eta, 1, invlogit))
prob <- cbind(cumprob, 1) - cbind(0, cumprob)
yint <- apply(prob, 1, function(p) sample(1:ncat, size=1, prob=p))
y <- as.factor(yint)

# Fit parallel cumulative logit model; select lambda by cross validation
tunefit <- ordinalNetTune(x, y)
summary(tunefit)
plot(tunefit)
bestLambdaIndex <- which.max(rowMeans(tunefit$loglik))
coef(tunefit$fit, whichLambda=bestLambdaIndex, matrix=TRUE)
predict(tunefit$fit, whichLambda=bestLambdaIndex)

## End(Not run)

[Package ordinalNet version 2.12 Index]