cv_risk_mod {riskscores}    R Documentation

Run Cross-Validation to Tune Lambda0

Description

Runs k-fold cross-validation on a grid of \lambda_0 values. Records class accuracy and deviance for each \lambda_0. Returns an object of class "cv_risk_mod".

Usage

cv_risk_mod(
  X,
  y,
  weights = NULL,
  beta = NULL,
  a = -10,
  b = 10,
  max_iters = 100,
  tol = 1e-05,
  nlambda = 25,
  lambda_min_ratio = ifelse(nrow(X) < ncol(X), 0.01, 1e-04),
  lambda0 = NULL,
  nfolds = 10,
  foldids = NULL,
  parallel = FALSE,
  shuffle = TRUE,
  seed = NULL
)

Arguments

X

Input covariate matrix with dimension n \times p; every row is an observation.

y

Numeric vector for the (binomial) response variable.

weights

Numeric vector of length n with weights for each observation. By default (NULL), every observation receives equal weight.

beta

Starting numeric vector with p coefficients. Default starting coefficients are rounded coefficients from a logistic regression model.
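
The default starting point can be pictured with base R alone. The sketch below is an illustration under stated assumptions, not the package's internal code: the simulated data, the intercept column placed inside X, and the plain glm() fit followed by round() (with no rescaling or clipping to [a, b]) are all assumptions.

```r
# Illustrative data only; nothing here comes from the package.
set.seed(1)
n <- 200
X <- cbind(Intercept = 1, x1 = rnorm(n), x2 = rnorm(n))
y <- rbinom(n, size = 1, prob = plogis(1.5 * X[, "x1"] - 2 * X[, "x2"]))

# Fit an ordinary logistic regression with one coefficient per column of X
# (the "0 +" suppresses glm's own intercept, since X already carries one).
fit <- glm(y ~ 0 + X, family = binomial())

# Round to the nearest integer to obtain an integer-valued starting vector.
beta_start <- round(coef(fit))
```

A vector built this way has length p and could be supplied as the beta argument.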

a

Integer lower bound for coefficients (default: -10).

b

Integer upper bound for coefficients (default: 10).

max_iters

Maximum number of iterations (default: 100).

tol

Tolerance for convergence (default: 1e-5).

nlambda

Number of lambda values to try (default: 25).

lambda_min_ratio

Smallest value for lambda, as a fraction of lambda_max (the smallest value for which all coefficients are zero). The default depends on the sample size (n) relative to the number of variables (p). If n >= p, the default is 0.0001, close to zero. If n < p, the default is 0.01.

lambda0

Optional sequence of lambda values. By default, the function will derive the lambda0 sequence based on the data (see lambda_min_ratio).
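
As a rough picture of how such a sequence might be built, the sketch below applies the default lambda_min_ratio rule from the Usage section and then spaces nlambda values on a log scale. The log spacing (borrowed from the convention used by glmnet), the dimensions, and the value of lambda_max are assumptions made for illustration; the package derives lambda_max from the data and may space the grid differently.

```r
# Hypothetical dimensions and lambda_max, chosen only for illustration.
n <- 500
p <- 20
nlambda <- 25
lambda_max <- 0.1

# Default ratio as written in Usage: 0.01 when n < p, otherwise 1e-04.
lambda_min_ratio <- ifelse(n < p, 0.01, 1e-04)

# Assumed log-spaced grid, decreasing from lambda_max
# down to lambda_max * lambda_min_ratio.
lambda0 <- exp(seq(log(lambda_max),
                   log(lambda_max * lambda_min_ratio),
                   length.out = nlambda))
```

A vector of this form can be passed directly as lambda0 to bypass the data-derived default.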

nfolds

Number of folds (default: 10). Implied by foldids when that argument is provided.

foldids

Optional vector of values between 1 and nfolds identifying the fold to which each observation is assigned.
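
One common way to construct such a vector, sketched here in base R, is to repeat the fold labels to the number of observations and then shuffle them. The values of n and nfolds and the seed are arbitrary, and the assumption that foldids has one entry per row of X is not stated on this page.

```r
set.seed(42)   # arbitrary seed so the assignment is reproducible
n <- 103       # hypothetical number of observations
nfolds <- 10

# Repeat 1..nfolds until there are n labels, then shuffle them so each
# observation is assigned to a fold at random with near-equal fold sizes.
foldids <- sample(rep(seq_len(nfolds), length.out = n))

# Fold sizes differ by at most one observation.
table(foldids)
```

Supplying the same foldids to several calls should keep the train/validation splits identical across them, which helps when comparing settings.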

parallel

If TRUE, cross-validation is run in parallel (using foreach) to increase efficiency (default: FALSE). The user must first register a parallel backend with a function such as doParallel::registerDoParallel.

shuffle

Whether the order of coefficients is shuffled during coordinate descent (default: TRUE).

seed

Integer passed to set.seed() to seed the random number generator. By default, no seed is set.

Value

An object of class "cv_risk_mod" with the following attributes:

results

Data frame containing a summary of deviance and accuracy (mean and SD) for each value of lambda0. Also includes the number of nonzero coefficients produced by each lambda0 when fit on the full data.

lambda_min

Numeric value indicating the lambda0 that resulted in the lowest mean deviance.

lambda_1se

Numeric value indicating the largest lambda0 whose mean deviance is within one standard error of the mean deviance at lambda_min.
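
The two selection rules can be illustrated on a small made-up summary table. The column names (mean_dev, se_dev) and all numbers below are hypothetical, and the sketch assumes a standard-error column is already available; it does not reproduce how the package computes that quantity.

```r
# Hypothetical cross-validation summary, for illustration only.
results <- data.frame(
  lambda0  = c(0.100, 0.050, 0.010, 0.005, 0.001),
  mean_dev = c(1.30, 1.12, 1.05, 1.04, 1.06),
  se_dev   = c(0.04, 0.03, 0.03, 0.03, 0.04)
)

# lambda_min: the lambda0 with the lowest mean deviance.
i_min <- which.min(results$mean_dev)
lambda_min <- results$lambda0[i_min]

# lambda_1se: the largest lambda0 whose mean deviance lies within one
# standard error of the minimum mean deviance.
threshold <- results$mean_dev[i_min] + results$se_dev[i_min]
lambda_1se <- max(results$lambda0[results$mean_dev <= threshold])
```

Because a larger lambda0 penalizes nonzero coefficients more heavily, lambda_1se generally favors a sparser model than lambda_min at a small cost in deviance.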


[Package riskscores version 1.1.1 Index]