R: Run Cross-Validation to Tune Lambda0

cv_risk_mod {riskscores}

R Documentation

Run Cross-Validation to Tune Lambda0

Description

Runs k-fold cross-validation on a grid of \lambda_0 values. Records class accuracy and deviance for each \lambda_0. Returns an object of class "cv_risk_mod".

Usage

cv_risk_mod(
  X,
  y,
  weights = NULL,
  beta = NULL,
  a = -10,
  b = 10,
  max_iters = 100,
  tol = 1e-05,
  nlambda = 25,
  lambda_min_ratio = ifelse(nrow(X) < ncol(X), 0.01, 1e-04),
  lambda0 = NULL,
  nfolds = 10,
  foldids = NULL,
  parallel = FALSE,
  shuffle = TRUE,
  seed = NULL
)
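
The call signature above can be exercised with a minimal sketch such as the following. The simulated data, the choice of `nfolds = 5` and `nlambda = 10`, and the access to `results` and `lambda_min` with `$` (which assumes the returned object behaves like a list) are illustrative assumptions, not part of the package documentation. The call is guarded so the block also runs when riskscores is not installed.

```r
# Simulated data: every value here is an illustrative assumption.
set.seed(1)
n <- 200
p <- 5
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(X) <- paste0("x", seq_len(p))

# Binary response drawn from a logistic model on the first two columns.
eta <- 1.5 * X[, 1] - 1.0 * X[, 2]
y <- rbinom(n, size = 1, prob = plogis(eta))

# Only call the package if it is available.
if (requireNamespace("riskscores", quietly = TRUE)) {
  cv_fit <- riskscores::cv_risk_mod(X, y, nfolds = 5, nlambda = 10, seed = 1)
  # Assumes list-style access to the documented components.
  print(cv_fit$results)
  print(cv_fit$lambda_min)
}
```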

Arguments

`X`	Input covariate matrix with dimension `n \times p`; every row is an observation.
`y`	Numeric vector for the (binomial) response variable.
`weights`	Numeric vector of length `n` with weights for each observation. Unless otherwise specified, default will give equal weight to each observation.
`beta`	Starting numeric vector with `p` coefficients. Default starting coefficients are rounded coefficients from a logistic regression model.
`a`	Integer lower bound for coefficients (default: -10).
`b`	Integer upper bound for coefficients (default: 10).
`max_iters`	Maximum number of iterations (default: 100).
`tol`	Tolerance for convergence (default: 1e-5).
`nlambda`	Number of lambda values to try (default: 25).
`lambda_min_ratio`	Smallest value for lambda, as a fraction of lambda_max (the smallest value for which all coefficients are zero). The default depends on the sample size (`n`) relative to the number of variables (`p`). If `n >= p`, the default is 0.0001, close to zero. If `n < p`, the default is 0.01.
`lambda0`	Optional sequence of lambda values. By default, the function will derive the lambda0 sequence based on the data (see `lambda_min_ratio`).
`nfolds`	Number of folds (default: 10). Inferred from `foldids` if `foldids` is provided.
`foldids`	Optional vector of values between 1 and `nfolds` assigning each observation to a fold.
`parallel`	If `TRUE`, cross-validation is run in parallel (using foreach) to increase efficiency (default: `FALSE`). The user must first register a parallel backend with a function such as doParallel::registerDoParallel.
`shuffle`	Whether order of coefficients is shuffled during coordinate descent (default: TRUE).
`seed`	An integer passed to `set.seed()` to seed the random number generator. By default, no seed is set.
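
As a rough illustration of the `foldids` and `lambda0` arguments, the sketch below builds a balanced random fold assignment and a decreasing lambda grid using base R only. The value of `lambda_max` is hypothetical, and the log-spaced grid is an assumed convention (common in penalized regression software); the documentation above does not state how the package spaces its default sequence.

```r
set.seed(42)
n <- 103
nfolds <- 10

# Repeat 1..nfolds out to length n, then permute: folds are balanced
# (sizes differ by at most one) but randomly assigned.
foldids <- sample(rep(seq_len(nfolds), length.out = n))
table(foldids)

# Hypothetical lambda_max; in practice it would be derived from the data.
lambda_max <- 0.1
lambda_min_ratio <- 1e-4
nlambda <- 25

# Assumed log-spaced grid from lambda_max down to
# lambda_max * lambda_min_ratio.
lambda0 <- exp(seq(log(lambda_max),
                   log(lambda_max * lambda_min_ratio),
                   length.out = nlambda))
```

Vectors built this way could then be supplied as `foldids = foldids` and `lambda0 = lambda0`.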

Value

An object of class "cv_risk_mod" with the following attributes:

`results`	Dataframe containing a summary of deviance and accuracy for each value of `lambda0` (mean and SD). Also includes the number of nonzero coefficients that are produced by each `lambda0` when fit on the full data.
`lambda_min`	Numeric value indicating the `lambda0` that resulted in the lowest mean deviance.
`lambda_1se`	Numeric value indicating the largest `lambda0` that had a mean deviance within one standard error of the mean deviance at `lambda_min`.
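
The relationship between `lambda_min` and `lambda_1se` can be sketched on a toy summary table. The column names (`mean_dev`, `sd_dev`), the numbers, and the standard error computed as SD divided by the square root of the number of folds are all assumptions made for illustration; they are not taken from the package's actual `results` data frame.

```r
# Toy cross-validation summary with made-up values.
results <- data.frame(
  lambda0  = c(0.001, 0.003, 0.01, 0.03, 0.1),
  mean_dev = c(1.10, 1.05, 1.07, 1.12, 1.30),
  sd_dev   = c(0.20, 0.18, 0.19, 0.21, 0.25)
)
nfolds <- 10

# Assumed standard error of the mean deviance across folds.
results$se_dev <- results$sd_dev / sqrt(nfolds)

# lambda_min: the lambda0 with the lowest mean deviance.
i_min <- which.min(results$mean_dev)
lambda_min <- results$lambda0[i_min]

# lambda_1se: the largest lambda0 whose mean deviance is within one
# standard error of the minimum.
threshold <- results$mean_dev[i_min] + results$se_dev[i_min]
lambda_1se <- max(results$lambda0[results$mean_dev <= threshold])

lambda_min  # 0.003
lambda_1se  # 0.01
```

Choosing `lambda_1se` over `lambda_min` trades a small increase in deviance for a stronger penalty, which generally yields a sparser model.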

[Package riskscores version 1.1.1 Index]