R: Fit a ranked-sparsity model with regularized regression

sparseR {sparseR}

R Documentation

Fit a ranked-sparsity model with regularized regression

Description

Fit a ranked-sparsity model with regularized regression

Usage

sparseR(
  formula,
  data,
  family = c("gaussian", "binomial", "poisson", "coxph"),
  penalty = c("lasso", "MCP", "SCAD"),
  alpha = 1,
  ncvgamma = 3,
  lambda.min = 0.005,
  k = 1,
  poly = 2,
  gamma = 0.5,
  cumulative_k = FALSE,
  cumulative_poly = TRUE,
  pool = FALSE,
  ia_formula = NULL,
  pre_process = TRUE,
  model_matrix = NULL,
  y = NULL,
  poly_prefix = "_poly_",
  int_sep = "\\:",
  pre_proc_opts = c("knnImpute", "scale", "center", "otherbin", "none"),
  filter = c("nzv", "zv"),
  extra_opts = list(),
  ...
)

Arguments

`formula`	A formula naming the response and the terms to consider
`data`	A data frame containing the variables named in formula
`family`	The family of the model
`penalty`	What penalty should be used (lasso, MCP, or SCAD)
`alpha`	The mix of L1 penalty (lower values introduce more L2 ridge penalty)
`ncvgamma`	The tuning parameter for ncvreg (for MCP or SCAD)
`lambda.min`	The minimum value to be used for lambda (as ratio of max, see ?ncvreg)
`k`	The maximum order of interactions to consider (default: 1; all pairwise)
`poly`	The maximum order of polynomials to consider (default: 2)
`gamma`	The degree of extremity of sparsity rankings (see details)
`cumulative_k`	Should penalties be increased cumulatively as the order of interaction increases?
`cumulative_poly`	Should penalties be increased cumulatively as the order of polynomial increases?
`pool`	Should interactions of order k and polynomials of order k+1 be pooled together for calculating the penalty?
`ia_formula`	formula to be passed to step_interact (for interactions, see details)
`pre_process`	Should the data be preprocessed? (if FALSE, model_matrix must be provided)
`model_matrix`	A data frame or matrix specifying the full model matrix (used if !pre_process)
`y`	A vector of responses (used if !pre_process)
`poly_prefix`	If model_matrix is specified, what is the prefix for polynomial terms?
`int_sep`	If model_matrix is specified, what is the separator for interaction terms?
`pre_proc_opts`	List of preprocessing steps (see details)
`filter`	The type of filter applied to main effects + interactions
`extra_opts`	A list of options for all preprocess steps (see details)
`...`	Additional arguments (passed to fitting function)
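
As a minimal sketch of how these arguments fit together, the call below fits a lasso-penalized ranked-sparsity model with pairwise interactions and quadratic terms. The built-in iris data, the choice of response, and the object name `srl` are illustrative assumptions rather than part of this help page, and the code only runs when the sparseR package is installed.

```r
# Illustrative sketch: the data set, response, and object name are assumptions.
if (requireNamespace("sparseR", quietly = TRUE)) {
  library(sparseR)

  set.seed(1)  # cross-validation folds are random, so fix the seed

  srl <- sparseR(
    Sepal.Width ~ .,       # response ~ all remaining columns
    data    = iris,
    family  = "gaussian",
    penalty = "lasso",
    k       = 1,           # consider pairwise interactions
    poly    = 2            # consider polynomial terms up to order 2
  )

  print(class(srl))
}
```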

Details

Selecting gamma: higher values of gamma will penalize "group" size more. By default, this is set to 0.5, which yields equal contribution of prior information across orders of interactions/polynomials (this is a good default for most settings).

Additionally, setting cumulative_poly or cumulative_k to TRUE increases the penalty cumulatively based on the order of either polynomial or interaction.
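
One way to see what gamma and the cumulative options do is to refit the same model under several settings and compare the resulting penalty factors. The sketch below does this on the built-in iris data; the data set, the grid of gamma values, and the object names are illustrative assumptions, and no particular output is claimed here.

```r
# Illustrative sketch: data set, gamma grid, and object names are assumptions.
if (requireNamespace("sparseR", quietly = TRUE)) {
  library(sparseR)

  gammas <- c(0, 0.5, 1)

  # Refit the same model for each gamma, using the same seed each time
  fits <- lapply(gammas, function(g) {
    set.seed(1)
    sparseR(Sepal.Width ~ ., data = iris, k = 1, poly = 2, gamma = g)
  })
  names(fits) <- paste0("gamma_", gammas)

  # Summarize the penalty factors stored in each fitted object
  for (nm in names(fits)) {
    cat(nm, "\n")
    print(summary(fits[[nm]]$pen_factors))
  }

  # Same model with cumulative penalties on both interactions and polynomials
  set.seed(1)
  fit_cum <- sparseR(Sepal.Width ~ ., data = iris, k = 1, poly = 2,
                     cumulative_k = TRUE, cumulative_poly = TRUE)
  print(summary(fit_cum$pen_factors))
}
```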

The options that can be passed to pre_proc_opts are:

knnImpute (should missing data be imputed?)
scale (should data be standardized?)
center (should data be centered to the mean or another value?)
otherbin (should factors with low prevalence be combined?)
none (should no preprocessing be done? can also specify a null object)

The options that can be passed to extra_opts are:

centers (named numeric vector which denotes where each covariate should be centered)
center_fn (alternatively, a function can be specified to calculate center such as min or median)
freq_cut, unique_cut (see ?step_nzv; these get used by the filtering steps)
neighbors (the number of neighbors for knnImpute)
one_hot (see ?step_dummy; this defaults to cell-means coding, which can be done in regularized regression; change at your own risk)
raw (should polynomials not be orthogonal? defaults to TRUE because, by default, variables have already been centered and scaled by this point)
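
The centering options above might be used as in the following sketch, which centers the numeric predictors either at supplied values (centers) or with a function (center_fn). The iris data, the choice of minima and medians as centers, and the object names are illustrative assumptions; the default pre_proc_opts are left in place so that centering is active.

```r
# Illustrative sketch: data set, centering choices, and object names are assumptions.
if (requireNamespace("sparseR", quietly = TRUE)) {
  library(sparseR)

  # Named numeric vector: center each numeric predictor at its minimum
  num_vars <- c("Sepal.Length", "Petal.Length", "Petal.Width")
  cc <- vapply(iris[num_vars], min, numeric(1))

  set.seed(1)
  fit_centers <- sparseR(Sepal.Width ~ ., data = iris, k = 1,
                         extra_opts = list(centers = cc))

  # Alternatively, supply a function that computes the center
  set.seed(1)
  fit_center_fn <- sparseR(Sepal.Width ~ ., data = iris, k = 1,
                           extra_opts = list(center_fn = median))
}
```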

By default (ia_formula = NULL), all variables are interacted with each other up to order k. If specified, ia_formula is passed as the terms argument to recipes::step_interact; see the help page for that function for guidance on specifying particular interactions.

Value

An object of class sparseR containing the following:

`fit`	the fit object returned by `ncvreg`
`srprep`	a `recipes` object used to prep the data
`pen_factors`	the factor multiple on penalties for ranked sparsity
`results`	all coefficients and penalty factors at minimum CV lambda
`results_summary`	a tibble of summary results at minimum CV lambda
`results1se`	all coefficients and penalty factors at lambda_1se
`results1se_summary`	a tibble of summary results at lambda_1se
`data`	the (unprocessed) data
`family`	the family argument (for non-normal, e.g. poisson)
`info`	a list containing meta-info about the procedure
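
The components listed above can be inspected directly on the returned object. The sketch below fits a small model and prints a few of them; the iris data and the object name `srl` are illustrative assumptions, and the printed output is not reproduced here.

```r
# Illustrative sketch: data set and object name are assumptions.
if (requireNamespace("sparseR", quietly = TRUE)) {
  library(sparseR)

  set.seed(1)
  srl <- sparseR(Sepal.Width ~ ., data = iris, k = 1)

  print(names(srl))               # components of the sparseR object
  print(srl$results_summary)      # summary at the minimum CV lambda
  print(srl$results1se_summary)   # summary at lambda_1se
  print(head(srl$results))        # coefficients and penalty factors
  print(srl$family)               # family used for the fit
}
```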

References

For fitting functionality, the ncvreg package is used; see Breheny, P. and Huang, J. (2011) Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Statist., 5: 232-253.

[Package sparseR version 0.3.1 Index]