Fitting Functions {stabs}    R Documentation

Fit Functions for Stability Selection

Description

Functions that fit a model until q variables are selected and return the indices (and names) of the selected variables.

Usage

## package lars:
lars.lasso(x, y, q, ...)
lars.stepwise(x, y, q, ...)

## package glmnet:
glmnet.lasso(x, y, q, type = c("conservative", "anticonservative"), ...)
glmnet.lasso_maxCoef(x, y, q, ...)

Arguments

x

a matrix containing the predictors or an object of class "mboost".

y

a vector or matrix containing the outcome.

q

number of (unique) variables (or groups of variables, depending on the model) that are selected on each subsample.

type

a character vector specifying whether the number of selected variables per subsample is <= q (type = "conservative") or >= q (type = "anticonservative"). The conservative version ensures that the PFER is controlled.

...

additional arguments passed to the underlying fitting function. See the example on glmnet.lasso_maxCoef in stabsel for the specification of additional arguments via stabsel.
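
As a rough illustration of the type argument, the following sketch compares the number of variables selected by the two variants of glmnet.lasso. The simulated data, the seed and the names cons and anti are assumptions made for this illustration only; the counts obtained depend on the data.

```r
if (require("stabs") && require("glmnet")) {
    ## simulated data (illustrative assumption, not from the package)
    set.seed(1234)
    x <- matrix(rnorm(100 * 10), nrow = 100,
                dimnames = list(NULL, paste0("x", 1:10)))
    y <- as.numeric(x[, 1:4] %*% c(2, -2, 1.5, 1) + rnorm(100))

    ## conservative version (the default)
    cons <- glmnet.lasso(x, y, q = 3, type = "conservative")
    ## anticonservative version
    anti <- glmnet.lasso(x, y, q = 3, type = "anticonservative")

    ## number of variables selected by each variant
    print(sum(cons$selected))
    print(sum(anti$selected))
}
```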

Details

All fitting functions are named after the package and the type of model that is fitted: package_name.model, e.g., glmnet.lasso stands for a lasso model that is fitted using the package glmnet.

glmnet.lasso_maxCoef fits a lasso model with a given penalty parameter and returns the q largest coefficients. To use glmnet.lasso_maxCoef, one must specify the penalty parameter lambda, either directly (via the ... argument) or in stabsel via args.fitfun = list(lambda = ). Note that for the other fitting functions the penalty parameter is usually not specified by the user but is chosen such that q variables are selected. For an example of how to use glmnet.lasso_maxCoef see stabsel.
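
The following minimal sketch shows both ways of supplying lambda described above. The simulated data and the choice of lambda by cross-validation with cv.glmnet are assumptions made for this illustration; any fixed value that is small enough for q variables to be selected could be used instead.

```r
if (require("stabs") && require("glmnet")) {
    ## simulated data (illustrative assumption, not from the package)
    set.seed(1234)
    x <- matrix(rnorm(100 * 10), nrow = 100,
                dimnames = list(NULL, paste0("x", 1:10)))
    y <- as.numeric(x[, 1:4] %*% c(2, -2, 1.5, 1) + rnorm(100))

    ## one fixed penalty parameter, here chosen by cross-validation
    lambda.min <- cv.glmnet(x, y)$lambda.min

    ## direct call: lambda is passed on via ...
    fit <- glmnet.lasso_maxCoef(x, y, q = 3, lambda = lambda.min)
    print(fit$selected)

    ## within stabsel: lambda is passed on via args.fitfun
    stab <- stabsel(x = x, y = y, fitfun = glmnet.lasso_maxCoef,
                    args.fitfun = list(lambda = lambda.min),
                    cutoff = 0.75, PFER = 1)
    print(stab$selected)
}
```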

Value

A named list with elements

selected

logical. A vector that indicates which variables were selected.

path

logical. A matrix that indicates which variables were selected in which step. Each row represents one variable; the columns represent the steps.
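
To make the structure of the return value concrete, here is a sketch of a hypothetical function, marginal.cor, that is not part of stabs. It ranks the predictors by their absolute correlation with the outcome and returns a list with the two elements described above. It uses base R only; the function name, the ranking rule and the simulated data are assumptions made for this illustration.

```r
## Hypothetical fit function (not part of stabs):
## select the q predictors with the largest absolute correlation with y
marginal.cor <- function(x, y, q, ...) {
    x <- as.matrix(x)
    if (is.null(colnames(x)))
        colnames(x) <- paste0("V", seq_len(ncol(x)))
    q <- min(q, ncol(x))
    ## indices of the q top-ranked variables, best first
    ord <- order(abs(cor(x, y)), decreasing = TRUE)[seq_len(q)]
    ## selected: one logical entry per variable
    selected <- logical(ncol(x))
    names(selected) <- colnames(x)
    selected[ord] <- TRUE
    ## path: rows are variables, columns are steps;
    ## step k contains the k top-ranked variables
    path <- vapply(seq_len(q),
                   function(k) seq_len(ncol(x)) %in% ord[seq_len(k)],
                   logical(ncol(x)))
    rownames(path) <- colnames(x)
    list(selected = selected, path = path)
}

## simulated data (illustrative assumption)
set.seed(1)
x <- matrix(rnorm(200), nrow = 40, ncol = 5)
y <- 2 * x[, 1] - x[, 3] + rnorm(40, sd = 0.1)

res <- marginal.cor(x, y, q = 2)
print(res$selected)
print(res$path)
```

The last column of path coincides with selected, because the final step contains all q chosen variables. A function of this shape takes the same arguments x, y and q as the fit functions documented here; whether it is suitable as fitfun in stabsel is not checked in this sketch.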

See Also

stabsel for stability selection itself.

Examples

  if (require("TH.data")) {
      ## make data set available
      data("bodyfat", package = "TH.data")
  } else {
      ## simulate some data if TH.data is not available.
      ## Note that results are nonsense with these data.
      bodyfat <- matrix(rnorm(720), nrow = 72, ncol = 10)
  }
  
  if (require("lars")) {
      ## selected variables
      lars.lasso(bodyfat[, -2], bodyfat[,2], q = 3)$selected
      lars.stepwise(bodyfat[, -2], bodyfat[,2], q = 3)$selected
  }
  
  if (require("glmnet")) {
      glmnet.lasso(bodyfat[, -2], bodyfat[,2], q = 3)$selected
      ## selection path
      glmnet.lasso(bodyfat[, -2], bodyfat[,2], q = 3)$path
  
      ## Using the anticonservative glmnet.lasso (see args.fitfun):
      stab.glmnet <- stabsel(x = bodyfat[, -2], y = bodyfat[,2],
                             fitfun = glmnet.lasso, 
                             args.fitfun = list(type = "anticonservative"), 
                             cutoff = 0.75, PFER = 1)
  }

[Package stabs version 0.6-4 Index]