R: Stability selection for a quadrupen fit.

stability {quadrupen}

R Documentation

Stability selection for a quadrupen fit.

Description

Compute the stability path of a (possibly randomized) fitting procedure, as introduced by Meinshausen and Bühlmann (2010).

Usage

stability(
  x,
  y,
  penalty = c("elastic.net", "bounded.reg"),
  subsamples = 100,
  sample.size = floor(n/2),
  randomize = TRUE,
  weakness = 0.5,
  verbose = TRUE,
  folds = replicate(subsamples, sample(1:nrow(x), sample.size), simplify = FALSE),
  mc.cores = 2,
  ...
)

Arguments

`x`	matrix of features, possibly sparsely encoded (experimental). Do NOT include intercept.
`y`	response vector.
`penalty`	a string giving the fitting procedure used for stability selection. Either `"elastic.net"` or `"bounded.reg"`.
`subsamples`	integer indicating the number of subsamplings used to estimate the selection probabilities. Default is 100.
`sample.size`	integer indicating the size of each subsample. Default is `floor(n/2)`.
`randomize`	logical; should a randomized version of the fitting procedure be used? Default is `TRUE`. See details below.
`weakness`	coefficient used for randomizing. Default is `0.5`. Ignored when `randomize` is `FALSE`. See details below.
`verbose`	logical; indicates if the progression should be displayed. Default is `TRUE`.
`folds`	list with `subsamples` entries, each a vector of row indices of `x` defining one subsample used by the stability procedure. By default, `subsamples` folds of size `sample.size` are drawn at random without replacement.
`mc.cores`	the number of cores to use. The default uses 2 cores.
`...`	additional parameters to overwrite the defaults of the fitting procedure. See the corresponding documentation (`elastic.net` or `bounded.reg`).
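
The default for `folds` shown in Usage can be reproduced by hand, which may be convenient when the same subsamples have to be reused across several calls. A minimal base-R sketch, where the seed and the values of `n` and `subsamples` are illustrative assumptions:

```r
## Illustrative values, not taken from the package
set.seed(42)
n           <- 100            # stands in for nrow(x)
subsamples  <- 200
sample.size <- floor(n / 2)

## One vector of row indices per subsample, drawn without replacement
folds <- replicate(subsamples, sample(seq_len(n), sample.size), simplify = FALSE)

length(folds)                 # number of subsamples
range(lengths(folds))         # every fold holds sample.size indices
```

The resulting list can then be supplied through the `folds` argument, with `subsamples` set to `length(folds)`.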

Value

An object of class stability.path.

Note

When randomize = TRUE, the penscale argument, which weights the penalty tuned by λ1, is divided for each subsample by a random variable uniformly distributed on [α, 1], where α is the weakness parameter.
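
The perturbation rule stated in this note can be sketched in base R. This only illustrates the rule and is not the package's internal code: it assumes one independent draw per feature (as in the randomized lasso of Meinshausen and Bühlmann) and unperturbed weights all equal to 1; `p`, `u` and `perturbed` are hypothetical names.

```r
set.seed(1)
p        <- 95                 # number of features (illustrative)
weakness <- 0.5                # the alpha of the note
penscale <- rep(1, p)          # unperturbed penalty weights (assumed all equal to 1)

## One independent draw per feature, uniform on [weakness, 1] (assumption)
u <- runif(p, min = weakness, max = 1)

## Dividing by u inflates each weight by a factor between 1 and 1/weakness
perturbed <- penscale / u
range(perturbed)
```

A smaller `weakness` allows larger inflation factors and therefore stronger randomization; with `weakness = 1` the weights would be left unchanged.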

If the fitting method is run with the option bulletproof set to FALSE, the algorithm may stop at an early stage of the path. Such early stops of the underlying fitting function are handled internally: the results of the affected runs are simply skipped, so as not to bias the stability selection procedure. If this happens too often, a warning is issued, in which case the grid of lambda1 used for stability selection should be reconsidered. If bulletproof is TRUE (the default), there is nothing to worry about, apart from a possible slowdown whenever a switch to the proximal algorithm is required.
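
As an illustration of how runs that stopped early can be left out of the estimate, the base-R sketch below averages selection indicators over the completed runs only. The `runs` list, with `NULL` marking an early stop, is a hypothetical encoding and not the package's internal data structure.

```r
## Each completed run: logical matrix, one row per lambda1 value, one column per feature.
## NULL stands for a run that stopped early (hypothetical encoding).
runs <- list(
  matrix(c(TRUE,  FALSE, TRUE,  TRUE), nrow = 2),
  NULL,
  matrix(c(TRUE,  FALSE, FALSE, TRUE), nrow = 2),
  matrix(c(FALSE, FALSE, TRUE,  TRUE), nrow = 2)
)

## Keep only the runs that reached the end of the path
completed <- Filter(Negate(is.null), runs)

## Selection probabilities: average over completed runs only
prob <- Reduce(`+`, completed) / length(completed)

## Share of skipped runs; a large value suggests revising the lambda1 grid
skipped <- 1 - length(completed) / length(runs)
```

Dividing by the number of completed runs, rather than by the total number of subsamples, keeps the estimated probabilities on the [0, 1] scale regardless of how many runs were skipped.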

References

N. Meinshausen and P. Bühlmann (2010). Stability selection. Journal of the Royal Statistical Society: Series B, 72(4), 417-473.

Examples

library(quadrupen)
library(Matrix) ## provides bdiag()

## Simulating multivariate Gaussian data with blockwise correlation
## and a piecewise constant vector of parameters
beta <- rep(c(0,1,0,-1,0), c(25,10,25,10,25))
Soo  <- matrix(0.75,25,25) ## block correlation between zero variables
Sww  <- matrix(0.75,10,10) ## block correlation between active variables
Sigma <- bdiag(Soo,Sww,Soo,Sww,Soo) + 0.2
diag(Sigma) <- 1
n <- 100
x <- as.matrix(matrix(rnorm(95*n),n,95) %*% chol(Sigma))
y <- 10 + x %*% beta + rnorm(n,0,10)

## Build a vector of labels for the true nonzeros
labels <- rep("irrelevant", length(beta))
labels[beta != 0] <- "relevant"
labels <- factor(labels, ordered=TRUE, levels=c("relevant","irrelevant"))

## Call to the stability selection function with 200 subsamples
stab <- stability(x,y, subsamples=200, lambda2=1, min.ratio=1e-2)
## Recover the selected variables for a given cutoff
## and per-family error rate, without producing any plot
stabpath <- plot(stab, cutoff=0.75, PFER=1, plot=FALSE)

cat("\nFalse positives for the randomized Elastic-net with stability selection: ",
     sum(labels[stabpath$selected] != "relevant"))
cat("\nDONE.\n")
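
For context on the `cutoff` and `PFER` values used in the example: under the assumptions of their theorem, Meinshausen and Bühlmann (2010) bound the expected number of false positives by q^2 / ((2π - 1) p), where p is the number of features, π > 1/2 is the selection threshold and q is the average number of variables selected per subsample. The sketch below evaluates that bound in base R. It makes no claim about how the `plot` method uses these arguments internally, and both helper functions are hypothetical.

```r
## Upper bound on the expected number of false positives (hypothetical helper)
pfer_bound <- function(q, cutoff, p) {
  stopifnot(cutoff > 0.5, cutoff <= 1)
  q^2 / ((2 * cutoff - 1) * p)
}

## Largest average model size q compatible with a target PFER (hypothetical helper)
max_q <- function(PFER, cutoff, p) {
  stopifnot(cutoff > 0.5, cutoff <= 1)
  sqrt(PFER * (2 * cutoff - 1) * p)
}

p <- 95                                   # number of features in the example
q <- max_q(PFER = 1, cutoff = 0.75, p = p)
q                                         # about 6.89
pfer_bound(q, cutoff = 0.75, p = p)       # recovers the target of 1
```

Raising the cutoff or the tolerated PFER increases the admissible q, which corresponds to exploring a larger portion of the regularization path.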

[Package quadrupen version 0.2-12 Index]