stat.stability_selection {knockoff} | R Documentation |
Importance statistics based on stability selection
Description
Computes the difference statistic
$$W_j = |Z_j| - |\tilde{Z}_j|$$
where $Z_j$ and $\tilde{Z}_j$ measure the importance
of the jth variable and its knockoff, respectively, based on the
stability of their selection upon subsampling of the data.
Usage
stat.stability_selection(X, X_k, y, fitfun = stabs::lars.lasso, ...)
Arguments
X | n-by-p matrix of original variables.
X_k | n-by-p matrix of knockoff variables.
y | response vector (length n).
fitfun | a function that takes the arguments x and y as above, plus the number of variables to include in each model, q. The function must fit the model and return a logical vector indicating which variables were selected (among the q selected variables). The name of the function should be prefixed by 'stabs::'.
... | additional arguments specific to 'stabs' (see Details).
Details
This function uses the stabs package to compute
variable selection stability. The selection stability of the j-th
variable is defined as its probability of being selected upon random
subsampling of the data. The default method for selecting variables
in each subsampled dataset is lars.lasso.
For a complete list of the available additional arguments, see stabsel.
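The definition of selection stability above can be illustrated with a small base-R sketch. This is a simplified illustration, not the code used by knockoff or stabs: select_top_q and selection_frequency are hypothetical helpers, the selector ranks columns by absolute correlation with y instead of fitting a lasso path, and the "knockoffs" are plain independent noise used only to make the example self-contained.

```r
# Simplified base-R illustration; NOT the knockoff/stabs implementation.
# select_top_q is a hypothetical stand-in for a fit function such as lars.lasso:
# it "selects" the q columns with the largest absolute correlation with y.
select_top_q = function(x, y, q) {
  score = abs(cor(x, y))[, 1]
  seq_len(ncol(x)) %in% order(score, decreasing = TRUE)[seq_len(q)]
}

# Fraction of B random half-samples in which each column is selected.
selection_frequency = function(x, y, q, B = 50) {
  n = nrow(x)
  hits = replicate(B, {
    idx = sample(n, floor(n / 2))
    select_top_q(x[idx, , drop = FALSE], y[idx], q)
  })
  rowMeans(hits)
}

set.seed(1)
n = 100; p = 10
X = matrix(rnorm(n * p), n)
# Placeholder "knockoffs": independent noise, for illustration only.
X_k = matrix(rnorm(n * p), n)
y = as.vector(X[, 1:3] %*% rep(2, 3) + rnorm(n))

# Selection frequencies for the 2p columns of the augmented design,
# then the difference between each variable and its knockoff.
Z = selection_frequency(cbind(X, X_k), y, q = 5)
W = Z[1:p] - Z[(p + 1):(2 * p)]
```

Because the selection frequencies lie between 0 and 1, the absolute values in the definition of the statistic are redundant in this sketch. A large positive entry of W indicates that the original variable is selected far more often than its knockoff.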
Value
A vector of statistics of length p.
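To show how such a vector of statistics is typically consumed, the following base-R sketch applies a knockoff-style data-dependent threshold to a hand-written vector W. The function name knockoff_threshold_sketch, the example values, and the fdr and offset defaults are assumptions made for illustration; within the package this step is handled by knockoff.filter.

```r
# Illustrative sketch: smallest t among the candidate values such that
# (offset + #{W_j <= -t}) / max(1, #{W_j >= t}) <= fdr.
# knockoff_threshold_sketch is a hypothetical name, not a package function.
knockoff_threshold_sketch = function(W, fdr = 0.1, offset = 1) {
  ts = sort(c(0, abs(W)))
  ratio = sapply(ts, function(t) {
    (offset + sum(W <= -t)) / max(1, sum(W >= t))
  })
  ok = which(ratio <= fdr)
  if (length(ok) > 0) ts[ok[1]] else Inf
}

# Made-up statistics: large positive values suggest important variables.
W_example = c(3, 2.5, -0.5, 4, 1, 5, -2, 6)
thr = knockoff_threshold_sketch(W_example, fdr = 0.25)
selected = which(W_example >= thr)
```

Variables whose statistic is at least the threshold are reported as selected; if no candidate threshold satisfies the bound, the function returns Inf and nothing is selected.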
See Also
Other statistics:
stat.forward_selection(), stat.glmnet_coefdiff(), stat.glmnet_lambdadiff(), stat.lasso_coefdiff_bin(), stat.lasso_coefdiff(), stat.lasso_lambdadiff_bin(), stat.lasso_lambdadiff(), stat.random_forest(), stat.sqrt_lasso()
Examples
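# Requires the knockoff package (and stabs for this statistic)
library(knockoff)

set.seed(2022)
# Simulate a sparse linear model with k nonzero coefficients
p=50; n=50; k=15
mu = rep(0,p); Sigma = diag(p)
X = matrix(rnorm(n*p),n)
nonzero = sample(p, k)
beta = 3.5 * (1:p %in% nonzero)
y = X %*% beta + rnorm(n)
# Gaussian knockoffs using the known mean and covariance of X
knockoffs = function(X) create.gaussian(X, mu, Sigma)
# Basic usage with default arguments
result = knockoff.filter(X, y, knockoffs=knockoffs,
                         statistic=stat.stability_selection)
print(result$selected)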