stat.stability_selection {knockoff} | R Documentation |
Importance statistics based on stability selection
Description
Computes the difference statistic
W_j = |Z_j| - |\tilde{Z}_j|
where Z_j and \tilde{Z}_j measure the importance of the j-th variable and
its knockoff, respectively, based on the stability of their selection
upon subsampling of the data.
Usage
stat.stability_selection(X, X_k, y, fitfun = stabs::lars.lasso, ...)
Arguments
X |
n-by-p matrix of original variables. |
X_k |
n-by-p matrix of knockoff variables. |
y |
response vector (length n) |
fitfun |
a function that takes the arguments x and y as above, plus the number of variables q to include in each model. The function must fit the model and return a logical vector indicating which variables were selected (among the q selected variables). The name of the function should be prefixed by 'stabs::'. |
... |
additional arguments specific to 'stabs' (see Details). |
Details
This function uses the stabs package to compute variable selection
stability. The selection stability of the j-th variable is defined as
its probability of being selected upon random subsampling of the data.
The default method for selecting variables in each subsampled dataset
is lars.lasso.
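The definition above can be illustrated with a minimal base-R sketch. It is not the stabs implementation: the selector top_q (keep the q columns most correlated with y), the helper selection_freq, the half-size subsamples, and the random stand-in for the knockoff matrix are all assumptions made for illustration.

```r
# Toy illustration of selection stability; not the 'stabs' implementation.
set.seed(1)
n <- 100; p <- 10; q <- 3            # q = number of columns kept per subsample
X   <- matrix(rnorm(n * p), n)
X_k <- matrix(rnorm(n * p), n)       # stand-in for knockoffs, illustration only
y   <- as.vector(X[, 1:3] %*% rep(2, 3) + rnorm(n))

# Hypothetical selector: keep the q columns most correlated with y.
top_q <- function(x, y, q) {
  score <- abs(cor(x, y))[, 1]
  rank(-score, ties.method = "first") <= q
}

# Fraction of B half-size subsamples in which each column is selected.
selection_freq <- function(x, y, q, B = 100) {
  hits <- replicate(B, {
    idx <- sample(nrow(x), floor(nrow(x) / 2))
    top_q(x[idx, , drop = FALSE], y[idx], q)
  })
  rowMeans(hits)
}

# Importance of originals and knockoffs, then the difference statistic.
Z <- selection_freq(cbind(X, X_k), y, q)
W <- abs(Z[1:p]) - abs(Z[(p + 1):(2 * p)])
print(round(W, 2))
```

Each entry of Z is an empirical selection frequency in [0, 1], so a large positive W_j suggests that the j-th original variable is selected more stably than its knockoff.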
For a complete list of the available additional arguments, see stabsel.
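To get a feel for those arguments, it may help to call stabs::stabsel directly on a small simulated dataset. The sketch below assumes the stabs and lars packages are installed; the data, the column names, and the values cutoff = 0.75 and PFER = 1 are chosen only for illustration and are not claimed to be what stat.stability_selection uses internally.

```r
library(stabs)   # lars.lasso additionally needs the 'lars' package

set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, dimnames = list(NULL, paste0("V", 1:p)))
y <- as.vector(x[, 1:3] %*% rep(2, 3) + rnorm(n))

# cutoff and PFER are stabsel arguments; the values here are illustrative.
fit <- stabsel(x, y, fitfun = lars.lasso, cutoff = 0.75, PFER = 1)

print(fit$max)       # maximum selection frequency of each column
print(fit$selected)  # columns whose frequency reaches the cutoff
```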
Value
A vector of statistics W of length p.
See Also
Other statistics: stat.forward_selection(), stat.glmnet_coefdiff(), stat.glmnet_lambdadiff(), stat.lasso_coefdiff_bin(), stat.lasso_coefdiff(), stat.lasso_lambdadiff_bin(), stat.lasso_lambdadiff(), stat.random_forest(), stat.sqrt_lasso()
Examples
set.seed(2022)
p=50; n=50; k=15
mu = rep(0,p); Sigma = diag(p)
X = matrix(rnorm(n*p),n)
nonzero = sample(p, k)
beta = 3.5 * (1:p %in% nonzero)
y = X %*% beta + rnorm(n)
knockoffs = function(X) create.gaussian(X, mu, Sigma)
# Basic usage with default arguments
result = knockoff.filter(X, y, knockoffs=knockoffs,
                         statistic=stat.stability_selection)
print(result$selected)
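A different fit function can be supplied by wrapping the statistic, as in the following sketch. It assumes the knockoff, stabs, and lars packages are installed; the wrapper name k_stat is hypothetical, and stabs::lars.stepwise is used only as an example of another fit function shipped with stabs.

```r
library(knockoff)

set.seed(2022)
p <- 50; n <- 50; k <- 15
mu <- rep(0, p); Sigma <- diag(p)
X <- matrix(rnorm(n * p), n)
nonzero <- sample(p, k)
beta <- 3.5 * (1:p %in% nonzero)
y <- X %*% beta + rnorm(n)
knockoffs <- function(X) create.gaussian(X, mu, Sigma)

# Hypothetical wrapper that fixes fitfun to a stepwise selector from 'stabs'.
k_stat <- function(X, X_k, y)
  stat.stability_selection(X, X_k, y, fitfun = stabs::lars.stepwise)

result <- knockoff.filter(X, y, knockoffs = knockoffs, statistic = k_stat)
print(result$selected)
```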