kko {kko} | R Documentation |
Variable selection for additive models via KKO
Description
The function applies KKO to compute importance scores of components.
Usage
kko(
X,
y,
X_k,
rfn_range = c(2, 3, 4),
n_stb_tune = 50,
n_stb = 100,
cv_folds = 10,
frac_stb = 1/2,
nCores_para = 4,
rkernel = c("laplacian", "gaussian", "cauchy"),
rk_scale = 1
)
Arguments
X |
design matrix of the additive model; rows are observations and columns are variables. |
y |
response of the additive model. |
X_k |
knockoff matrix of the design; same dimensions as X. |
rfn_range |
a vector of random feature expansion numbers to be tuned. |
n_stb_tune |
number of subsamples used for tuning the random feature number. |
n_stb |
number of subsamples used for computing importance scores. |
cv_folds |
number of cross-validation folds for tuning the group lasso penalty. |
frac_stb |
subsample size, as a fraction of the sample size. |
nCores_para |
number of cores for parallelizing subsampling. |
rkernel |
kernel choices. Default is "laplacian". Other choices are "cauchy" and "gaussian". |
rk_scale |
scale parameter of the sampling distribution for random feature expansion. For the gaussian kernel, it is the standard deviation of the gaussian sampling distribution. |
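The roles of rfn_range and rk_scale may be easier to see in a small sketch of a random feature expansion for a single covariate under the gaussian kernel. This is only an illustration using the standard random Fourier feature construction; whether kko builds its features in exactly this way is an assumption, and the names x, w, b and Z are hypothetical.

```r
# Sketch: random Fourier features for one covariate under a gaussian kernel.
# Assumption: frequencies are drawn from N(0, rk_scale^2); the internal
# construction used by kko may differ.
set.seed(1)
n <- 100          # sample size
rfn <- 3          # number of random features (one candidate from rfn_range)
rk_scale <- 1     # standard deviation of the gaussian sampling distribution

x <- rnorm(n)                               # stands in for a single column of X
w <- rnorm(rfn, mean = 0, sd = rk_scale)    # random frequencies
b <- runif(rfn, min = 0, max = 2 * pi)      # random phases

# n x rfn feature matrix with entries sqrt(2 / rfn) * cos(x_i * w_k + b_k)
Z <- sqrt(2 / rfn) * cos(outer(x, w) + matrix(b, n, rfn, byrow = TRUE))
dim(Z)
```

Under this construction each variable contributes a block of rfn columns, which is the kind of grouping a group lasso penalty operates on.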
Value
A list of selection results with the following components:
importance_score | importance scores of variables for knockoff filtering. |
selection_frequency | a 0/1 matrix of selection results on subsamples. Rows are subsamples and columns are variables. The first half of the columns corresponds to the variables of the design X, and the second half to the knockoffs X_k. |
rfn_tune | tuned optimal random feature number. |
rfn_range | range of random feature numbers. |
tune_result | a list of tuning results. |
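As a rough illustration of how these components could feed a knockoff filter, the sketch below builds a toy 0/1 matrix with the layout described for selection_frequency, forms a score as the difference in selection frequency between each variable and its knockoff, and applies the standard knockoff+ threshold. The score definition is an assumption about how importance_score relates to selection_frequency, the selection probabilities are made up, and knockoff_threshold is a hypothetical helper written here in base R rather than a function of this package.

```r
# Toy 0/1 selection matrix with the documented layout:
# rows are subsamples, the first p columns are X, the last p columns are X_k.
set.seed(1)
p <- 4
n_stb <- 10
sel_prob <- c(0.9, 0.8, 0.2, 0.1, 0.1, 0.2, 0.1, 0.2)  # made-up probabilities
sel <- sapply(sel_prob, function(pr) rbinom(n_stb, 1, pr))

freq <- colMeans(sel)                     # selection frequency of each column
W <- freq[1:p] - freq[(p + 1):(2 * p)]    # original minus knockoff (assumed score)

# Knockoff+ threshold: the smallest candidate value whose estimated false
# discovery proportion is at most q; Inf if no candidate qualifies.
knockoff_threshold <- function(W, q, offset = 1) {
  ts <- sort(unique(abs(W[W != 0])))
  ratio <- sapply(ts, function(tt) (offset + sum(W <= -tt)) / max(1, sum(W >= tt)))
  ok <- which(ratio <= q)
  if (length(ok) == 0) Inf else ts[ok[1]]
}

thr <- knockoff_threshold(W, q = 0.5)
selected <- which(W >= thr)               # indices of selected variables
```

With offset = 1 the filter can select nothing unless at least 1/q variables have clearly positive scores, so with only a handful of predictors a fairly large q is needed to see any selections.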
Author(s)
Xiaowu Dai, Xiang Lyu, Lexin Li
Examples
library(knockoff)
p=4 # number of predictors
sig_mag=100 # signal strength
n= 100 # sample size
rkernel="laplacian" # kernel choice
s=2 # sparsity, number of nonzero component functions
rk_scale=1 # scaling parameter of kernel
rfn_range=c(2,3,4) # number of random features
cv_folds=15 # folds of cross-validation in group lasso
n_stb=10 # number of subsampling for importance scores
n_stb_tune=5 # number of subsampling for tuning random feature number
frac_stb=1/2 # fraction of subsample
nCores_para=2 # number of cores for parallelization
X=matrix(rnorm(n*p),n,p)%*%chol(toeplitz(0.3^(0:(p-1)))) # generate design
X_k = create.second_order(X) # generate knockoff
reg_coef=c(rep(1,s),rep(0,p-s)) # regression coefficient
reg_coef=reg_coef*(2*(rnorm(p)>0)-1)*sig_mag
y=X%*% reg_coef + rnorm(n) # response
kko(X,y,X_k,rfn_range,n_stb_tune,n_stb,cv_folds,frac_stb,nCores_para,rkernel,rk_scale)