resample {modnets} | R Documentation |
Bootstrapping or multi-sample splits for variable selection
Description
Multiple resampling procedures for selecting variables for a final network model. There are three resampling methods that can be parameterized in a variety of different ways. The ultimate goal is to fit models across iterated resamples with variable selection procedures built in so as to home in on the best predictors to include within a given model. The methods available include: bootstrapped resampling, multi-sample splitting, and stability selection.
Usage
resample(
data,
m = NULL,
niter = 10,
sampMethod = "bootstrap",
criterion = "AIC",
method = "glmnet",
rule = "OR",
gamma = 0.5,
nfolds = 10,
nlam = 50,
which.lam = "min",
threshold = FALSE,
bonf = FALSE,
alpha = 0.05,
exogenous = TRUE,
split = 0.5,
center = TRUE,
scale = FALSE,
varSeed = NULL,
seed = NULL,
verbose = TRUE,
lags = NULL,
binary = NULL,
type = "g",
saveMods = TRUE,
saveData = FALSE,
saveVars = FALSE,
fitit = TRUE,
nCores = 1,
cluster = "mclapply",
block = FALSE,
beepno = NULL,
dayno = NULL,
...
)
Arguments
data |
|
m |
Character vector or numeric vector indicating the moderator(s), if
any. Can also specify |
niter |
Number of iterations for the resampling procedure. |
sampMethod |
Character string indicating which type of procedure to use.
|
criterion |
The criterion for the variable selection procedure. Options
include: |
method |
Character string to indicate which method to use for variable
selection. Options include |
rule |
Only applies to GGMs (including between-subjects networks) when a
threshold is supplied. The |
gamma |
Numeric value of the hyperparameter for the |
nfolds |
Only relevant if |
nlam |
if |
which.lam |
Character string. Only applies if |
threshold |
Logical or numeric. If |
bonf |
Logical. Determines whether to apply a Bonferroni adjustment on the distribution of p-values for each coefficient. |
alpha |
Type 1 error rate. Defaults to .05. |
exogenous |
Logical. Indicates whether moderator variables should be
treated as exogenous or not. If they are exogenous, they will not be
modeled as outcomes/nodes in the network. If the number of moderators
reaches |
split |
If |
center |
Logical. Determines whether to mean-center the variables. |
scale |
Logical. Determines whether to standardize the variables. |
varSeed |
Numeric value providing a seed to be set at the beginning of the selection procedure. Recommended for reproducible results. Importantly, this seed will be used for the variable selection models at each iteration of the resampler. Caution: this means that while each model is run with a different sample, it will always have the same seed. |
seed |
Can be a single value, to set a seed before drawing random seeds
of length |
verbose |
Logical. Determines whether information about the modeling progress should be displayed in the console. |
lags |
Numeric or logical. Can only be 0, 1 or |
binary |
Numeric vector indicating which columns of the data contain binary variables. |
type |
Determines whether to use gaussian models |
saveMods |
Logical. Indicates whether to save the models fit to the samples at each iteration or not. |
saveData |
Logical. Determines whether to save the data from each subsample across iterations or not. |
saveVars |
Logical. Determines whether to save the variable selection models at each iteration. |
fitit |
Logical. Determines whether to fit the final selected model on
the original sample. If |
nCores |
Numeric value indicating the number of CPU cores to use for the
resampling. If |
cluster |
Character vector indicating which type of parallelization to
use, if |
block |
Logical or numeric. If specified, then this indicates that
|
beepno |
Character string or numeric value to indicate which variable
(if any) encodes the survey number within a single day. Must be used in
conjunction with |
dayno |
Character string or numeric value to indicate which variable
(if any) encodes the survey number within a single day. Must be used in
conjunction with |
... |
Additional arguments. |
Details
Sampling methods can be specified via the sampMethod argument.
- Bootstrapped resampling
Standard bootstrapped resampling, wherein a bootstrapped sample of size n is drawn with replacement at each iteration. Then, a variable selection procedure is applied to the sample, and the selected model is fit to obtain the parameter values. P-values and confidence intervals for the parameter distributions are then estimated.
- Multi-sample splitting
Involves taking two disjoint samples from the original data – a training sample and a test sample. At each iteration the variable selection procedure is applied to the training sample, and then the resultant model is fit on the test sample. Parameters are then aggregated based on the coefficients in the models fit to the test samples.
- Stability selection
Stability selection begins the same as multi-sample splitting, in that two disjoint samples are drawn from the data at each iteration. However, the variable selection procedure is then applied to each of the two subsamples at each iteration. The objective is to compute the proportion of times that each predictor was selected in each subsample across iterations, as well as the proportion of times that it was simultaneously selected in both disjoint samples. At the end of the resampling, the final model is selected by setting a frequency threshold between 0 and 1, indicating the minimum proportion of samples that a variable would have to have been selected to be retained in the final model.
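The three sampling schemes described above can be sketched in base R. This is only an illustration under stated assumptions, not the modnets implementation: the simulated data, the helper `select_vars()` (a stand-in for whichever variable selection procedure is chosen), and the cutoff of 0.6 are all hypothetical.

```r
# Base-R sketch of the three sampling schemes (not the modnets code).
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("x", 1:4)))
dat <- data.frame(y = 1.5 * X[, 1] + rnorm(n), X)  # only x1 truly predicts y
preds <- paste0("x", 1:4)
niter <- 20

# Hypothetical stand-in for the variable selection step: keep predictors
# whose p-value in a full linear model falls below alpha.
select_vars <- function(d, alpha = 0.05) {
  p <- summary(lm(y ~ ., data = d))$coefficients[-1, 4]
  names(p)[p < alpha]
}

# 1. Bootstrap: draw n rows with replacement, then select on that sample.
boot_sel <- replicate(niter, {
  idx <- sample(n, n, replace = TRUE)
  preds %in% select_vars(dat[idx, ])
})

# 2. Multi-sample split: select on the training half, refit on the test half.
split_fits <- lapply(seq_len(niter), function(i) {
  train <- sample(n, floor(n / 2))
  vars <- select_vars(dat[train, ])
  if (length(vars) == 0) return(NULL)  # nothing selected in this split
  lm(reformulate(vars, response = "y"), data = dat[-train, ])
})

# 3. Stability selection: select separately in both disjoint halves and
#    record selection in each half as well as simultaneous selection.
stab <- replicate(niter, {
  half <- sample(n, floor(n / 2))
  s1 <- preds %in% select_vars(dat[half, ])
  s2 <- preds %in% select_vars(dat[-half, ])
  c(s1, s2, s1 & s2)
})
freq <- matrix(rowMeans(stab), nrow = length(preds),
               dimnames = list(preds, c("half1", "half2", "both")))

# Illustrative frequency threshold (assumed here to apply to simultaneous
# selection) for retaining variables in the final model.
keep <- preds[freq[, "both"] >= 0.6]
```

Each row of `freq` holds the proportion of iterations in which a predictor was selected in the first subsample, the second subsample, and both at once, which is the quantity a stability threshold is compared against.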
For the bootstrapping and multi-sample split methods, p-values are aggregated for each parameter using a method developed by Meinshausen, Meier, & Buhlmann (2009) that employs error control based on the false-discovery rate. The same procedure is employed for creating adjusted confidence intervals.
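As a rough illustration of the aggregation idea, the sketch below follows the adaptive quantile rule from Meinshausen, Meier, & Buhlmann (2009): for each candidate quantile level, the per-iteration p-values are divided by that level, the corresponding empirical quantile is taken, and the smallest result is multiplied by a penalty for having searched over levels. The function name `aggregate_pvals()`, the grid of levels, and the default lower bound of 0.05 are assumptions made for this example; whether resample uses exactly this variant internally is not stated here.

```r
# Sketch of quantile-based p-value aggregation across resampling iterations.
# `p` holds one parameter's p-values, one per iteration (the paper sets the
# p-value to 1 in iterations where the variable was not selected).
aggregate_pvals <- function(p, gamma_min = 0.05) {
  gammas <- seq(gamma_min, 1, by = 0.01)
  # Q(gamma): empirical gamma-quantile of p / gamma, capped at 1
  Q <- vapply(gammas, function(g) {
    min(1, quantile(p / g, probs = g, names = FALSE))
  }, numeric(1))
  # Penalise the search over gamma, then cap at 1
  min(1, (1 - log(gamma_min)) * min(Q))
}

# A parameter that is consistently significant vs. one that never is
p_strong <- aggregate_pvals(rep(1e-6, 10))
p_null   <- aggregate_pvals(rep(1, 10))
```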
A key distinguishing feature of the bootstrapping procedure implemented in this function versus the bootNet function is that the latter is designed to estimate the parameter distributions of a single model, whereas the version here is aimed at using the bootstrapped resamples to select a final model. In a practical sense, this boils down to using the bootstrapping method in the resample function to perform variable selection at each iteration of the resampling, rather than taking a single constrained model and applying it equally at all iterations.
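The contrast can be made concrete with a small base-R sketch. It does not call bootNet or resample; the simulated data and the helper `select_vars()` are hypothetical and serve only to show the difference between refitting one fixed model on every bootstrap sample and repeating the selection step inside the loop.

```r
set.seed(2)
n <- 150
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- dat$x1 + rnorm(n)  # only x1 truly predicts y

# Hypothetical selection step: keep predictors with p < alpha
select_vars <- function(d, alpha = 0.05) {
  p <- summary(lm(y ~ ., data = d))$coefficients[-1, 4]
  names(p)[p < alpha]
}

niter <- 25
idx_list <- replicate(niter, sample(n, n, replace = TRUE), simplify = FALSE)

# (a) Fixed model: the same constrained formula is refit at every iteration,
#     yielding a distribution for each of its parameters.
fixed_coefs <- sapply(idx_list, function(i) {
  coef(lm(y ~ x1, data = dat[i, ]))["x1"]
})

# (b) Selection inside the loop: the set of predictors can change from one
#     iteration to the next, and selection frequencies inform the final model.
selected <- lapply(idx_list, function(i) select_vars(dat[i, ]))
sel_freq <- table(factor(unlist(selected), levels = c("x1", "x2", "x3"))) / niter
```

In (a) the output is a sampling distribution for a coefficient of a model chosen in advance; in (b) the output is information about which predictors belong in the model at all.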
Value
resample output
References
Meinshausen, N., Meier, L., & Buhlmann, P. (2009). P-values for high-dimensional regression. Journal of the American Statistical Association, 104, 1671-1681.
Meinshausen, N., & Buhlmann, P. (2010). Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72, 417-473.
See Also
plot.resample, modSelect, fitNetwork,
bootNet, mlGVAR, plotNet, plotCoefs,
plotBoot, plotPvals, plotStability, net,
netInts, glinternet::glinternet,
glinternet::glinternet.cv,
glmnet::glmnet,
glmnet::cv.glmnet,
leaps::regsubsets
Examples
fit1 <- resample(ggmDat, m = 'M', niter = 10)
net(fit1)
netInts(fit1)
plot(fit1)
plot(fit1, what = 'coefs')
plot(fit1, what = 'bootstrap', multi = TRUE)
plot(fit1, what = 'pvals', outcome = 2, predictor = 4)
fit2 <- resample(gvarDat, m = 'M', niter = 10, lags = 1, sampMethod = 'stability')
plot(fit2, what = 'stability', outcome = 3)