varSelect {modnets}    R Documentation

Variable selection for moderated networks

Description

Performs variable selection for unmoderated networks via the LASSO, best-subset selection, forward selection, backward selection, or sequential replacement. For moderated networks, performs variable selection via the hierarchical LASSO. Can be used for both GGMs and SUR networks.

Usage

varSelect(
  data,
  m = NULL,
  criterion = "AIC",
  method = "glmnet",
  lags = NULL,
  exogenous = TRUE,
  type = "g",
  center = TRUE,
  scale = FALSE,
  gamma = 0.5,
  nfolds = 10,
  varSeed = NULL,
  useSE = TRUE,
  nlam = NULL,
  covs = NULL,
  verbose = TRUE,
  beepno = NULL,
  dayno = NULL
)

Arguments

data

n x k dataframe or matrix.

m

Character vector or numeric vector indicating the moderator(s), if any. Can also be "all" to make every variable serve as a moderator, or 0 to indicate that there are no moderators. If the length of m is k - 1 or greater, the moderators cannot be treated as exogenous variables, so exogenous is automatically set to FALSE.

criterion

The criterion for the variable selection procedure. Options include: "cv", "aic", "bic", "ebic", "cp", "rss", "adjr2", "rsq", "r2". "CV" refers to cross-validation. The information criteria are "AIC", "BIC", "EBIC", and "Cp", where "Cp" refers to Mallows's Cp. "RSS" is the residual sum of squares, "adjR2" is adjusted R-squared, and "Rsq" or "R2" is R-squared. Capitalization is ignored. For methods based on the LASSO, only "CV", "AIC", "BIC", and "EBIC" are available. For methods based on subset selection, only "Cp", "BIC", "RSS", "adjR2", and "R2" are available.
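
The pairing rules above can be written down as a small lookup. The sketch below is a hypothetical helper, not part of modnets: the name criterion_allowed and its two method vectors are assumptions made for illustration, and it only encodes what this page states about which criteria go with which methods.

```r
# Hypothetical helper (not part of modnets): encodes the documented
# criterion/method pairing rules so they can be checked before a long run.
lasso_methods  <- c("lasso", "glmnet", "glinternet")
subset_methods <- c("subset", "backward", "forward", "seqrep")

criterion_allowed <- function(criterion, method) {
  crit <- toupper(criterion)        # capitalization is ignored
  if (crit == "RSQ") crit <- "R2"   # "Rsq" and "R2" both mean R-squared
  meth <- tolower(method)
  if (meth %in% lasso_methods) {
    crit %in% c("CV", "AIC", "BIC", "EBIC")
  } else if (meth %in% subset_methods) {
    crit %in% c("CP", "BIC", "RSS", "ADJR2", "R2")
  } else {
    stop("Unknown method: ", method)
  }
}

criterion_allowed("ebic", "glmnet")    # TRUE
criterion_allowed("Cp", "glmnet")      # FALSE
criterion_allowed("adjR2", "forward")  # TRUE
criterion_allowed("CV", "subset")      # FALSE
```

"BIC" is the only criterion listed for both families, so it is the one choice that stays valid when switching between LASSO-based and subset-based methods.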

method

Character string to indicate which method to use for variable selection. Options include "lasso" and "glmnet", both of which use the LASSO via the glmnet package (either with glmnet::glmnet or glmnet::cv.glmnet, depending upon the criterion). The options "subset", "backward", "forward", and "seqrep" each call a different type of subset selection using the leaps::regsubsets function. Finally, "glinternet" applies the hierarchical LASSO, and is the only method available for moderated network estimation (either with glinternet::glinternet or glinternet::glinternet.cv, depending upon the criterion). If one or more moderators are specified, then method is automatically set to "glinternet".

lags

Numeric or logical. Must be 0, 1, TRUE, or FALSE. NULL is interpreted as FALSE. Indicates whether to fit a time-lagged network (1 or TRUE) or a GGM (0 or FALSE).

exogenous

Logical. Indicates whether moderator variables should be treated as exogenous or not. If they are exogenous, they will not be modeled as outcomes/nodes in the network. If the number of moderators reaches k - 1 or k, then exogenous will automatically be FALSE.

type

Determines whether to use Gaussian models ("g") or binomial models ("c"). The full names "gaussian" and "binomial" are also accepted. Alternatively, a vector of length k can be provided so that a value is given for every variable. This is not necessary, though, as such values are automatically detected.

center

Logical. Determines whether to mean-center the variables.

scale

Logical. Determines whether to standardize the variables.
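
As an illustration of the difference between the center and scale arguments, the sketch below uses base R's scale() on the built-in mtcars data. It is an assumption that varSelect preprocesses in an equivalent way internally. The example only shows what mean-centering and standardizing do to a data matrix.

```r
# Base R illustration only; how varSelect preprocesses internally is an assumption.
X <- as.matrix(mtcars[, c("mpg", "wt", "hp")])

# Mean-center only (analogous to the defaults center = TRUE, scale = FALSE)
Xc <- scale(X, center = TRUE, scale = FALSE)

# Mean-center and divide by the standard deviation (standardize)
Xz <- scale(X, center = TRUE, scale = TRUE)

round(colMeans(Xc), 10)  # column means are zero after centering
apply(Xc, 2, sd)         # centering leaves the standard deviations unchanged
apply(Xz, 2, sd)         # standardizing sets every standard deviation to 1
```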

gamma

Numeric value of the hyperparameter for the "EBIC" criterion. Only relevant if criterion = "EBIC". A value between 0 and 0.5 is recommended, where larger values impose a larger penalty on the criterion.
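
To make the role of gamma concrete, the sketch below computes one common form of the extended BIC for a regression model, EBIC = -2 logL + k log(n) + 2 gamma k log(p), where k is the number of estimated parameters, n the sample size, and p the number of candidate predictors. This form, the helper name ebic, and the use of mtcars are assumptions for illustration. The exact formula used inside varSelect is not stated on this page and may differ.

```r
# Sketch of one common EBIC form; modnets may compute it differently.
ebic <- function(fit, gamma, p) {
  ll <- logLik(fit)
  k  <- attr(ll, "df")  # number of estimated parameters (incl. residual variance)
  n  <- nobs(fit)       # sample size
  -2 * as.numeric(ll) + k * log(n) + 2 * gamma * k * log(p)
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
p   <- ncol(mtcars) - 1  # candidate predictors available for mpg

ebic(fit, gamma = 0,   p = p)  # with gamma = 0 this reduces to BIC(fit)
ebic(fit, gamma = 0.5, p = p)  # gamma > 0 adds a penalty that grows with model size
```

With gamma = 0 the extra term vanishes, so the criterion coincides with the ordinary BIC. Larger values of gamma penalize each additional parameter more heavily and therefore favor sparser models.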

nfolds

Only relevant if criterion = "CV". Determines the number of folds to use in cross-validation.

varSeed

Numeric value providing a seed to be set at the beginning of the selection procedure. Recommended for reproducible results.

useSE

Logical. Only relevant if method = "glinternet" and criterion = "CV". Indicates whether to use the standard error of the estimates across folds, if TRUE, or to use the standard deviation, if FALSE.
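
The two spread measures that useSE switches between differ only by a factor of the square root of the number of folds. The sketch below shows this with simulated per-fold errors. The numbers are made up for illustration, and how glinternet or varSelect applies the resulting value when choosing lambda is not shown here.

```r
# Illustration with simulated values; not output from glinternet or modnets.
set.seed(1)
nfolds   <- 10
fold_err <- rnorm(nfolds, mean = 1, sd = 0.2)  # made-up cross-validation error per fold

sd_err <- sd(fold_err)            # standard deviation across folds (useSE = FALSE)
se_err <- sd_err / sqrt(nfolds)   # standard error across folds (useSE = TRUE)

c(sd = sd_err, se = se_err)
```

Because the standard error is always the smaller of the two when there is more than one fold, any rule that accepts models within one unit of spread from the minimum error is more conservative under useSE = TRUE.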

nlam

If method = "glinternet", determines the number of lambda values to evaluate in the selection path.

covs

Numeric or character string indicating a variable to be used as a covariate. Currently not working properly.

verbose

Logical. Determines whether to provide output to the console about the status of the procedure.

beepno

Character string or numeric value to indicate which variable (if any) encodes the survey number within a single day. Must be used in conjunction with dayno argument.

dayno

Character string or numeric value to indicate which variable (if any) encodes the day on which each survey was completed. Must be used in conjunction with the beepno argument.
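
Beep and day identifiers are typically supplied to time-lagged models so that an observation is only paired with the survey immediately before it on the same day. The base R sketch below illustrates that idea on toy data. The column names day, beep, and V1 are hypothetical, and it is an assumption that varSelect treats beepno and dayno in this way.

```r
# Toy experience-sampling data with hypothetical column names:
# day 1 has beeps 1-3, day 2 has beeps 1 and 3 (beep 2 is missing).
d <- data.frame(
  day  = c(1, 1, 1, 2, 2),
  beep = c(1, 2, 3, 1, 3),
  V1   = c(0.2, 0.5, 0.1, 0.9, 0.4)
)

# Row i is a valid lag-1 outcome only if row i - 1 is the previous beep on the same day.
idx   <- seq_len(nrow(d))[-1]
valid <- d$day[idx] == d$day[idx - 1] & d$beep[idx] == d$beep[idx - 1] + 1

lagged <- data.frame(
  V1      = d$V1[idx][valid],      # outcome at time t
  V1_lag1 = d$V1[idx - 1][valid]   # predictor at time t - 1
)
lagged
```

Only two of the four consecutive row pairs survive: the pair spanning the night between day 1 and day 2 is dropped, as is the pair that skips the missing beep on day 2.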

Details

The primary value of the output is to be used as input when fitting the selected model with the fitNetwork function. Specifically, the output of varSelect can be assigned to the type argument of fitNetwork in order to fit the constrained models that were selected across nodes.

For moderated networks, the only variable selection approach available is through the glinternet package, which implements the hierarchical LASSO. The criterion for model selection dictates which function from the package is used, where information criteria use the glinternet::glinternet function to compute models, and cross-validation calls the glinternet::glinternet.cv function.

Value

List of all models, with the selected variables for each along with model coefficients and the variable selection models themselves. Primarily for use as input to the type argument of the fitNetwork function.
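
A quick way to see what the returned list contains is to print its top-level structure. The sketch below reuses the subset-selection call from the Examples section. It assumes that the modnets package and its ggmDat example data are installed, and it makes no assumption about element names, which are not documented on this page.

```r
# Runs only if modnets is available; otherwise the block is skipped.
if (requireNamespace("modnets", quietly = TRUE)) {
  library(modnets)

  # Same call as in the Examples section, with console output suppressed
  vars1 <- varSelect(ggmDat, criterion = "BIC", method = "subset", verbose = FALSE)

  # Top-level overview of the returned object, without expanding nested elements
  str(vars1, max.level = 1)
}
```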

See Also

resample, fitNetwork, bootNet, mlGVAR, glinternet::glinternet, glinternet::glinternet.cv, glmnet::glmnet, glmnet::cv.glmnet, leaps::regsubsets

Examples


vars1 <- varSelect(ggmDat, criterion = 'BIC', method = 'subset')
fit1 <- fitNetwork(ggmDat, type = vars1)

vars2 <- varSelect(ggmDat, criterion = 'CV', method = 'glmnet')
fit2 <- fitNetwork(ggmDat, type = vars2, which.lam = 'min')

# Add a moderator
vars3 <- varSelect(ggmDat, m = 'M', criterion = 'EBIC', gamma = .5)
fit3 <- fitNetwork(ggmDat, moderators = 'M', type = vars3)


[Package modnets version 0.9.0 Index]