mvar {mgm} | R Documentation |
Estimating mixed Vector Autoregressive Model (mVAR)
Description
Estimates a mixed Vector Autoregressive (mVAR) model via elastic-net regularized Generalized Linear Models.
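For orientation, the elastic-net penalty can be written in the parameterization of Friedman, Hastie and Tibshirani (2010), in which alpha = 1 gives the lasso penalty and alpha = 0 gives the ridge penalty. The base-R sketch below only illustrates the penalty term. The helper name enet_penalty is hypothetical and is not part of mgm.

```r
# Elastic-net penalty in the glmnet parameterization:
#   lambda * ( (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)) )
# `enet_penalty` is a hypothetical helper for illustration only.
enet_penalty <- function(beta, lambda, alpha) {
  stopifnot(alpha >= 0, alpha <= 1, lambda >= 0)
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(0.5, -1, 0)
pen_lasso <- enet_penalty(beta, lambda = 0.1, alpha = 1)    # pure L1 penalty
pen_ridge <- enet_penalty(beta, lambda = 0.1, alpha = 0)    # pure squared-L2 penalty
pen_mix   <- enet_penalty(beta, lambda = 0.1, alpha = 0.5)  # mixture of both
```

Searching over several alpha values trades off the sparsity of the lasso against the shrinkage of ridge.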
Usage
mvar(data, type, level, lambdaSeq, lambdaSel, lambdaFolds,
     lambdaGam, alphaSeq, alphaSel, alphaFolds, alphaGam, lags,
     consec, beepvar, dayvar, weights, threshold, method, binarySign,
     scale, verbatim, pbar, warnings, saveModels, saveData,
     overparameterize, thresholdCat, signInfo, ...)
Arguments
data |
n x p data matrix. |
type |
p vector indicating the type of variable for each column in |
level |
p vector indicating the number of categories of each variable. For continuous variables set to 1. |
lambdaSeq |
A sequence of lambdas that should be searched (see also |
lambdaSel |
Specifies the procedure for selecting the tuning parameter controlling the Lq-penalization. The two options are cross validation "CV" and the Extended Bayesian Information Criterion (EBIC) "EBIC". The EBIC performs well in selecting sparse graphs (see Barber and Drton, 2015 and Foygel and Drton, 2010). Note that when the alpha parameter in the elastic net penalty is also searched, cross validation should be preferred, as the parameter vector will not necessarily be sparse anymore. The EBIC tends to be a bit more conservative than CV (see Haslbeck and Waldorp, 2020). CV can sometimes not be performed with categorical variables, because |
lambdaFolds |
Number of folds in cross validation if |
lambdaGam |
Hyperparameter gamma in the EBIC if |
alphaSeq |
A sequence of alpha parameters for the elastic net penalty in [0,1] that should be searched (see also |
alphaSel |
Specifies the procedure for selecting the alpha parameter in the elastic net penalty. The two options are cross validation "CV" and the Extended Bayesian Information Criterion (EBIC) "EBIC". The EBIC performs well in selecting sparse graphs (see Barber and Drton, 2015 and Foygel and Drton, 2010). Note that when the alpha parameter in the elastic net penalty is also searched, cross validation should be preferred, as the parameter vector will not necessarily be sparse anymore. The EBIC tends to be a bit more conservative than CV (see Haslbeck and Waldorp, 2020). CV can sometimes not be performed with categorical variables, because |
alphaFolds |
Number of folds in cross validation if |
alphaGam |
Hyperparameter gamma in the EBIC if |
lags |
Vector of positive integers indicating the lags included in the mVAR model (e.g. 1:3 or c(1,3,5)) |
consec |
An integer vector of length n, indicating the consecutiveness of measurement points of the rows in |
beepvar |
Together with the argument |
dayvar |
See |
weights |
A vector with n - max(lags) entries, indicating the weight for each observation. The mVAR design matrix has n - max(lags) rows, because the first row must be predictable by the highest lag. The weights have to be on the scale [0, n - max(lags)]. |
threshold |
A threshold below which edge-weights are set to zero. This is done in order to guarantee a lower bound on the false-positive rate. |
method |
Estimation method, currently only |
binarySign |
If |
scale |
If |
verbatim |
If |
pbar |
If |
warnings |
If |
saveModels |
If |
saveData |
If |
overparameterize |
If |
thresholdCat |
If |
signInfo |
If |
... |
Additional arguments. |
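The role of the gamma hyperparameter (lambdaGam, alphaGam) can be illustrated with a commonly used form of the EBIC for a regression with k nonzero coefficients among p candidate predictors. This is a sketch of the general criterion under that assumption and is not necessarily the exact computation performed inside mvar(). The helper name ebic_sketch is hypothetical.

```r
# Commonly used EBIC form for a sparse regression (illustrative assumption):
#   EBIC = -2 * loglik + k * log(n) + 2 * gamma * k * log(p)
# `ebic_sketch` is a hypothetical helper, not an mgm function.
ebic_sketch <- function(loglik, k, n, p, gamma) {
  -2 * loglik + k * log(n) + 2 * gamma * k * log(p)
}

# gamma = 0 reduces the criterion to the ordinary BIC
bic_value <- ebic_sketch(loglik = -100, k = 3, n = 200, p = 10, gamma = 0)

# A larger gamma adds a larger penalty per nonzero coefficient,
# which favors sparser models
ebic_low  <- ebic_sketch(loglik = -100, k = 3, n = 200, p = 10, gamma = 0.25)
ebic_high <- ebic_sketch(loglik = -100, k = 3, n = 200, p = 10, gamma = 0.50)
```

Among candidate models, the one with the smallest criterion value is selected, so increasing gamma makes the selection more conservative.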
Details
See Haslbeck and Waldorp (2020) for details about how the mixed VAR model is estimated.
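As the description of the weights argument notes, the design matrix has n - max(lags) rows, because every response row needs observed predictors at each requested lag. The base-R sketch below shows that bookkeeping, assuming equally spaced and consecutive measurements. The helper lagged_design is hypothetical and does not reproduce how mgm builds its design matrix internally.

```r
# Build a lagged design from an n x p matrix (illustrative sketch only).
# `lagged_design` is a hypothetical helper, not an mgm function.
lagged_design <- function(data, lags) {
  n <- nrow(data)
  max_lag <- max(lags)
  stopifnot(n > max_lag)
  rows <- (max_lag + 1):n                      # rows that have all lags available
  lagged <- lapply(lags, function(l) {
    block <- data[rows - l, , drop = FALSE]    # predictors shifted back by lag l
    colnames(block) <- paste0("V", seq_len(ncol(data)), "_lag", l)
    block
  })
  list(response   = data[rows, , drop = FALSE],
       predictors = do.call(cbind, lagged))
}

set.seed(1)
x <- matrix(rnorm(20 * 3), nrow = 20, ncol = 3)
d <- lagged_design(x, lags = c(1, 3))

n_rows <- nrow(d$response)     # n - max(lags) rows
n_cols <- ncol(d$predictors)   # p variables times number of lags
w <- rep(1, n_rows)            # one weight per row of the design
```

With 20 observations and lags 1 and 3, the first three rows cannot be predicted, so the weight vector needs one entry per remaining row.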
Value
The function returns a list with the following entries:
call |
Contains all provided input arguments. If |
wadj |
A p x p x n_lags array, in which rows are predicted by columns, i.e. entry |
signs |
A p x p x n_lags array, specifying the signs corresponding to the entries of |
edgecolor |
A p x p x n_lags array of colors indicating the sign of each parameter. This array contains the same information as |
rawlags |
List with entries equal to the number of specified lags in |
intercepts |
A list with p entries, which contain the intercept/thresholds for each node. In case a given node is categorical with m categories, there are m thresholds for this variable. |
nodemodels |
A list with p |
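The indexing convention of wadj (rows are predicted by columns, and the third dimension indexes the entries of lags) can be illustrated with a toy array. The array below is filled by hand for illustration and is not output of mvar(). The names wadj_toy and effects are hypothetical.

```r
# Toy array with the same layout as wadj: p x p x n_lags
lags <- c(1, 3, 9)
p <- 6
wadj_toy <- array(0, dim = c(p, p, length(lags)))
wadj_toy[5, 6, 2] <- 0.4   # variable 6 at lag lags[2] = 3 predicts variable 5
wadj_toy[1, 3, 1] <- 0.7   # variable 3 at lag lags[1] = 1 predicts variable 1

# List all nonzero lagged effects as (predicted, predictor, lag) rows
idx <- which(wadj_toy != 0, arr.ind = TRUE)
effects <- data.frame(predicted = idx[, 1],
                      predictor = idx[, 2],
                      lag       = lags[idx[, 3]],
                      weight    = wadj_toy[idx])
effects
```

Reading an entry as wadj[predicted, predictor, lag index] avoids mixing up the direction of a cross-lagged effect.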
Author(s)
Jonas Haslbeck <jonashaslbeck@gmail.com>
References
Barber, R. F., & Drton, M. (2015). High-dimensional Ising model selection with Bayesian information criteria. Electronic Journal of Statistics, 9(1), 567-607.
Foygel, R., & Drton, M. (2010). Extended Bayesian information criteria for Gaussian graphical models. In Advances in neural information processing systems (pp. 604-612).
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22.
Haslbeck, J. M. B., & Waldorp, L. J. (2020). mgm: Estimating time-varying Mixed Graphical Models in high-dimensional Data. Journal of Statistical Software, 93(8), 1-46. DOI: 10.18637/jss.v093.i08
Loh, P. L., & Wainwright, M. J. (2012, December). Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses. In NIPS (pp. 2096-2104).
Yang, E., Baker, Y., Ravikumar, P., Allen, G. I., & Liu, Z. (2014, April). Mixed Graphical Models via Exponential Families. In AISTATS (Vol. 2012, pp. 1042-1050).
Examples
## Not run:
## We generate data from a mixed VAR model and then recover the model using mvar()
# 1) Define mVAR model
p <- 6 # Six variables
type <- c("c", "c", "c", "c", "g", "g") # 4 categorical, 2 gaussians
level <- c(2, 2, 4, 4, 1, 1) # 2 categoricals with m=2, 2 categoricals with m=4, two continuous
max_level <- max(level)
lags <- c(1, 3, 9) # include lagged effects of order 1, 3, 9
n_lags <- length(lags)
# Specify thresholds
thresholds <- list()
thresholds[[1]] <- rep(0, level[1])
thresholds[[2]] <- rep(0, level[2])
thresholds[[3]] <- rep(0, level[3])
thresholds[[4]] <- rep(0, level[4])
thresholds[[5]] <- rep(0, level[5])
thresholds[[6]] <- rep(0, level[6])
# Specify standard deviations for the Gaussians
sds <- rep(NA_real_, p) # NA for the categorical variables, which have no standard deviation
sds[5:6] <- 1
# Create coefficient array
coefarray <- array(0, dim=c(p, p, max_level, max_level, n_lags))
# a.1) interaction between continuous 5<-6, lag=3
coefarray[5, 6, 1, 1, 2] <- .4
# a.2) interaction between 1<-3, lag=1
m1 <- matrix(0, nrow=level[1], ncol=level[3]) # rows: categories of variable 1, columns: categories of variable 3
m1[1,1:2] <- 1
m1[2,3:4] <- 1
coefarray[1, 3, 1:level[1], 1:level[3], 1] <- m1
# a.3) interaction between 1<-5, lag=9
coefarray[1, 5, 1:level[1], 1:level[5], 3] <- c(0, 1)
# 2) Sample
set.seed(1)
dlist <- mvarsampler(coefarray = coefarray,
                     lags = lags,
                     thresholds = thresholds,
                     sds = sds,
                     type = type,
                     level = level,
                     N = 200,
                     pbar = TRUE)
# 3) Recover
set.seed(1)
mvar_obj <- mvar(data = dlist$data,
                 type = type,
                 level = level,
                 lambdaSel = "CV",
                 lags = c(1, 3, 9),
                 signInfo = FALSE,
                 overparameterize = FALSE)
# Did we recover the true parameters?
mvar_obj$wadj[5, 6, 2] # cross-lagged effect of 6 on 5 over lag lags[2]
mvar_obj$wadj[1, 3, 1] # cross-lagged effect of 3 on 1 over lag lags[1]
mvar_obj$wadj[1, 5, 3] # cross-lagged effect of 5 on 1 over lag lags[3]
# How to get the exact parameter estimates?
# Example: the full parameters for the cross-lagged interaction of 2 on 1 over lag lags[1]
mvar_obj$rawlags[[1]][[1]][[2]]
# 4) Predict / Compute nodewise Error
pred_mvar <- predict(mvar_obj, dlist$data) # dispatches to the predict method for mgm objects
head(pred_mvar$predicted) # first 6 rows of predicted values
pred_mvar$errors # Nodewise errors
# For more examples see https://github.com/jmbh/mgmDocumentation
## End(Not run)