bwSelect {mgm} | R Documentation |
Select optimal bandwidth for time-varying MGMs and mVAR Models
Description
Selects the bandwidth parameter with the lowest out-of-sample prediction error for time-varying MGMs and mVAR models.
Usage
bwSelect(data, type, level, bwSeq, bwFolds,
bwFoldsize, modeltype, pbar, ...)
Arguments
data |
An n x p data matrix. |
type |
p vector indicating the type of variable for each column in |
level |
p vector indicating the number of categories of each variable. For continuous variables set to 1. |
bwSeq |
A sequence of candidate bandwidth values in (0, s] with s < Inf. Note that the bandwidth is applied relative to the unit time interval [0,1]; a bandwidth > 2 therefore corresponds roughly to equal weights for all time points and gives estimates similar to those of the stationary model estimated via |
bwFolds |
The number of folds (see details below). |
bwFoldsize |
The size of each fold (see details below). |
modeltype |
If |
pbar |
If TRUE a progress bar is shown. Defaults to |
... |
Arguments passed to |
Details
Performs a cross-validation scheme that is specified by bwFolds and bwFoldsize. In the first fold, the test set is defined by an equally spaced sequence between [1, n - bwFolds] of length bwFoldsize. In the second fold, the test set is defined by an equally spaced sequence between [2, n - bwFolds + 1] of length bwFoldsize, etc. Note that if bwFoldsize = n / bwFolds, this procedure is equal to bwFolds-fold cross-validation. However, full cross-validation is computationally very expensive, and a single split into test and training set (bwFolds = 1) is sufficient in many situations. The procedure selects the bandwidth with the lowest prediction error, averaged over variables and time points in the test set.
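The fold construction described above can be sketched in base R. This is only an illustration of the description, not the package's internal code: make_testsets is a hypothetical helper, and rounding the equally spaced sequence to integer row indices is an assumption.

```r
# Sketch: test-set row indices for each fold.
# make_testsets() is a hypothetical helper, not part of mgm.
make_testsets <- function(n, bwFolds, bwFoldsize) {
  lapply(seq_len(bwFolds), function(fold) {
    # Fold i: bwFoldsize equally spaced points in [i, n - bwFolds + i - 1].
    # Rounding to integer row indices is an assumption of this sketch.
    round(seq(fold, n - bwFolds + fold - 1, length.out = bwFoldsize))
  })
}

testsets <- make_testsets(n = 100, bwFolds = 5, bwFoldsize = 20)
sapply(testsets, range)  # first and last test index of each fold
```

Each fold shifts the same equally spaced grid by one time point, so the test points of every fold are spread over the whole time series.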
bwSelect computes the absolute error (continuous) or 0/1-loss (categorical) for each time point in the test set defined by bwFoldsize, as described in the previous paragraph, for every fold specified in bwFolds, separately for each variable. The computed errors are returned at different levels of aggregation in the output list (see below). Note that continuous variables are scaled (centered and divided by their standard deviation), hence the absolute error and 0/1-loss are roughly on the same scale.
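As a minimal sketch of the two error measures and of the final selection step, assuming made-up held-out values and predictions (all numbers and object names below are hypothetical and do not come from bwSelect):

```r
# Hypothetical held-out values and predictions at 4 test time points
y_cont    <- c(0.2, -1.0, 0.5, 1.3)   # continuous variable (scaled)
yhat_cont <- c(0.0, -0.4, 0.9, 1.0)
y_cat     <- c(0, 1, 1, 0)            # categorical variable
yhat_cat  <- c(0, 1, 0, 0)

err_cont <- abs(y_cont - yhat_cont)        # absolute error per time point
err_cat  <- as.numeric(y_cat != yhat_cat)  # 0/1-loss per time point

# Average over time points within each variable, then over variables
mean_error <- mean(c(mean(err_cont), mean(err_cat)))

# Selection step: pick the candidate with the lowest mean error.
# The errors for the other two candidates are made-up placeholders.
bw_candidates <- c(0.05, 0.525, 1)
mean_errors   <- c(0.41, mean_error, 0.35)
bw_selected   <- bw_candidates[which.min(mean_errors)]
```

Because continuous variables are scaled, both error types take values of comparable magnitude, which is what makes averaging them across variables meaningful.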
Note that selecting the bandwidth with the EBIC is not an alternative, because the EBIC always selects the intercept model with the lowest bandwidth. The reason is that the unregularized intercept closely models the noise in the data, and hence the penalty sets all other parameters to zero. This problem is solved by using the out-of-sample prediction error in the cross-validation scheme.
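For intuition about the scale of the values in bwSeq, the following sketch computes kernel weights over time points mapped to [0,1]. It assumes a Gaussian kernel whose standard deviation is the bandwidth; kernel_weights is a hypothetical helper written for this illustration, not a function of the package.

```r
# Sketch under the assumption of a Gaussian kernel on the unit time interval.
# kernel_weights() is a hypothetical helper, not part of mgm.
kernel_weights <- function(n, estpoint, bandwidth) {
  timepoints <- seq(0, 1, length.out = n)  # n time points mapped to [0, 1]
  w <- dnorm(timepoints, mean = estpoint, sd = bandwidth)
  w / max(w)                               # scale so the largest weight is 1
}

w_narrow <- kernel_weights(n = 100, estpoint = 0.5, bandwidth = 0.05)
w_wide   <- kernel_weights(n = 100, estpoint = 0.5, bandwidth = 2)

range(w_narrow)  # weights fall to almost 0 away from the estimation point
range(w_wide)    # weights stay close to 1 for all time points
```

Under this assumption, a bandwidth of 2 weights all observations almost equally, which is consistent with the remark that large bandwidths give estimates similar to the stationary model.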
Value
The function returns a list with the following entries:
call |
Contains all provided input arguments. If |
bwModels |
Contains the models estimated at the time points in the test set. For details see |
fullErrorFolds |
List with number of entries equal to the length of |
fullError |
The same as |
meanError |
List with number of entries equal to the length of |
testsets |
List with |
zeroweights |
List with |
Author(s)
Jonas Haslbeck <jonashaslbeck@gmail.com>
References
Barber, R. F., & Drton, M. (2015). High-dimensional Ising model selection with Bayesian information criteria. Electronic Journal of Statistics, 9(1), 567-607.
Foygel, R., & Drton, M. (2010). Extended Bayesian information criteria for Gaussian graphical models. In Advances in neural information processing systems (pp. 604-612).
Haslbeck, J. M. B., & Waldorp, L. J. (2020). mgm: Estimating time-varying Mixed Graphical Models in high-dimensional Data. Journal of Statistical Software, 93(8), 1-46. DOI: 10.18637/jss.v093.i08
Examples
## Not run:
## A) bwSelect for tvmgm()
# A.1) Generate noise data set
p <- 5
n <- 100
data_n <- matrix(rnorm(p*n), nrow=100)
head(data_n)
type <- c("c", "c", rep("g", 3))
level <- c(2, 2, 1, 1, 1)
x1 <- data_n[,1]
x2 <- data_n[,2]
data_n[x1>0,1] <- 1
data_n[x1<0,1] <- 0
data_n[x2>0,2] <- 1
data_n[x2<0,2] <- 0
head(data_n)
# A.2) Estimate optimal bandwidth parameter
bwobj_mgm <- bwSelect(data = data_n,
                      type = type,
                      level = level,
                      bwSeq = seq(0.05, 1, length = 3),
                      bwFolds = 1,
                      bwFoldsize = 3,
                      modeltype = "mgm",
                      k = 3,
                      pbar = TRUE,
                      overparameterize = TRUE)
print.mgm(bwobj_mgm)
## B) bwSelect for tvmVar()
# B.1) Generate noise data set
p <- 5
n <- 100
data_n <- matrix(rnorm(p*n), nrow=100)
head(data_n)
type <- c("c", "c", rep("g", 3))
level <- c(2, 2, 1, 1, 1)
x1 <- data_n[,1]
x2 <- data_n[,2]
data_n[x1>0,1] <- 1
data_n[x1<0,1] <- 0
data_n[x2>0,2] <- 1
data_n[x2<0,2] <- 0
head(data_n)
# B.2) Estimate optimal bandwidth parameter
bwobj_mvar <- bwSelect(data = data_n,
                       type = type,
                       level = level,
                       bwSeq = seq(0.05, 1, length = 3),
                       bwFolds = 1,
                       bwFoldsize = 3,
                       modeltype = "mvar",
                       lags = 1:3,
                       pbar = TRUE,
                       overparameterize = TRUE)
print.mgm(bwobj_mvar)
# For more examples see https://github.com/jmbh/mgmDocumentation
## End(Not run)