R: Select optimal bandwidth for time-varying MGMs and mVAR...

bwSelect {mgm}

R Documentation

Select optimal bandwidth for time-varying MGMs and mVAR Models

Description

Selects the bandwidth parameter with the lowest out-of-sample prediction error for time-varying MGMs and mVAR models.

Usage

bwSelect(data, type, level, bwSeq, bwFolds,
         bwFoldsize, modeltype, pbar, ...)

Arguments

`data`	An n x p data matrix.
`type`	p vector indicating the type of variable for each column in `data`. "g" for Gaussian, "p" for Poisson, "c" for categorical.
`level`	p vector indicating the number of categories of each variable. For continuous variables set to 1.
`bwSeq`	A sequence of candidate bandwidth values in (0, s] with s < Inf. Note that the bandwidth is applied relative to the unit time interval [0,1]. A bandwidth > 2 therefore corresponds roughly to equal weights for all time points and gives estimates similar to those of the stationary model estimated via `mvar()`.
`bwFolds`	The number of folds (see details below).
`bwFoldsize`	The size of each fold (see details below).
`modeltype`	If `modeltype = "mvar"`, the optimal bandwidth parameter for a `tvmvar()` model is selected. If `modeltype = "mgm"`, the optimal bandwidth parameter for a `tvmgm()` model is selected. Additional arguments to `tvmvar()` or `tvmgm()` can be passed via the `...` argument.
`pbar`	If TRUE, a progress bar is shown. Defaults to `pbar = TRUE`.
`...`	Arguments passed to `tvmgm` or `tvmvar`.
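
The statement for `bwSeq` that a bandwidth > 2 gives roughly equal weights can be illustrated with a small sketch. It assumes a Gaussian kernel on the unit time interval, which this page does not state explicitly; the helper `kernel_weights()` is hypothetical and not part of mgm.

```r
# Sketch under assumptions: Gaussian kernel weights on the unit time interval.
# kernel_weights() is a hypothetical helper, not an mgm function.
n  <- 100
tp <- seq(0, 1, length.out = n)        # time points mapped to [0, 1]

kernel_weights <- function(t0, bw) {
  w <- dnorm(tp, mean = t0, sd = bw)   # Gaussian density centered at t0
  w / max(w)                           # rescale so the largest weight is 1
}

w_narrow <- kernel_weights(t0 = 0.5, bw = 0.05)  # strongly local weighting
w_wide   <- kernel_weights(t0 = 0.5, bw = 2)     # nearly uniform weighting

range(w_narrow)
range(w_wide)
```

With the wide bandwidth, all observations receive almost the same weight, which is why the estimates approach those of a stationary model.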

Details

Performs a cross-validation scheme that is specified by bwFolds and bwFoldsize. In the first fold, the test set is an equally spaced sequence of length bwFoldsize in [1, n - bwFolds]. In the second fold, the test set is an equally spaced sequence of length bwFoldsize in [2, n - bwFolds + 1], and so on. Note that if bwFoldsize = n / bwFolds, this procedure is equal to bwFolds-fold cross-validation. However, full cross-validation is computationally very expensive, and a single split into test and training set, obtained by setting bwFolds = 1, is sufficient in many situations. The procedure selects the bandwidth with the lowest prediction error, averaged over variables and time points in the test set.
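
The construction of the test sets can be sketched in base R. This is an illustration of the scheme as described in the paragraph above, not the package's internal code; in particular, rounding the sequence to integer row indices is an assumption.

```r
# Sketch of the fold construction described above (not mgm's internal code).
n          <- 100
bwFolds    <- 3
bwFoldsize <- 5

testsets <- lapply(seq_len(bwFolds), function(i) {
  # Fold i: equally spaced sequence in [i, n - bwFolds + i - 1];
  # rounding to integer row indices is an assumption of this sketch.
  round(seq(i, n - bwFolds + i - 1, length.out = bwFoldsize))
})

testsets
```

Each fold shifts the same equally spaced grid by one row, so the test rows of different folds do not overlap in this example.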

bwSelect computes the absolute error (continuous) or 0/1-loss (categorical) for each time point in the test set defined by bwFoldsize, as described in the previous paragraph, for every fold specified in bwFolds and separately for each variable. The computed errors are returned at different levels of aggregation in the output list (see below). Note that continuous variables are scaled (centered and divided by their standard deviation), hence the absolute error and the 0/1-loss are roughly on the same scale.
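
The two loss functions and their aggregation can be sketched as follows. The observed and predicted values are made up for illustration, and the code is a minimal sketch rather than the package's implementation.

```r
# Sketch with made-up values: 3 test time points, 2 variables.
type <- c("c", "g")                      # one categorical, one Gaussian variable
obs  <- cbind(c(1, 0, 1), c(0.2, -1.0, 0.5))
pred <- cbind(c(1, 1, 1), c(0.0, -0.5, 0.5))

# Per-variable errors: 0/1-loss for categorical, absolute error for continuous
err <- sapply(seq_along(type), function(j) {
  if (type[j] == "c") {
    as.numeric(obs[, j] != pred[, j])
  } else {
    abs(obs[, j] - pred[, j])
  }
})

err                  # matrix: test time points x variables
mean_err <- mean(err) # average over variables and time points
mean_err
```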

Note that selecting the bandwidth with the EBIC is not an alternative. This is because the EBIC always selects the intercept model with the lowest bandwidth. The reason is that the unregularized intercept closely models the noise in the data, and hence the penalty sets all other parameters to zero. This problem is solved by using the out-of-sample prediction error in the cross-validation scheme.

Value

The function returns a list with the following entries:

`call`	Contains all provided input arguments. If `saveData = TRUE`, it also contains the data.
`bwModels`	Contains the models estimated at the time points in the test set. For details see `tvmvar` or `tvmgm`.
`fullErrorFolds`	List with one entry per value in `bwSeq`. Each entry contains a list with `bwFolds` entries. Each of those entries contains a `bwFoldsize` x p matrix of out-of-sample prediction errors.
`fullError`	The same as `fullErrorFolds` but pooled over folds.
`meanError`	List with one entry per value in `bwSeq`. Each entry contains the average prediction error over variables and time points in the test set.
`testsets`	List with `bwFolds` entries, which contain the rows of the test sample for each fold.
`zeroweights`	List with `bwFolds` entries, which contain the observation weights used to fit the model at the `bwFoldsize` time points.
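
Given the `meanError` entry, the selected bandwidth can be recovered from the candidate sequence. The sketch below uses a mock list with made-up error values that only mimics the structure described above; the object name `mock_bwobj` is hypothetical.

```r
# Sketch: pick the bandwidth with the lowest mean prediction error.
# mock_bwobj imitates the documented structure; its values are made up.
bwSeq <- seq(0.05, 1, length.out = 3)
mock_bwobj <- list(meanError = list(0.91, 0.84, 0.88))

mean_errors <- unlist(mock_bwobj$meanError)  # one value per candidate bandwidth
best_bw     <- bwSeq[which.min(mean_errors)]
best_bw
```

The same two lines should apply to a real `bwSelect()` result, provided `bwSeq` is the sequence that was passed to the function.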

Author(s)

Jonas Haslbeck <jonashaslbeck@gmail.com>

References

Barber, R. F., & Drton, M. (2015). High-dimensional Ising model selection with Bayesian information criteria. Electronic Journal of Statistics, 9(1), 567-607.

Foygel, R., & Drton, M. (2010). Extended Bayesian information criteria for Gaussian graphical models. In Advances in neural information processing systems (pp. 604-612).

Haslbeck, J. M. B., & Waldorp, L. J. (2020). mgm: Estimating time-varying Mixed Graphical Models in high-dimensional Data. Journal of Statistical Software, 93(8), pp. 1-46. DOI: 10.18637/jss.v093.i08

Examples


## Not run: 


## A) bwSelect for tvmgm() 

# A.1) Generate noise data set
p <- 5
n <- 100
data_n <- matrix(rnorm(p*n), nrow = n)
head(data_n)

type <- c("c", "c", rep("g", 3))
level <- c(2, 2, 1, 1, 1)
# Dichotomize the first two columns to obtain binary categorical variables
x1 <- data_n[,1]
x2 <- data_n[,2]
data_n[x1>0,1] <- 1
data_n[x1<0,1] <- 0
data_n[x2>0,2] <- 1
data_n[x2<0,2] <- 0

head(data_n)

# A.2) Estimate optimal bandwidth parameter

bwobj_mgm <- bwSelect(data = data_n,
                      type = type,
                      level = level,
                      bwSeq = seq(0.05, 1, length.out = 3),
                      bwFolds = 1,
                      bwFoldsize = 3,
                      modeltype = "mgm",
                      k = 3,
                      pbar = TRUE,
                      overparameterize = TRUE)


print.mgm(bwobj_mgm)



## B) bwSelect for tvmvar() 

# B.1) Generate noise data set

p <- 5
n <- 100
data_n <- matrix(rnorm(p*n), nrow = n)
head(data_n)

type <- c("c", "c", rep("g", 3))
level <- c(2, 2, 1, 1, 1)
x1 <- data_n[,1]
x2 <- data_n[,2]
data_n[x1>0,1] <- 1
data_n[x1<0,1] <- 0
data_n[x2>0,2] <- 1
data_n[x2<0,2] <- 0

head(data_n)

# B.2) Estimate optimal bandwidth parameter

bwobj_mvar <- bwSelect(data = data_n,
                       type = type,
                       level = level,
                       bwSeq = seq(0.05, 1, length.out = 3),
                       bwFolds = 1,
                       bwFoldsize = 3,
                       modeltype = "mvar",
                       lags = 1:3,
                       pbar = TRUE,
                       overparameterize = TRUE)


print.mgm(bwobj_mvar)

# For more examples see https://github.com/jmbh/mgmDocumentation



## End(Not run)

[Package mgm version 1.2-14 Index]