stan_gvar {tsnet} | R Documentation |
Fit Bayesian Graphical Vector Autoregressive (GVAR) Models with Stan
Description
This function fits a Bayesian GVAR model to the provided data using Stan. The estimation procedure is described further in Siepe, Kloft & Heck (2023) <doi:10.31234/osf.io/uwfjc>. The current implementation allows for a normal prior on the temporal effects and either an Inverse Wishart or an LKJ prior on the contemporaneous effects. 'rstan' is used as a backend for fitting the model in Stan. Data should be provided in long format, where the columns represent the variables and the rows represent the time points. Data are automatically z-scaled for estimation.
Usage
stan_gvar(
data,
beep = NULL,
priors = NULL,
method = "sampling",
cov_prior = "IW",
rmv_overnight = FALSE,
iter_sampling = 500,
iter_warmup = 500,
n_chains = 4,
n_cores = 1,
center_only = FALSE,
...
)
Arguments
data |
A data frame or matrix containing the time series data of a single subject. The data should be in long format, where the columns represent the variables and the rows represent the time points. See the example data [ts_data] for the correct format. |
beep |
A vector of beeps with length of 'nrow(data)'. The beep indicator can be used to remove overnight effects from the last beep of a day to the first beep of the next day. This should be a vector of positive integers. If left empty, the function will assume that there are no overnight effects to remove. |
priors |
A list of prior distributions for the model parameters. This should be a named list, with names corresponding to the parameter names and values corresponding to the prior distributions. The following priors can be specified:
|
method |
A string indicating the method to use for fitting the model. Options are "sampling" (for MCMC estimation) or "variational" (for variational inference). We currently recommend only using MCMC estimation. |
cov_prior |
A string indicating the prior distribution to use for the covariance matrix. Options are "LKJ" or "IW" (Inverse-Wishart). |
rmv_overnight |
A logical indicating whether to remove overnight effects. Default is 'FALSE'. If 'TRUE', the function will remove overnight effects from the last beep of a day to the first beep of the next day. This requires the 'beep' argument to be specified. |
iter_sampling |
An integer specifying the number of iterations for the sampling method. Default is 500. |
iter_warmup |
An integer specifying the number of warmup iterations for the sampling method. Default is 500. |
n_chains |
An integer specifying the number of chains for the sampling method. Default is 4. If variational inference is used, the number of iterations is calculated as 'iter_sampling'*'n_chains'. |
n_cores |
An integer specifying the number of cores to use for parallel computation. Default is 1. [rstan] is used for parallel computation. |
center_only |
A logical indicating whether to only center (and not scale) the data. Default is 'FALSE'. |
... |
Additional arguments passed to the 'rstan::sampling' or 'rstan::vb' function. |
Details
General Information
In a Graphical Vector Autoregressive (GVAR) model of lag 1, each variable is regressed on itself and all other variables at the previous timepoint to obtain estimates of the temporal association between variables (encapsulated in the beta matrix). This is the "Vector Autoregressive" part of the model. Additionally, the innovation structure at each time point (which resembles the residuals) is modeled to obtain estimates of the contemporaneous associations between all variables (controlling for the lagged effects). This is typically represented in the partial correlation (pcor) matrix. If the model is represented and interpreted as a network, variables are called *nodes*, *edges* represent the statistical association between the nodes, and *edge weights* quantify the strength of these associations.
Model
Let Y
be a matrix with n
rows and p
columns, where n_t
is the
number of time points and p
is the number of variables. The GVAR model is
given by the following equations:
Y_t = B* Y_{t-1} + zeta_t
zeta_t \sim N(0, \Sigma)
where B
is a 'p x p' matrix of VAR
coefficients between variables i and j (beta_ij), \zeta_t
contains the
innovations at time point t
, and \Sigma
is a 'p x p'covariance matrix.
The inverse of \Sigma
is the precision matrix, which is used to obtain the
partial correlations between variables (rho_ij). The model setup is
explained in more detail in Siepe, Kloft & Heck (2023)
<doi:10.31234/osf.io/uwfjc>.
Prior Setup
For the p x p temporal matrix B (containing the beta coefficients), we use a normal prior distribution on each individual parameter:
\beta_{ij}
\sim N(PriorBetaLoc_{ij}, PriorBetaScale_{ij})
where 'PriorBetaLoc' is the
mean of the prior distribution and 'PriorBetaScale' is the standard
deviation of the prior distribution. The default prior is a weakly
informative normal distribution with mean 0 and standard deviation 0.5. The
user can specify a different prior distribution by a matrix
'prior_Beta_loc' and a matrix 'prior_Beta_scale' with the same dimensions
as B
.
Both a Lewandowski-Kurowicka-Joe (LKJ) and an Inverse-Wishart (IW) distribution can be used as a prior for the contemporaneous network. However, the LKJ prior does not allow for direct specifications of priors on the partial correlations. We implemented a workaround to enable priors on specific partial correlations (described below). We consider this feature experimental would advise users wishing to implement edge-specific priors in the contemporaneous network to preferentially use IW priors.
The LKJ prior is a distribution on the correlation matrix, which is
parameterized by the shape parameter \eta
. To enable edge-specific priors
on the partial correlations, we use the workaround of a "joint" prior
that, in addition to the LKJ on the correlation matrix itself, allows for
an additional beta prior on each of the partial correlations. We first
assigned an uninformed LKJ prior to the Cholesky factor decomposition of
the correlation matrix of innovations:
\Omega_L \sim
LKJ-Cholesky(\eta)
. For \eta = 1
, this implies a symmetric marginal
scaled beta distribution on the zero-order correlations \omega_{ij}
.
(\omega_{ij}+1)/2 \sim Beta(p/2, p/2)
We can then obtain the covariance matrix and,
subsequently, the precision matrix (see Siepe, Kloft & Heck (2023))
for details.
The second part of the prior is a beta prior on each partial correlation
\rho_{ij}
(obtained from the off-diagonal elements of the precision matrix).
This prior was assigned by transforming the partial correlations to the
interval of 0,1 and then assigning a proportional (mean-variance
parameterized) beta prior:
(\rho_{ij}+1)/2 \sim Beta_{prop}(PriorRhoLoc, PriorRhoScale)
A beta location parameter of 0.5 translates to an expected correlation of 0.
The variance parameter of sqrt(0.5) implies a uniform distribution of
partial correlations.
The user can specify a different prior distribution by a matrix
'prior_Rho_loc' and a matrix 'prior_Rho_scale' with the same dimensions as
the partial correlation matrix. Additionally, the user can change eta
via the 'prior_Eta' parameter.
The Inverse-Wishart prior is a distribution on the innovation covariance matrix 'Sigma':
\Sigma \sim IW(\nu, S)
where \nu
is the degrees of freedom and S
is the scale matrix. We here
use the default prior of
nu = delta + p - 1
for the degrees of freedom,
where \delta
is defined as s_{\rho}^{-1}-1
and s_{\rho}
is the
standard deviation of the implied marginal beta distribution of the
partial correlations. For the scale matrix S
, we use the identity matrix
I_p
of order p.
The user can set a prior on the expected standard deviation of the partial
correlations by specifying a 'prior_Rho_marginal' parameter. The default
value is 0.25, which has worked well in a simulation study.
Additionally, the user can specify a 'prior_S' parameter to set a different
scale matrix.
Sampling The model can be fitted using either MCMC sampling or variational inference via [rstan]. Per default, the model is fitted using the Stan Hamiltonian Monte Carlo (HMC) No U-Turn (NUTS) sampler with 4 chains, 500 warmup iterations and 500 sampling iterations. We use a default target average acceptance probability 'adapt_delta' of 0.8. As the output is returned as a standard 'stanfit' object, the user can use the 'rstan' package to extract and analyze the results and obtain convergence diagnostics.
Value
A 'tsnet_fit' object in list format. The object contains the following elements:
fit |
A stanfit object containing the fitted model. |
arguments |
The number of variables "p", the number of time points "n_t", the column names "cnames", and the arguments used in the function call. |
Examples
# Load example data
data(ts_data)
example_data <- ts_data[1:100,1:3]
# Fit the model
fit <- stan_gvar(example_data,
method = "sampling",
cov_prior = "IW",
n_chains = 2)
print(fit)