R: Bayesian semiparametric modelling of covariance matrices for...

lmrm {BNSP}

R Documentation

Bayesian semiparametric modelling of covariance matrices for multivariate longitudinal data

Description

Implements an MCMC algorithm for posterior sampling based on a semiparametric model for continuous longitudinal multivariate responses. The overall model consists of 5 regression submodels and it utilizes spike-slab priors for variable selection and function regularization. See ‘Details’ section for a full description of the model.

Usage

lmrm(formula, data = list(), centre=TRUE, id, time,
sweeps, burn = 0, thin = 1, seed, StorageDir,
c.betaPrior = "IG(0.5,0.5*n*p)", pi.muPrior = "Beta(1,1)", 
c.alphaPrior = "IG(1.1,1.1)", pi.phiPrior = "Beta(1,1)", c.psiPrior = "HN(2)",
sigmaPrior = "HN(2)", pi.sigmaPrior = "Beta(1,1)", 
corr.Model = c("common", nClust = 1), DP.concPrior = "Gamma(5,2)",
c.etaPrior = "IG(0.5,0.5*samp)", pi.nuPrior = "Beta(1,1)", 
pi.fiPrior = "Beta(1,1)", c.omegaPrior = "IG(1.1,1.1)", sigmaCorPrior = "HN(2)",
tuneCalpha, tuneSigma2, tuneCbeta, tuneAlpha, tuneSigma2R, tuneR, tuneCpsi, 
tuneCbCor, tuneOmega, tuneComega, tau, FT = 1,...)

Arguments

`formula`	a formula defining the responses and the covariates in the 5 regression models e.g. `y1 \| y2 ~ x \| w \| z \| t \| t` or for smooth effects `y1 \| y2 ~ sm(x) \| sm(w) \| sm(z) \| sm(t) \| sm(t)`. The package uses the extended formula notation, where the responses are defined on the left of `~` and the models on the right.
`data`	a data frame.
`centre`	Binary indicator. If set equal to TRUE, the design matrices are centred, to have column mean equl to zero, otherwise, if set to FALSE, the columns are not centred.
`id`	identifiers of the individuals or other sampling units that are observed over time.
`time`	a vector input that specifies the time of observation
`sweeps`	total number of posterior samples, including those discarded in burn-in period (see argument `burn`) and those discarded by the thinning process (see argument `thin`).
`burn`	length of burn-in period.
`thin`	thinning parameter.
`seed`	optional seed for the random generator.
`StorageDir`	a required directory to store files with the posterior samples of models parameters.
`c.betaPrior`	The inverse Gamma prior of `c_{\beta}`. The default is "IG(0.5,0.5np)", that is, an inverse Gamma with parameters `1/2` and `np/2`, where `n` is the number of sampling units and `p` is the length of the response vector.
`pi.muPrior`	The Beta prior of `\pi_{\mu}`. The default is "Beta(1,1)". It can be of dimension `1`, of dimension `K` (the number of effects that enter the mean model), or of dimension `p K`
`c.alphaPrior`	The inverse Gamma prior of `c_{\alpha}^2`. The default is "IG(1.1,1.1)". Half-normal priors for `c_{\alpha}` are also available, declared using "HN(a)", where "a" is a positive number. It can be of dimension `1` or `p` (the length of the multivariate response).
`pi.phiPrior`	The Beta prior of `\pi_{\phi}`. The default is "Beta(1,1)". It can be of dimension `1`, of dimension `B` (the number of effects that enter the dependence model), or of dimension `p^2 B`
`c.psiPrior`	The prior of `c_{\psi}^2`. The default is "HN(2)", a half-normal prior for `c_{\psi}` with variance equal to two, `c_{\psi} \sim N(0,2) I[c_{\psi} > 0]`. Inverse Gamma priors for `c_{\psi}^2` are also available, declared using "IG(a,b)". It can be of dimension `1` or `p^2` (the number of dependence models).
`sigmaPrior`	The prior of `\sigma_k^2, k=1,\dots,p`. The default is "HN(2)", a half-normal prior for `\sigma_k` with variance equal to two, `\sigma_k \sim N(0,2) I[\sigma>0]`. Inverse Gamma priors for `\sigma_k^2` are also available, declared using "IG(a,b)". It can be of dimension `1` or `p` (the length of the multivariate response).
`pi.sigmaPrior`	The Beta prior of `\pi_{\sigma}`. The default is "Beta(1,1)". It can be of dimension `1`, of dimension `L` (the number of effects that enter the variance model), or of dimension `pL`
`corr.Model`	Specifies the model for the correlation matrices `R_t`. The three choices supported are "common", that specifies a common correlations model, "groupC", that specifies a grouped correlations model, and "groupV", that specifies a grouped variables model. When the model chosen is either "groupC" or "groupV", the upper limit on the number of clusters can also be specified, using corr.Model = c("groupC", nClust = d) or corr.Model = c("groupV", nClust = p). If the number of clusters is left unspecified, for the "groupV" model, it is taken to be `p`, the number of responses. For the "groupC" model, it is taken to be `d = p(p-1)/2`, the number of free elements in the correlation matrices.
`DP.concPrior`	The Gamma prior for the Dirichlet process concentration parameter.
`c.etaPrior`	The inverse Gamma prior of `c_{\eta}`. The default is "IG(0.5,0.5samp)", that is, an inverse Gamma with parameters `1/2` and `samp/2`, where `samp` is the number of correlations observed over time, that is $samp=Md$ where $M$ is the number of unique observation time points and $d$ is the number of non-redundant elements of $R$.
`pi.nuPrior`	The Beta prior of `\pi_{\nu}`. The default is "Beta(1,1)". It can be of dimension `1`.
`pi.fiPrior`	The Beta prior of `\pi_{\varphi}`. The default is "Beta(1,1)". It can be of dimension `1`.
`c.omegaPrior`	The prior of `c_{\omega}^2`. The default is "HN(2)", a half-normal prior for `c_{\omega}` with variance equal to two, `c_{\omega} \sim N(0,2) I[c_{\omega} > 0]`. Inverse Gamma priors for `c_{\omega}^2` are also available, declared using "IG(a,b)". It can be of dimension `1`.
`sigmaCorPrior`	The prior of `\sigma^2`. The default is "HN(2)", a half-normal prior for `\sigma^2` with variance equal to two, `\sigma \sim N(0,2) I[\sigma > 0]`. Inverse Gamma priors for `\sigma^2` are also available, declared using "IG(a,b)". It can be of dimension `1`.
`tuneCalpha`	Starting value of the tuning parameter for sampling `c_{\alpha k}, k=1,\dots,p`. Defaults at a vector of $p$ ones. It could be of dimension `p`.
`tuneSigma2`	Starting value of the tuning parameter for sampling `\sigma^2_k, k=1,\dots,p`. Defaults at a vector of $p$ ones. It could be of dimension `p`.
`tuneCbeta`	Starting value of the tuning parameter for sampling `c_{\beta}`. Defaults at 100.
`tuneAlpha`	Starting value of the tuning parameter for sampling regression coefficients of the variance models `\alpha_k, k=1,\dots,p`. Defaults at a vector of 5s. It could be of dimension `L p`
`tuneSigma2R`	Starting value of the tuning parameter for sampling `\sigma^2`. Defaults at 1.
`tuneR`	Starting value of the tuning parameter for sampling correlation matrices. Defaults at `40*(p+2)^3`. Can be of dimension `1` or `M` is the number of unique observation time points.
`tuneCpsi`	Starting value of the tuning parameter for sampling variances `c_{\psi}^2`. Defaults at 5. Can be of dimension `1` or `p^2`

`tuneCbCor`	Starting value of the tuning parameter for sampling `c_{\eta}^2`. Defaults at 10.
`tuneOmega`	Starting value of the tuning parameter for sampling regression coefficients of the variance models `\omega`. Defaults at 5.
`tuneComega`	Starting value of the tuning parameter for sampling `c_{\omega}`. Defaults at 1.
`tau`	The `tau` of the shadow prior. Defaults at 0.01.
`FT`	Binary indicator. If set equal to 1, the Fisher's z transform of the correlations is modelled, otherwise if set equal to 0, the untransformed correlations are modelled.
`...`	Other options that will be ignored.

Details

Function lmrm returns samples from the posterior distributions of the parameters of a regression model with normally distributed multivariate longitudinal responses. To describe the model, let Y_{ij} = (Y_{ij1},\dots,Y_{ijp})^{\top} denote the vector of p responses observed on individual i, i=1,\dots,n, at time point t_{ij}, j=1,\dots,n_i. The observational time points t_{ij} are allowed to be unequally spaced. Further, let u_{ij} denote the covariate vector that is observed along with Y_{ij} and that may include time, other time-dependent covariates and time-independent ones. In addition, let Y_{i}=(Y_{i1}^{\top},\dots,Y_{in_i}^{\top})^{\top} denote the ith response vector. With \mu_i=E(Y_{i}) and \Sigma_i = cov(Y_i), the model assumes multivariate normality, Y_{i} \sim N(\mu_i, \Sigma_i), i=1,2,\dots,n. The means \mu_i and covariance matrices \Sigma_i are modelled semiparametrically in terms of covariates. For the means one can specify semiparametric models,

\mu_{ijk} = \beta_{k0} + \sum_{l=1}^{K_1} u_{ijl} \beta_{kl} + \sum_{l=K_1+1}^{K} f_{\mu,k,l}(u_{ijl}).

This is the first of the 5 regression submodels.

To model the covariance matrix, first consider the modified Cholesky decomposition, L_i \Sigma_i L_i^{\top} = D_i, where L_i is a unit block lower triangular matrix and D_i is a block diagonal matrix,

\begin{array}{cc} L_i = \left[ \begin{array}{cccc} I & 0 & \dots & 0 \\ -\Phi_{i21} & I & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ -\Phi_{in_i1} & -\Phi_{in_i1} & \dots & I \\ \end{array} \right], & D_i = \left[ \begin{array}{cccc} D_{1} & 0 & \dots & 0 \\ 0 & D_{2} & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & D_{n_i} \\ \end{array} \right], \end{array}

For modelling D_{ij}, i=1,\dots,n, j=1,\dots,n_i in terms of covariates, first we separate the variances and the correlations D_{ij} = S_{ij}^{1/2} R_{ij} S_{ij}^{1/2}. It is easy to model matrix S_{ij} in terms of covariates as the only requirement on its diagonal elements is that they are nonnegative,

\log \sigma^2_{ijk} = \alpha_{k0} + \sum_{l=1}^{L_1} w_{ijl} \alpha_{kl} + \sum_{l=L_1+1}^{L} f_{\sigma,k,l}(w_{ijl})

This is the second of the 5 regression submodels.

For \phi_{ijklm}, the (l,m) element of \Phi_{ijk}, l,m=1,\dots,p, one can specify semiparametric models

\phi_{ijklm} = \psi_{lm0} + \sum_{b=1}^{B_1} v_{ijkb} \psi_{lmb} + \sum_{b=B_1+1}^{B} f_{\phi,l,m,b}(v_{ijkb})

This is the third of the 5 regression submodels.

The elements of the correlations matrices R_{ij} are modelled in terms of covariate time only, hence they are denoted by R_t. Subject to the positive definiteness constraint, the elements of R_t are modelled using a normal distribution with location and scale parameters, \mu_{ct} and \sigma^2_{ct}, modelled as

\mu_{ct} = \eta_0 + f_{\mu}(t),

\log \sigma^2_{ct} = \omega_0 + f_{\sigma}(t),

and these are the last 2 of the 5 submodels.

Value

Function lmrm returns samples from the posteriors of the model parameters.

Author(s)

Georgios Papageorgiou gpapageo@gmail.com

References

Papageorgiou, G. (2020). Bayesian semiparametric modelling of covariance matrices for multivariate longitudinal data. arXiv:2012.09833.

Examples

	# Fit a joint mean-covariance model on the simulated dataset simD2
	require(ggplot2)
	data(simD2)
	model <- Y1 | Y2 ~ time | sm(time) | sm(lag) | sm(time) | 1 
	# the above defines the responses and the regression models on the left and 
	# right of "~", respectively 
	# the first model, for the mean, is a linear function of time, this is sufficient as 
	# the 2 responses have constant mean.
	# the second model, for the variances, is a smooth function of time
	# the third model, for the dependence structure, is a smooth function of lag, 
	# that lmrm figures out and it does not need to be computed by the user
	# the fourth model, for location of the correlations, is a smooth function of time
	# the fifth model, for scale of the correlations, is just an intercept model
	## Not run: 
	m1 <- lmrm(formula = model, corr.Model = c("common", nClust = 1), data = simD2,
		       id = id, time = time, sweeps = 2500, burn = 500, thin = 2, 
		       StorageDir = getwd(), seed = 1)
	plot(m1)
	
## End(Not run)

[Package BNSP version 2.2.3 Index]