lmrm {BNSP}R Documentation

Bayesian semiparametric modelling of covariance matrices for multivariate longitudinal data

Description

Implements an MCMC algorithm for posterior sampling based on a semiparametric model for continuous longitudinal multivariate responses. The overall model consists of 5 regression submodels and it utilizes spike-slab priors for variable selection and function regularization. See ‘Details’ section for a full description of the model.

Usage

lmrm(formula, data = list(), centre=TRUE, id, time,
sweeps, burn = 0, thin = 1, seed, StorageDir,
c.betaPrior = "IG(0.5,0.5*n*p)", pi.muPrior = "Beta(1,1)", 
c.alphaPrior = "IG(1.1,1.1)", pi.phiPrior = "Beta(1,1)", c.psiPrior = "HN(2)",
sigmaPrior = "HN(2)", pi.sigmaPrior = "Beta(1,1)", 
corr.Model = c("common", nClust = 1), DP.concPrior = "Gamma(5,2)",
c.etaPrior = "IG(0.5,0.5*samp)", pi.nuPrior = "Beta(1,1)", 
pi.fiPrior = "Beta(1,1)", c.omegaPrior = "IG(1.1,1.1)", sigmaCorPrior = "HN(2)",
tuneCalpha, tuneSigma2, tuneCbeta, tuneAlpha, tuneSigma2R, tuneR, tuneCpsi, 
tuneCbCor, tuneOmega, tuneComega, tau, FT = 1,...)

Arguments

formula

a formula defining the responses and the covariates in the 5 regression models e.g. y1 | y2 ~ x | w | z | t | t or for smooth effects y1 | y2 ~ sm(x) | sm(w) | sm(z) | sm(t) | sm(t). The package uses the extended formula notation, where the responses are defined on the left of ~ and the models on the right.

data

a data frame.

centre

Binary indicator. If set equal to TRUE, the design matrices are centred, to have column mean equl to zero, otherwise, if set to FALSE, the columns are not centred.

id

identifiers of the individuals or other sampling units that are observed over time.

time

a vector input that specifies the time of observation

sweeps

total number of posterior samples, including those discarded in burn-in period (see argument burn) and those discarded by the thinning process (see argument thin).

burn

length of burn-in period.

thin

thinning parameter.

seed

optional seed for the random generator.

StorageDir

a required directory to store files with the posterior samples of models parameters.

c.betaPrior

The inverse Gamma prior of cβc_{\beta}. The default is "IG(0.5,0.5*n*p)", that is, an inverse Gamma with parameters 1/21/2 and np/2np/2, where nn is the number of sampling units and pp is the length of the response vector.

pi.muPrior

The Beta prior of πμ\pi_{\mu}. The default is "Beta(1,1)". It can be of dimension 11, of dimension KK (the number of effects that enter the mean model), or of dimension pKp K

c.alphaPrior

The inverse Gamma prior of cα2c_{\alpha}^2. The default is "IG(1.1,1.1)". Half-normal priors for cαc_{\alpha} are also available, declared using "HN(a)", where "a" is a positive number. It can be of dimension 11 or pp (the length of the multivariate response).

pi.phiPrior

The Beta prior of πϕ\pi_{\phi}. The default is "Beta(1,1)". It can be of dimension 11, of dimension BB (the number of effects that enter the dependence model), or of dimension p2Bp^2 B

c.psiPrior

The prior of cψ2c_{\psi}^2. The default is "HN(2)", a half-normal prior for cψc_{\psi} with variance equal to two, cψN(0,2)I[cψ>0]c_{\psi} \sim N(0,2) I[c_{\psi} > 0]. Inverse Gamma priors for cψ2c_{\psi}^2 are also available, declared using "IG(a,b)". It can be of dimension 11 or p2p^2 (the number of dependence models).

sigmaPrior

The prior of σk2,k=1,,p\sigma_k^2, k=1,\dots,p. The default is "HN(2)", a half-normal prior for σk\sigma_k with variance equal to two, σkN(0,2)I[σ>0]\sigma_k \sim N(0,2) I[\sigma>0]. Inverse Gamma priors for σk2\sigma_k^2 are also available, declared using "IG(a,b)". It can be of dimension 11 or pp (the length of the multivariate response).

pi.sigmaPrior

The Beta prior of πσ\pi_{\sigma}. The default is "Beta(1,1)". It can be of dimension 11, of dimension LL (the number of effects that enter the variance model), or of dimension pLpL

corr.Model

Specifies the model for the correlation matrices RtR_t. The three choices supported are "common", that specifies a common correlations model, "groupC", that specifies a grouped correlations model, and "groupV", that specifies a grouped variables model. When the model chosen is either "groupC" or "groupV", the upper limit on the number of clusters can also be specified, using corr.Model = c("groupC", nClust = d) or corr.Model = c("groupV", nClust = p). If the number of clusters is left unspecified, for the "groupV" model, it is taken to be pp, the number of responses. For the "groupC" model, it is taken to be d=p(p1)/2d = p(p-1)/2, the number of free elements in the correlation matrices.

DP.concPrior

The Gamma prior for the Dirichlet process concentration parameter.

c.etaPrior

The inverse Gamma prior of cηc_{\eta}. The default is "IG(0.5,0.5*samp)", that is, an inverse Gamma with parameters 1/21/2 and samp/2samp/2, where sampsamp is the number of correlations observed over time, that is $samp=M*d$ where $M$ is the number of unique observation time points and $d$ is the number of non-redundant elements of $R$.

pi.nuPrior

The Beta prior of πν\pi_{\nu}. The default is "Beta(1,1)". It can be of dimension 11.

pi.fiPrior

The Beta prior of πφ\pi_{\varphi}. The default is "Beta(1,1)". It can be of dimension 11.

c.omegaPrior

The prior of cω2c_{\omega}^2. The default is "HN(2)", a half-normal prior for cωc_{\omega} with variance equal to two, cωN(0,2)I[cω>0]c_{\omega} \sim N(0,2) I[c_{\omega} > 0]. Inverse Gamma priors for cω2c_{\omega}^2 are also available, declared using "IG(a,b)". It can be of dimension 11.

sigmaCorPrior

The prior of σ2\sigma^2. The default is "HN(2)", a half-normal prior for σ2\sigma^2 with variance equal to two, σN(0,2)I[σ>0]\sigma \sim N(0,2) I[\sigma > 0]. Inverse Gamma priors for σ2\sigma^2 are also available, declared using "IG(a,b)". It can be of dimension 11.

tuneCalpha

Starting value of the tuning parameter for sampling cαk,k=1,,pc_{\alpha k}, k=1,\dots,p. Defaults at a vector of $p$ ones. It could be of dimension pp.

tuneSigma2

Starting value of the tuning parameter for sampling σk2,k=1,,p\sigma^2_k, k=1,\dots,p. Defaults at a vector of $p$ ones. It could be of dimension pp.

tuneCbeta

Starting value of the tuning parameter for sampling cβc_{\beta}. Defaults at 100.

tuneAlpha

Starting value of the tuning parameter for sampling regression coefficients of the variance models αk,k=1,,p\alpha_k, k=1,\dots,p. Defaults at a vector of 5s. It could be of dimension LpL p

tuneSigma2R

Starting value of the tuning parameter for sampling σ2\sigma^2. Defaults at 1.

tuneR

Starting value of the tuning parameter for sampling correlation matrices. Defaults at 40(p+2)340*(p+2)^3. Can be of dimension 11 or MM is the number of unique observation time points.

tuneCpsi

Starting value of the tuning parameter for sampling variances cψ2c_{\psi}^2. Defaults at 5. Can be of dimension 11 or p2p^2

.

tuneCbCor

Starting value of the tuning parameter for sampling cη2c_{\eta}^2. Defaults at 10.

tuneOmega

Starting value of the tuning parameter for sampling regression coefficients of the variance models ω\omega. Defaults at 5.

tuneComega

Starting value of the tuning parameter for sampling cωc_{\omega}. Defaults at 1.

tau

The tautau of the shadow prior. Defaults at 0.01.

FT

Binary indicator. If set equal to 1, the Fisher's z transform of the correlations is modelled, otherwise if set equal to 0, the untransformed correlations are modelled.

...

Other options that will be ignored.

Details

Function lmrm returns samples from the posterior distributions of the parameters of a regression model with normally distributed multivariate longitudinal responses. To describe the model, let Yij=(Yij1,,Yijp)Y_{ij} = (Y_{ij1},\dots,Y_{ijp})^{\top} denote the vector of pp responses observed on individual i,i=1,,n,i, i=1,\dots,n, at time point tij,j=1,,nit_{ij}, j=1,\dots,n_i. The observational time points tijt_{ij} are allowed to be unequally spaced. Further, let uiju_{ij} denote the covariate vector that is observed along with YijY_{ij} and that may include time, other time-dependent covariates and time-independent ones. In addition, let Yi=(Yi1,,Yini)Y_{i}=(Y_{i1}^{\top},\dots,Y_{in_i}^{\top})^{\top} denote the iith response vector. With μi=E(Yi)\mu_i=E(Y_{i}) and Σi=\Sigma_i = cov(Yi)(Y_i), the model assumes multivariate normality, YiN(μi,Σi),i=1,2,,nY_{i} \sim N(\mu_i, \Sigma_i), i=1,2,\dots,n. The means μi\mu_i and covariance matrices Σi\Sigma_i are modelled semiparametrically in terms of covariates. For the means one can specify semiparametric models,

μijk=βk0+l=1K1uijlβkl+l=K1+1Kfμ,k,l(uijl).\mu_{ijk} = \beta_{k0} + \sum_{l=1}^{K_1} u_{ijl} \beta_{kl} + \sum_{l=K_1+1}^{K} f_{\mu,k,l}(u_{ijl}).

This is the first of the 5 regression submodels.

To model the covariance matrix, first consider the modified Cholesky decomposition, LiΣiLi=Di,L_i \Sigma_i L_i^{\top} = D_i, where LiL_i is a unit block lower triangular matrix and DiD_i is a block diagonal matrix,

Li=[I00Φi21I0Φini1Φini1I],Di=[D1000D200000Dni], \begin{array}{cc} L_i = \left[ \begin{array}{cccc} I & 0 & \dots & 0 \\ -\Phi_{i21} & I & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ -\Phi_{in_i1} & -\Phi_{in_i1} & \dots & I \\ \end{array} \right], & D_i = \left[ \begin{array}{cccc} D_{1} & 0 & \dots & 0 \\ 0 & D_{2} & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & D_{n_i} \\ \end{array} \right], \end{array}

For modelling Dij,i=1,,n,j=1,,niD_{ij}, i=1,\dots,n, j=1,\dots,n_i in terms of covariates, first we separate the variances and the correlations Dij=Sij1/2RijSij1/2D_{ij} = S_{ij}^{1/2} R_{ij} S_{ij}^{1/2}. It is easy to model matrix SijS_{ij} in terms of covariates as the only requirement on its diagonal elements is that they are nonnegative,

logσijk2=αk0+l=1L1wijlαkl+l=L1+1Lfσ,k,l(wijl)\log \sigma^2_{ijk} = \alpha_{k0} + \sum_{l=1}^{L_1} w_{ijl} \alpha_{kl} + \sum_{l=L_1+1}^{L} f_{\sigma,k,l}(w_{ijl})

This is the second of the 5 regression submodels.

For ϕijklm\phi_{ijklm}, the (l,m)(l,m) element of Φijk,l,m=1,,p\Phi_{ijk}, l,m=1,\dots,p, one can specify semiparametric models

ϕijklm=ψlm0+b=1B1vijkbψlmb+b=B1+1Bfϕ,l,m,b(vijkb) \phi_{ijklm} = \psi_{lm0} + \sum_{b=1}^{B_1} v_{ijkb} \psi_{lmb} + \sum_{b=B_1+1}^{B} f_{\phi,l,m,b}(v_{ijkb})

This is the third of the 5 regression submodels.

The elements of the correlations matrices RijR_{ij} are modelled in terms of covariate time only, hence they are denoted by RtR_t. Subject to the positive definiteness constraint, the elements of RtR_t are modelled using a normal distribution with location and scale parameters, μct\mu_{ct} and σct2\sigma^2_{ct}, modelled as

μct=η0+fμ(t), \mu_{ct} = \eta_0 + f_{\mu}(t),

logσct2=ω0+fσ(t), \log \sigma^2_{ct} = \omega_0 + f_{\sigma}(t),

and these are the last 2 of the 5 submodels.

Value

Function lmrm returns samples from the posteriors of the model parameters.

Author(s)

Georgios Papageorgiou gpapageo@gmail.com

References

Papageorgiou, G. (2020). Bayesian semiparametric modelling of covariance matrices for multivariate longitudinal data. arXiv:2012.09833.

Examples

	# Fit a joint mean-covariance model on the simulated dataset simD2
	require(ggplot2)
	data(simD2)
	model <- Y1 | Y2 ~ time | sm(time) | sm(lag) | sm(time) | 1 
	# the above defines the responses and the regression models on the left and 
	# right of "~", respectively 
	# the first model, for the mean, is a linear function of time, this is sufficient as 
	# the 2 responses have constant mean.
	# the second model, for the variances, is a smooth function of time
	# the third model, for the dependence structure, is a smooth function of lag, 
	# that lmrm figures out and it does not need to be computed by the user
	# the fourth model, for location of the correlations, is a smooth function of time
	# the fifth model, for scale of the correlations, is just an intercept model
	## Not run: 
	m1 <- lmrm(formula = model, corr.Model = c("common", nClust = 1), data = simD2,
		       id = id, time = time, sweeps = 2500, burn = 500, thin = 2, 
		       StorageDir = getwd(), seed = 1)
	plot(m1)
	
## End(Not run)

[Package BNSP version 2.2.3 Index]