lmrm {BNSP} | R Documentation |
Implements an MCMC algorithm for posterior sampling based on a semiparametric model for continuous longitudinal multivariate responses. The overall model consists of 5 regression submodels and it utilizes spike-slab priors for variable selection and function regularization. See ‘Details’ section for a full description of the model.
lmrm(formula, data = list(), id, time, sweeps, burn = 0, thin = 1, seed, StorageDir, c.betaPrior = "IG(0.5,0.5*n*p)", pi.muPrior = "Beta(1,1)", c.alphaPrior = "IG(1.1,1.1)", pi.phiPrior = "Beta(1,1)", c.psiPrior = "HN(2)", sigmaPrior = "HN(2)", pi.sigmaPrior = "Beta(1,1)", corr.Model = c("common", nClust = 1), DP.concPrior = "Gamma(5,2)", c.etaPrior = "IG(0.5,0.5*samp)", pi.nuPrior = "Beta(1,1)", pi.fiPrior = "Beta(1,1)", c.omegaPrior = "IG(1.1,1.1)", sigmaCorPrior = "HN(2)", tuneCa, tuneSigma2, tuneCb, tuneAlpha, tuneSigma2R, tuneR, tuneCpsi, tuneCbCor, tuneOmega, tuneComega, tau, FT = 1,...)
formula |
a formula defining the responses and the covariates in the 5 regression models e.g. |
data |
a data frame. |
id |
identifiers of the individuals or other sampling units that are observed over time. |
time |
a vector input that specifies the time of observation |
sweeps |
total number of posterior samples, including those discarded in burn-in period
(see argument |
burn |
length of burn-in period. |
thin |
thinning parameter. |
seed |
optional seed for the random generator. |
StorageDir |
a required directory to store files with the posterior samples of models parameters. |
c.betaPrior |
The inverse Gamma prior of c_{β}. The default is "IG(0.5,0.5*n*p)", that is, an inverse Gamma with parameters 1/2 and np/2, where n is the number of sampling units and p is the length of the response vector. |
pi.muPrior |
The Beta prior of π_{μ}. The default is "Beta(1,1)". It can be of dimension 1, of dimension K (the number of effects that enter the mean model), or of dimension p K |
c.alphaPrior |
The inverse Gamma prior of c_{α}^2. The default is "IG(1.1,1.1)". Half-normal priors for c_{α} are also available, declared using "HN(a)", where "a" is a positive number. It can be of dimension 1 or p (the length of the multivariate response). |
pi.phiPrior |
The Beta prior of π_{φ}. The default is "Beta(1,1)". It can be of dimension 1, of dimension B (the number of effects that enter the dependence model), or of dimension p^2 B |
c.psiPrior |
The prior of c_{ψ}^2. The default is "HN(2)", a half-normal prior for c_{ψ} with variance equal to two, c_{ψ} \sim N(0,2) I[c_{ψ} > 0]. Inverse Gamma priors for c_{ψ}^2 are also available, declared using "IG(a,b)". It can be of dimension 1 or p^2 (the number of dependence models). |
sigmaPrior |
The prior of σ_k^2, k=1,…,p. The default is "HN(2)", a half-normal prior for σ_k with variance equal to two, σ_k \sim N(0,2) I[σ>0]. Inverse Gamma priors for σ_k^2 are also available, declared using "IG(a,b)". It can be of dimension 1 or p (the length of the multivariate response). |
pi.sigmaPrior |
The Beta prior of π_{σ}. The default is "Beta(1,1)". It can be of dimension 1, of dimension L (the number of effects that enter the variance model), or of dimension pL |
corr.Model |
Specifies the model for the correlation matrices R_t. The three choices supported are "common", that specifies a common correlations model, "groupC", that specifies a grouped correlations model, and "groupV", that specifies a grouped variables model. When the model chosen is either "groupC" or "groupV", the upper limit on the number of clusters can also be specified, using corr.Model = c("groupC", nClust = d) or corr.Model = c("groupV", nClust = p). If the number of clusters is left unspecified, for the "groupV" model, it is taken to be p, the number of responses. For the "groupC" model, it is taken to be d = p(p-1)/2, the number of free elements in the correlation matrices. |
DP.concPrior |
The Gamma prior for the Dirichlet process concentration parameter. |
c.etaPrior |
The inverse Gamma prior of c_{η}. The default is "IG(0.5,0.5*samp)", that is, an inverse Gamma with parameters 1/2 and samp/2, where samp is the number of correlations observed over time, that is $samp=M*d$ where $M$ is the number of unique observation time points and $d$ is the number of non-redundant elements of $R$. |
pi.nuPrior |
The Beta prior of π_{ν}. The default is "Beta(1,1)". It can be of dimension 1. |
pi.fiPrior |
The Beta prior of π_{\varphi}. The default is "Beta(1,1)". It can be of dimension 1. |
c.omegaPrior |
The prior of c_{ω}^2. The default is "HN(2)", a half-normal prior for c_{ω} with variance equal to two, c_{ω} \sim N(0,2) I[c_{ω} > 0]. Inverse Gamma priors for c_{ω}^2 are also available, declared using "IG(a,b)". It can be of dimension 1. |
sigmaCorPrior |
The prior of σ^2. The default is "HN(2)", a half-normal prior for σ^2 with variance equal to two, σ \sim N(0,2) I[σ > 0]. Inverse Gamma priors for σ^2 are also available, declared using "IG(a,b)". It can be of dimension 1. |
tuneCa |
Starting value of the tuning parameter for sampling c_{α k}, k=1,…,p. Defaults at a vector of $p$ ones. It could be of dimension p. |
tuneSigma2 |
Starting value of the tuning parameter for sampling σ^2_k, k=1,…,p. Defaults at a vector of $p$ ones. It could be of dimension p. |
tuneCb |
Starting value of the tuning parameter for sampling c_{β}. Defaults at 100. |
tuneAlpha |
Starting value of the tuning parameter for sampling regression coefficients of the variance models α_k, k=1,…,p. Defaults at a vector of 5s. It could be of dimension L p |
tuneSigma2R |
Starting value of the tuning parameter for sampling σ^2. Defaults at 1. |
tuneR |
Starting value of the tuning parameter for sampling correlation matrices. Defaults at 40*(p+2)^3. Can be of dimension 1 or M is the number of unique observation time points. |
tuneCpsi |
Starting value of the tuning parameter for sampling variances c_{ψ}^2. Defaults at 5. Can be of dimension 1 or p^2 |
.
tuneCbCor |
Starting value of the tuning parameter for sampling c_{η}^2. Defaults at 10. |
tuneOmega |
Starting value of the tuning parameter for sampling regression coefficients of the variance models ω. Defaults at 5. |
tuneComega |
Starting value of the tuning parameter for sampling c_{ω}. Defaults at 1. |
tau |
The tau of the shadow prior. Defaults at 0.01. |
FT |
Binary indicator. If set equal to 1, the Fisher's z transform of the correlations is modelled, otherwise if set equal to 0, the untransformed correlations are modelled. |
... |
Other options that will be ignored. |
Function lmrm
returns samples from the posterior distributions of the parameters of
a regression model with normally distributed multivariate longitudinal responses. To describe the model,
let Y_{ij} = (Y_{ij1},…,Y_{ijp})^{\top} denote the vector of p responses
observed on individual i, i=1,…,n, at time point t_{ij}, j=1,…,n_i.
The observational time points t_{ij} are allowed to be unequally spaced.
Further, let u_{ij} denote the covariate vector that is observed along with Y_{ij} and that may include time,
other time-dependent covariates and time-independent ones.
In addition, let Y_{i}=(Y_{i1}^{\top},…,Y_{in_i}^{\top})^{\top} denote the ith response vector.
With μ_i=E(Y_{i}) and Σ_i = cov(Y_i), the model assumes multivariate normality,
Y_{i} \sim N(μ_i, Σ_i), i=1,2,…,n.
The means μ_i and covariance matrices Σ_i are modelled semiparametrically in terms of covariates.
For the means one can specify semiparametric models,
μ_{ijk} = β_{k0} + ∑_{l=1}^{K_1} u_{ijl} β_{kl} + ∑_{l=K_1+1}^{K} f_{μ,k,l}(u_{ijl}).
This is the first of the 5 regression submodels.
To model the covariance matrix, first consider the modified Cholesky decomposition, L_i Σ_i L_i^{\top} = D_i, where L_i is a unit block lower triangular matrix and D_i is a block diagonal matrix,
\begin{array}{cc} L_i = ≤ft[ \begin{array}{cccc} I & 0 & … & 0 \\ -Φ_{i21} & I & … & 0 \\ \vdots & \vdots & \ddots & \vdots \\ -Φ_{in_i1} & -Φ_{in_i1} & … & I \\ \end{array} \right], & D_i = ≤ft[ \begin{array}{cccc} D_{1} & 0 & … & 0 \\ 0 & D_{2} & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & D_{n_i} \\ \end{array} \right], \end{array}
For modelling D_{ij}, i=1,…,n, j=1,…,n_i in terms of covariates, first we separate the variances and the correlations D_{ij} = S_{ij}^{1/2} R_{ij} S_{ij}^{1/2}. It is easy to model matrix S_{ij} in terms of covariates as the only requirement on its diagonal elements is that they are nonnegative,
\log σ^2_{ijk} = α_{k0} + ∑_{l=1}^{L_1} w_{ijl} α_{kl} + ∑_{l=L_1+1}^{L} f_{σ,k,l}(w_{ijl})
This is the second of the 5 regression submodels.
For φ_{ijklm}, the (l,m) element of Φ_{ijk}, l,m=1,…,p, one can specify semiparametric models
φ_{ijklm} = ψ_{lm0} + ∑_{b=1}^{B_1} v_{ijkb} ψ_{lmb} + ∑_{b=B_1+1}^{B} f_{φ,l,m,b}(v_{ijkb})
This is the third of the 5 regression submodels.
The elements of the correlations matrices R_{ij} are modelled in terms of covariate time only, hence they are denoted by R_t. Subject to the positive definiteness constraint, the elements of R_t are modelled using a normal distribution with location and scale parameters, μ_{ct} and σ^2_{ct}, modelled as
μ_{ct} = η_0 + f_{μ}(t),
\log σ^2_{ct} = ω_0 + f_{σ}(t),
and these are the last 2 of the 5 submodels.
Function lmrm
returns samples from the posteriors of the model parameters.
Georgios Papageorgiou gpapageo@gmail.com
Papageorgiou, G. (2020). Bayesian semiparametric modelling of covariance matrices for multivariate longitudinal data. arXiv:2012.09833.
# Fit a joint mean-covariance model on the simulated dataset simD2 require(ggplot2) data(simD2) model <- Y1 | Y2 ~ time | sm(time) | sm(lag) | sm(time) | 1 # the above defines the responses and the regression models on the left and # right of "~", respectively # the first model, for the mean, is a linear function of time, this is sufficient as # the 2 responses have constant mean. # the second model, for the variances, is a smooth function of time # the third model, for the dependence structure, is a smooth function of lag, # that lmrm figures out and it does not need to be computed by the user # the fourth model, for location of the correlations, is a smooth function of time # the fifth model, for scale of the correlations, is just an intercept model ## Not run: m1 <- lmrm(formula = model, corr.Model = c("common", nClust = 1), data = simD2, id = id, time = time, sweeps = 2500, burn = 500, thin = 2, StorageDir = getwd(), seed = 1) plot(m1) ## End(Not run)