mvrm {BNSP} | R Documentation |
Bayesian semiparametric analysis of multivariate continuous responses, with variable selection
Description
Implements an MCMC algorithm for posterior sampling based on a semiparametric model for continuous multivariate responses and additive models for the mean and variance functions. The model utilizes spike-slab priors for variable selection and regularization. See ‘Details’ section for a full description of the model.
Usage
mvrm(formula, distribution = "normal", data = list(), centre = TRUE, sweeps,
burn = 0, thin = 1, seed, StorageDir, c.betaPrior = "IG(0.5, 0.5 * n * p)",
pi.muPrior = "Beta(1, 1)", c.alphaPrior = "IG(1.1, 1.1)", sigmaPrior = "HN(2)",
pi.sigmaPrior = "Beta(1, 1)", c.psiPrior = "HN(1)", phiPrior = "HN(2)",
pi.omegaPrior = "Beta(1, 1)", mu.RPrior = "N(0, 1)", sigma.RPrior = "HN(1)",
corr.Model = c("common", nClust = 1), DP.concPrior = "Gamma(5, 2)",
breaksPrior = "SBeta(1, 2)", tuneCbeta, tuneCalpha, tuneAlpha, tuneSigma2,
tuneCpsi, tunePsi, tunePhi, tuneR, tuneSigma2R, tuneHar, tuneBreaks, tunePeriod, tau,
FT = 1, compDeviance = FALSE, ...)
Arguments
formula |
a formula defining the responses and the covariates in the mean and variance models e.g. |
distribution |
The distribution for the response variables. Currently two options are supported: "normal" and "t". |
data |
a data frame. |
centre |
Binary indicator. If set equal to TRUE, the design matrices are centred, to have column mean equl to zero, otherwise, if set to FALSE, the columns are not centred. |
sweeps |
total number of posterior samples, including those discarded in burn-in period
(see argument |
burn |
length of burn-in period. |
thin |
thinning parameter. |
seed |
optional seed for the random generator. |
StorageDir |
a required directory to store files with the posterior samples of models parameters. |
c.betaPrior |
The inverse Gamma prior of |
pi.muPrior |
The Beta prior of |
c.alphaPrior |
The prior of |
sigmaPrior |
The prior of |
pi.sigmaPrior |
The Beta prior of |
c.psiPrior |
The prior of |
phiPrior |
The prior of |
pi.omegaPrior |
The Beta prior of |
mu.RPrior |
The normal prior for |
sigma.RPrior |
The half normal prior for |
corr.Model |
Specifies the model for the correlation matrix |
DP.concPrior |
The Gamma prior for the Dirichlet process concentration parameter. |
breaksPrior |
The prior for the shifts associated with the growth break points. The shift is taken to have a scaled Beta prior with support the (0, p) interval, where p is the period of the sin curve. The default SBeta(1, 2) is a scaled Beta(1, 2) distribution supported in the (0, p) interval. The shifts are in increasing order. |
tuneCbeta |
Starting value of the tuning parameter for sampling |
tuneCalpha |
Starting value of the tuning parameter for sampling |
tuneAlpha |
Starting value of the tuning parameter for sampling regression coefficients of the variance model
|
tuneSigma2 |
Starting value of the tuning parameter for sampling variances |
tuneCpsi |
Starting value of the tuning parameter for sampling |
tunePsi |
Starting value of the tuning parameter for sampling |
tunePhi |
Starting value of the tuning parameter for sampling |
tuneR |
Starting value of the tuning parameter for sampling correlation matrices.
Defaults at |
tuneSigma2R |
Starting value of the tuning parameter for sampling |
tuneHar |
Starting value of the tuning parameter for sampling the regression coefficients of the harmonics. Defaults at 100. |
tuneBreaks |
Starting value of the tuning parameter for sampling the shift parameters associated with growth breaks. Defaults at 0.01 times the period of the sin wave. |
tunePeriod |
Starting value of the tuning parameter for sampling the period parameter of the sin curve. Defaults to 0.01. |
tau |
The |
FT |
Binary indicator. If set equal to 1, the Fisher's z transform of the correlations is modelled, otherwise if set equal to 0, the untransformed correlations are modelled. |
compDeviance |
Binary indicator. If set equal to 1, the deviance is computed. |
... |
Other options that will be ignored. |
Details
Function mvrm
returns samples from the posterior distributions of the parameters of
a regression model with normally distributed multivariate responses and mean and variance functions modeled
in terms of covariates. For instance, in the presence of two responses (y_1, y_2
) and two covariates
in the mean model (u_1, u_2
) and two in the variance model (w_1, w_2
), we may choose to fit
\mu_u = \beta_0 + \beta_1 u_1 + f_{\mu}(u_2),
\log(\sigma^2_W) = \alpha_0 + \alpha_1 w_1 + f_{\sigma}(w_2),
parametrically modelling the effects of u_1
and w_1
and non-parametrically modelling
the effects of u_2
and w_2
. Smooth functions, such as f_{\mu}
and f_{\sigma}
, are represented by basis function expansion,
f_{\mu}(u_2) = \sum_{j} \beta_{j} \phi_{j}(u_2),
f_{\sigma}(w_2) = \sum_{j} \alpha_{j} \phi_{j}(w_2),
where \phi
are the basis functions and \beta
and \alpha
are regression coefficients.
The variance model can equivalently be expressed as
\sigma^2_W = \exp(\alpha_0) \exp(\alpha_1 w_1 + f_{\sigma}(w_2)) = \sigma^2 \exp(\alpha_1 w_1 + f_{\sigma}(w_2)),
where \sigma^2 = \exp(\alpha_0)
. This is the parameterization that we adopt in this implementation.
Positive prior probability that the regression coefficients in the mean model are exactly
zero is achieved by defining binary variables \gamma
that take value \gamma=1
if the associated coefficient \beta \neq 0
and \gamma = 0
if \beta = 0
.
Indicators \delta
that take value \delta=1
if the associated coefficient
\alpha \neq 0
and \delta = 0
if \alpha = 0
for the variance function
are defined analogously. We note that all coefficients in the mean and variance functions are
subject to selection except the intercepts, \beta_0
and \alpha_0
.
Prior specification:
For the vector of non-zero regression coefficients \beta_{\gamma}
we specify a g-prior
\beta_{\gamma} | c_{\beta}, \sigma^2, \gamma, \alpha, \delta \sim N(0,c_{\beta} \sigma^2
(\tilde{X}_{\gamma}^{\top} \tilde{X}_{\gamma} )^{-1}).
where \tilde{X}
is a scaled version of design matrix X
of the mean model.
For the vector of non-zero regression coefficients \alpha_{\delta}
we specify a normal prior
\alpha_{\delta} | c_{\alpha}, \delta \sim N(0,c_{\alpha} I).
Independent priors are specified for the indicators variables \gamma
and \delta
as
P(\gamma = 1 | \pi_{\mu}) = \pi_{\mu}
and
P(\delta = 1 | \pi_{\sigma}) = \pi_{\sigma}
.
Further, Beta priors are specified for \pi_{\mu}
and \pi_{\sigma}
\pi_{\mu} \sim Beta(c_{\mu},d_{\mu}), \pi_{\sigma} \sim Beta(c_{\sigma},d_{\sigma}).
We note that blocks of regression coefficients associated with distinct covariate effects
have their own probability of selection (\pi_{\mu}
or \pi_{\sigma}
)
and this probability has its own prior distribution.
Further, we specify inverse Gamma priors for c_{\beta}
and c_{\alpha}
c_{\beta} \sim IG(a_{\beta},b_{\beta}),
c_{\alpha} \sim IG(a_{\alpha},b_{\alpha})
For \sigma^2
we consider inverse Gamma and half-normal priors
\sigma^2 \sim IG(a_{\sigma},b_{\sigma}), |\sigma| \sim N(0,\phi^2_{\sigma}).
Lastly, for the elements of the correlation matrix, we specify normal distributions with mean \mu_R
and variance \sigma^2_R
,
with the priors on these two parameters being normal and half-normal, respectively. This is the common correlations model. Further,
the grouped correlations model can be specified. It considers a mixture of normal distributions for the means \mu_R
. The grouped correlations model can also be specified. It clusters the variables instead of the correlations.
Value
Function mvrm
returns the following:
call |
the matched call. |
formula |
model formula. |
seed |
the seed that was used (in case replication of the results is needed). |
data |
the dataset |
X |
the mean model design matrix. |
Z |
the variance model design matrix. |
LG |
the length of the vector of indicators |
LD |
the length of the vector of indicators |
mcpar |
the MCMC parameters: length of burn in period, total number of samples, thinning period. |
nSamples |
total number of posterior samples |
DIR |
the storage directory |
Further, function mvrm
creates files where the posterior samples are written.
These files are (with all file names preceded by ‘BNSP.’):
alpha.txt |
contains samples from the posterior of vector |
beta.txt |
contains samples from the posterior of vector |
gamma.txt |
contains samples from the posterior of the vector of the indicators
|
delta.txt |
contains samples from the posterior of the vector of the indicators
|
sigma2.txt |
contains samples from the posterior of the error variance |
cbeta.txt |
contains samples from the posterior of |
calpha.txt |
contains samples from the posterior of |
R.txt |
contains samples from the posterior of the correlation matrix |
theta.txt |
contains samples from the posterior of |
muR.txt |
contains samples from the posterior of |
sigma2R.txt |
contains samples from the posterior of |
deviance.txt |
contains the deviance, minus twice the log likelihood evaluated at the sampled values of the parameters. |
In addition to the above, for models that cluster the correlations we have
compAlloc.txt |
The cluster at which the correlations were allocated, |
nmembers.txt |
The numbers of correlations assigned to each cluster. |
DPconc.txt |
Contains samples from the posterior of the Dirichlet process concentration parameter. |
In addition to the above, for models that cluster the variables we have
compAllocV.txt |
The cluster at which the variables were allocated, |
nmembersV.txt |
The numbers of variables assigned to each cluster. |
Author(s)
Georgios Papageorgiou gpapageo@gmail.com
References
Papageorgiou, G. and Marshall, B.C. (2019). Bayesian semiparametric analysis of multivariate continuous responses, with variable selection. arXiv.
Papageorgiou, G. (2018). BNSP: an R Package for fitting Bayesian semiparametric regression models and variable selection. The R Journal, 10(2):526-548.
Chan, D., Kohn, R., Nott, D., & Kirby, C. (2006). Locally adaptive semiparametric estimation of the mean and variance functions in regression models. Journal of Computational and Graphical Statistics, 15(4), 915-936.
Examples
# Fit a mean/variance regression model on the cps71 dataset from package np.
#This is a univariate response model
require(np)
require(ggplot2)
data(cps71)
model <- logwage ~ sm(age, k = 30, bs = "rd") | sm(age, k = 30, bs = "rd")
DIR <- getwd()
## Not run: m1 <- mvrm(formula = model, data = cps71, sweeps = 10000, burn = 5000,
thin = 2, seed = 1, StorageDir = DIR)
#Print information and summarize the model
print(m1)
summary(m1)
#Summarize and plot one parameter of interest
alpha <- mvrm2mcmc(m1, "alpha")
summary(alpha)
plot(alpha)
#Obtain a plot of a term in the mean model
wagePlotOptions <- list(geom_point(data = cps71, aes(x = age, y = logwage)))
plot(x = m1, model = "mean", term = "sm(age)", plotOptions = wagePlotOptions)
plot(m1)
#Obtain predictions for new values of the predictor "age"
predict(m1, data.frame(age = c(21, 65)), interval = "credible")
# Fit a bivariate mean/variance model on the marks dataset from package ggm
# two responses: marks mechanics and vectors, and one covariate: marks on algebra
model2 <- mechanics | vectors ~ sm(algebra, k = 5) | sm(algebra, k = 3)
m2 <- mvrm(formula = model2, data = marks, sweeps = 100000, burn = 50000,
thin = 2, seed = 1, StorageDir = DIR)
plot(m2)
## End(Not run)