R: Bayesian instrinsic Gaussian Markov Random Field model for...

gmrfdpgrow {growfunctions}

R Documentation

Bayesian instrinsic Gaussian Markov Random Field model for dependent time-indexed functions

Description

Estimates a collection of time-indexed functions under intrinsic Gaussian Markov random field prior formulations where a Dirichlet process mixture allows sub-groupings of the functions to share the same iGMRF precision parameter. The iGMRF formulation supports any number of additive precision terms, expressing either or both of multiple trend and seasonality.

Usage

gmrfdpgrow(
  y,
  ksi,
  E,
  ipr,
  q_order,
  q_type,
  q_shape,
  q_rate,
  tau_shape,
  tau_rate,
  dp_shape,
  dp_rate,
  M_init,
  w_star,
  n.iter,
  n.burn,
  n.thin,
  nu,
  Rep,
  progress,
  jitter,
  kappa_fast,
  stable_launch
)

Arguments

`y`	A multivariate continuous response, specified as an N x T matrix, where `N` denotes the number of functions and `T`, the number of time points per function. Intermittent missing-at-random values are allowed and will be estimated from the posterior predictive distribution. Missing cells should be denoted with `NA`.
`ksi`	An N x P matrix of N observations of P predictors to be used in prior probability of co-clustering of set of N, T x 1 observations. Defaults to `ksi = NULL` such that predictors are not used to a priori determine co-clustering probabilities.
`E`	A multivariate offset variable, specified as an N x T matrix, in the case that `y` is of type count data. The offset will be used to model the `y` as under a poisson lognormal where `y ~ Pois(E*exp(Psi))`. Defaults to `NULL`, in which case the response type is assumed continuous.
`ipr`	An optional input vector of inclusion probabilities for each observation unit in the case the observed data were acquired through an informative sampling design, so that unbiased inference about the population requires adjustments to the observed sample Defaults to `ipr = rep(1,nrow(y))` indicating an iid sample.
`q_order`	An integer vector of length `K` to select the order for each iGMRF precision term. e.g. If the first term is a RW2 and there is a second is a 3-month seasonality term, where the time points are indexed by month, then `q_order = c(2,3)`. Defaults to `q_order = 2`
`q_type`	A vector of length `K`, the number of iGMRF precision terms, with each entry indicating whether the associated term is a trend (`"tr"`) or seasonality (`"sn"`) term. So all entries must be one of `c("tr","sn")`. Defaults to `q_type = "tr"`.
`q_shape`	The value (in (0,infty)) for the shape hyperparameter for the Gamma base distribution for the iGMRF scale parameters, `kappa_star(k,m)`, where `k` denotes the term and `m`, the cluster. Defaults to `q_shape = 0.3`.
`q_rate`	The rate parameter of the Gamma base distribution on `kappa_star`. Defaults to `q_rate = 0.0005`.
`tau_shape`	The value (in (0,infty)) for the shape hyperparameter for the Gamma prior on the error precision parameter. Defaults to `tau_shape = 1.0`.
`tau_rate`	The rate parameter of the Gamma prior distribution on `tau_e`. Defaults to `tau_rate = 1`.
`dp_shape`	The shape parameter for the Gamma prior on the DP concentration parameter, `conc`. Defaults to `dp_shape = 1`.
`dp_rate`	The rate parameter for the Gamma prior on the DP concentration parameter, `conc`. Defaults to `dp_rate = 1`.
`M_init`	Starting number of clusters of `nrow(y)` units to initialize sampler. Defaults to `M_init = nrow(y)`.
`w_star`	Integer value denoting the number of cluster locations to sample ahead of observations in the auxiliary Gibbs sampler used to sample the number of clusters and associated cluster assignments. A higher value reduces sampling auto-correlation, but increases computational burden. Defaults to `w_star = 2`. The auxiliary Gibbs sampler is only used in the case that `!is.null(ksi)`, for predictor- assisted clustering; otherwise, the full conditionals for cluster assigment are conjugate.
`n.iter`	Total number of MCMC iterations.
`n.burn`	Number of MCMC iterations to discard. `gmrfdpgrow` will return `(n.iter - n.burn)` posterior samples.
`n.thin`	Gap between successive sampling iterations to save.
`nu`	The degree of freedom parameter for the Huang and Wand prior on precision `R x R` matrix locations, `Lambda_star`, in the case that `N x R` predictors, `ksi`, are entered to instantiate a predictor-dependent prior for co-clustering. Default value is `4`.
`Rep`	The number of times to draw samples of the `N x T` log-mean parameters, `Psi`, for each MCMC iteration under a Poisson-lognormal model when the response type for `y` is count (not continuous), which is indicated by inputing an offset matrix, `E`. Default value is `1`.
`progress`	A boolean value denoting whether to display a progress bar during model execution. Defaults to `progress = true`.
`jitter`	A scalar double indicating amount of jitter to subract from the posterior rate and shape hyperparameters of `tau_e` to stabilize computation. Defaults to `jitter = 0.0`.
`kappa_fast`	Boolean for whether to generate rate hyperparameter from full conditionals versus joint Gaussian (on random effects, `bb`, given `kappa`. The former is faster, but numerically less stable. Defaults to `kappa_fast = FALSE`.
`stable_launch`	A boolean indicator on whether to generate initial values for `N x T` log-mean, `Psi`, when `y` are count data from a Gaussian prior distribution, which can induce numerical launch instabilities, or whether to initialize as `y/E` (where missing values in `y` are imputed). Defaults to `stable_launch = TRUE`.

Value

S3 gmrfdpgrow object, for which many methods are available to return and view results. Generic functions applied to an object, res of class gmrfdpgrow, includes:

`plot(res)`	returns results plots, including fit functions versus data and allocation of fitted functions into clusters
`samples(res)`	contains (`n.iter - n.burn`) posterior sampling iterations for every model parameter
`resid(res)`	contains the model residuals.

Note

The intended focus for this package are data composed of observed noisy functions (each of length T) for a set of experimental units where the functions may express dependence among the experimental units

Author(s)

Terrance Savitsky tds151@gmail.com Daniell toth danielltoth@yahoo.com

References

T. D. Savitsky and D. Toth (2014) Bayesian Non-parametric Models for Collections of Time- indexed Functions. submitted to: JRSS Series A (Statistics in Society).

T. D. Savitsky (2014) Bayesian Non-parametric Functional Mixture Estimation for Time-indexed data. submitted to: Annals of Applied Statistics.

T. D. Savitsky (2014) Bayesian Non-Parametric Mixture Estimation for Time-Indexed Functional Data for R. Submitted to: Journal of Statistical Software.

Examples

{
library(growfunctions)

## load the monthly employment count data for a collection of 
## U.S. states from the Current 
## Population Survey (cps)
data(cps)
## subselect the columns of N x T, y, associated 
## with the years 2008 - 2013
## to examine the state level employment levels 
## during the "great recession"
y_short   <- cps$y[,(cps$yr_label %in% c(2008:2013))]

## Run the DP mixture of iGMRF's to estimate posterior 
## distributions for model parameters
## Under default RW2(kappa) = order 2 trend 
## precision term
## Run for 1500 iterations, with half as burn-in for a
## more useful (converged) result.
res_gmrf            <- gmrfdpgrow(y = y_short, 
                                     n.iter = 40, 
                                     n.burn = 20, 
                                     n.thin = 1) 
                                     
## 2 plots of estimated functions: 1. faceted by cluster and fit;
## 2.  data for experimental units.
## for a group of randomly-selected functions
fit_plots_gmrf      <- cluster_plot( object = res_gmrf, 
                                     units_name = "state", 
                                     units_label = cps$st, 
                                     single_unit = FALSE, 
                                     credible = TRUE )    
}

[Package growfunctions version 0.16 Index]