R: Specifying informative hyper-priors on the count emission...

prior_emiss_count {mHMMbayes}

R Documentation

Specifying informative hyper-priors on the count emission distribution(s) of the multilevel hidden Markov model

Description

prior_emiss_count provides a framework to manually specify an informative hyper-prior on the count emission distributions. prior_emiss_count creates an object of class mHMM_prior_emiss used by the function mHMM, and additionally attaches the class count to signal use for count observations. The set of hyper-prior distributions consists of a lognormal-Inverse-Gamma distribution (i.e., assuming both unknown population mean and variance between subject level means) on the vector of Poisson means (i.e., intercepts and regression coefficients).

Usage

prior_emiss_count(
  gen,
  emiss_mu0,
  emiss_K0,
  emiss_V,
  emiss_nu,
  n_xx_emiss = NULL,
  log_scale = FALSE
)

Arguments

`gen`	List containing the following elements denoting the general model properties: `m`: numeric vector with length 1 denoting the number of hidden states; `n_dep`: numeric vector with length 1 denoting the number of dependent variables; `q_emiss`: only to be specified if the data represents categorical data. Numeric vector with length `n_dep` denoting the number of observed categories for the categorical emission distribution for each of the dependent variables.
`emiss_mu0`	A list containing `n_dep` matrices, i.e., one matrix for each dependent variable `k`. Each matrix contains the hypothesized hyper-prior means of the Poisson emission distribution in each of the states on the natural (positive real numbers) scale. Hence, each matrix consists of one row (when not including covariates in the model) and `m` columns. If covariates are used, the number of rows in each matrix in the list is equal to 1 + `n_xx_emiss` (i.e., the first row corresponds to the hyper-prior means, and the subsequent rows correspond to the hyper-prior values of the regression coefficients connected to each of the covariates). If covariates are used to predict the emission distribution, then `emiss_mu0` should be specified on the logarithmic scale, and `log_scale` set to `TRUE`.
`emiss_K0`	A list containing `n_dep` elements corresponding to each dependent variable `k`. Each element `k` is a numeric vector with length 1 (when no covariates are used) denoting the number of hypothetical prior subjects on which the set of hyper-prior means specified in `emiss_mu0` is based. When covariates are used, each element is a numeric vector with length 1 + `n_xx_emiss` denoting the number of hypothetical prior subjects on which the set of means (first value) and the set of regression coefficients (subsequent values) are based.
`emiss_V`	A list containing `n_dep` elements corresponding to each of the dependent variables `k`, where each element `k` is a vector with length `m` containing the hypothesized variance between the subject (emission distribution) means on the natural (positive real numbers) scale, which are assumed to follow an Inverse Gamma hyper-prior distribution (note: here, the Inverse Gamma hyper-prior distribution is parametrized as a scaled inverse chi-squared distribution). If covariates are used to predict the emission distribution, then `emiss_V` should be specified on the logarithmic scale, and `log_scale` set to `TRUE`.
`emiss_nu`	A list containing `n_dep` elements corresponding to each dependent variable `k`. Each element `k` is a numeric vector with length 1 denoting the degrees of freedom of the Inverse Gamma hyper-prior distribution on the between subject variance of the emission distribution means (note: here, the Inverse Gamma hyper-prior distribution is parametrized as a scaled inverse chi-squared distribution).
`n_xx_emiss`	Optional numeric vector with length `n_dep` denoting the number of (level 2) covariates used to predict the emission distribution of each of the dependent variables `k`. When omitted, the model assumes no covariates are used to predict the emission distribution(s).
`log_scale`	A logical scalar. Should `emiss_mu0` and `emiss_V` be specified in the logarithmic scale (`log_scale = TRUE`) or the natural scale (`log_scale = FALSE`). The default equals `log_scale = FALSE`.
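
As a rough illustration of the input dimensions described above when covariates are used, the sketch below builds arguments for `m = 3` states, `n_dep = 2` dependent variables, and one covariate per dependent variable. All numeric values (log-scale means, zero-centered regression coefficients, log-scale variances) are hypothetical and chosen only to show the shapes; the call itself runs only if mHMMbayes is installed.

```r
# Hypothetical settings: 3 states, 2 dependent variables, 1 covariate each
m          <- 3
n_dep      <- 2
n_xx_emiss <- c(1, 1)

# One matrix per dependent variable: row 1 = hyper-prior means (log scale),
# row 2 = hyper-prior values for the regression coefficients (assumed 0 here)
emiss_mu0 <- list(
  matrix(c(log(c(30, 70, 170)), rep(0, m)), nrow = 2, byrow = TRUE),
  matrix(c(log(c(7, 8, 18)),    rep(0, m)), nrow = 2, byrow = TRUE)
)

# One vector of length 1 + n_xx_emiss per dependent variable
emiss_K0 <- list(rep(1, 2), rep(1, 2))

# One vector of length m per dependent variable (hypothetical log-scale variances)
emiss_V  <- list(rep(0.02, m), rep(0.08, m))

# One scalar per dependent variable
emiss_nu <- list(0.1, 0.1)

# Check the shapes against the argument descriptions
stopifnot(
  all(sapply(emiss_mu0, nrow) == 1 + n_xx_emiss),
  all(sapply(emiss_mu0, ncol) == m),
  all(lengths(emiss_K0) == 1 + n_xx_emiss),
  all(lengths(emiss_V) == m),
  all(lengths(emiss_nu) == 1)
)

# Only attempt the call when the package is available
if (requireNamespace("mHMMbayes", quietly = TRUE)) {
  prior_cov <- mHMMbayes::prior_emiss_count(
    gen        = list(m = m, n_dep = n_dep),
    emiss_mu0  = emiss_mu0,
    emiss_K0   = emiss_K0,
    emiss_V    = emiss_V,
    emiss_nu   = emiss_nu,
    n_xx_emiss = n_xx_emiss,
    log_scale  = TRUE
  )
}
```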

Details

Estimation of the mHMM proceeds within a Bayesian context, hence a hyper-prior distribution has to be defined for the group level parameters. To avoid problems with 'label switching' when dealing with count emission distribution(s) (i.e., switching of the labels of the hidden states while sampling from the MCMC), the user is required to specify hyper-prior parameter values when using count emission distributions (i.e., default, non-informative priors are not available for count emission distributions).

Note that emiss_K0 and emiss_nu are assumed equal over the states. Also note that in case covariates are specified, the hyper-prior parameter values of the inverse Wishart distribution on the covariance matrix remain unchanged, as the estimates of the regression coefficients for the covariates are fixed over subjects.
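
The argument descriptions state that the Inverse Gamma hyper-prior is parametrized as a scaled inverse chi-squared distribution. As a small sketch of what that equivalence means, the base R code below checks numerically that a scaled inverse chi-squared distribution with degrees of freedom `nu` and scale `s2` has the same density as an Inverse Gamma distribution with shape `nu / 2` and rate `nu * s2 / 2`. Reading `nu` as a value of `emiss_nu` and `s2` as a value of `emiss_V` is an assumption made for illustration, and the numbers are hypothetical.

```r
# Hypothetical values: nu plays the role of emiss_nu, s2 of one emiss_V entry
nu <- 0.1
s2 <- 16

# Grid of variances at which to compare the two densities
x <- c(0.5, 1, 5, 20, 100)

# If Z ~ chi-squared(nu), then X = nu * s2 / Z is scaled inverse chi-squared;
# change of variables gives its density
dens_scaled_inv_chisq <- dchisq(nu * s2 / x, df = nu) * nu * s2 / x^2

# If Y ~ Gamma(shape = nu / 2, rate = nu * s2 / 2), then X = 1 / Y is Inverse Gamma
dens_inv_gamma <- dgamma(1 / x, shape = nu / 2, rate = nu * s2 / 2) / x^2

# The two parametrizations describe the same distribution
same_density <- isTRUE(all.equal(dens_scaled_inv_chisq, dens_inv_gamma))
print(same_density)
```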

Also note that, for simplicity, the hyper-prior means and variances of the lognormal distribution, emiss_mu0 and emiss_V, by default have to be specified on the natural (positive real numbers) scale and not on the logarithmic scale. prior_emiss_count returns the corresponding values of the parameters on the logarithmic scale. To specify these values manually on the logarithmic scale, set the argument log_scale to TRUE in prior_emiss_count. If covariates are used to predict the emission distribution, then emiss_mu0 and emiss_V must be given on the logarithmic scale, with log_scale = TRUE. To aid the user in transforming the variance to the logarithmic scale, the function 'var_to_logvar()' can be used.
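
To make the scale conversion more concrete, here is a minimal base R sketch of one way a natural-scale variance can be mapped to the logarithmic scale. It assumes that the log-scale mean is taken to be `log(mu)` and then solves the lognormal variance identity `var = (exp(s2) - 1) * exp(2 * log(mu) + s2)` for `s2`. The helper `nat_to_log_var()` is hypothetical and not part of mHMMbayes; whether `var_to_logvar()` follows exactly this convention is not stated here, so treat the output as illustrative only.

```r
# Hypothetical helper (not part of mHMMbayes): given a natural-scale mean `mu`
# and variance `var`, return the log-scale variance s2 that satisfies
#   var = (exp(s2) - 1) * exp(2 * log(mu) + s2)
# under the assumption that the log-scale mean equals log(mu).
nat_to_log_var <- function(mu, var) {
  log(0.5 * (1 + sqrt(1 + 4 * var / mu^2)))
}

# Natural-scale hyper-prior values (taken from the first dependent variable
# in the Examples section)
mu0 <- c(30, 70, 170)
V   <- rep(16, 3)

# Corresponding log-scale values under the stated assumption
log_mu0 <- log(mu0)
log_V   <- nat_to_log_var(mu0, V)

# Round trip: plugging the log-scale values back into the lognormal variance
# identity should recover the natural-scale variances
V_back <- (exp(log_V) - 1) * exp(2 * log_mu0 + log_V)
print(all.equal(V_back, V))
```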

Value

prior_emiss_count returns an object of class mHMM_prior_emiss, containing informative hyper-prior values for the count emission distribution(s) of the multilevel hidden Markov model. The object is specifically created and formatted for use by the function mHMM, and thoroughly checked for correct input dimensions. The object contains the following components:

gen: A list containing the elements m and n_dep, used for checking equivalent general model properties specified under prior_emiss_count and mHMM.
emiss_mu0: A list containing the hypothesized hyper-prior means of the Poisson distribution used to model the count emissions.
emiss_K0: A list containing n_dep elements denoting the number of hypothetical prior subjects on which the set of hyper-prior means specified in emiss_mu0 are based.
emiss_V: A list containing n_dep elements containing the variance of the Inverse Gamma hyper-prior distribution on the between subject variance of the emission distribution means.
emiss_nu: A list containing n_dep elements denoting the degrees of freedom of the Inverse Gamma hyper-prior distribution on the between subject variance of the emission distribution means.
n_xx_emiss: A numeric vector denoting the number of (level 2) covariates used to predict the emission distribution of each of the dependent variables. When no covariates are used, n_xx_emiss equals NULL.
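
As a brief sketch of how the returned object might be inspected, the code below creates a prior with the same values as the example that follows and prints its class and top-level components. It runs only if mHMMbayes is installed; the object name `prior_obj` is chosen here for illustration.

```r
# Only attempt this when the package is available
if (requireNamespace("mHMMbayes", quietly = TRUE)) {
  prior_obj <- mHMMbayes::prior_emiss_count(
    gen       = list(m = 3, n_dep = 2),
    emiss_mu0 = list(matrix(c(30, 70, 170), nrow = 1),
                     matrix(c(7, 8, 18), nrow = 1)),
    emiss_K0  = list(1, 1),
    emiss_V   = list(rep(16, 3), rep(4, 3)),
    emiss_nu  = list(0.1, 0.1)
  )

  # Class vector of the returned object
  print(class(prior_obj))

  # Top-level components (gen, emiss_mu0, emiss_K0, ...)
  str(prior_obj, max.level = 1)
}
```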

Examples

###### Example using simulated data
# specifying general model properties:
m <- 3
n_dep <- 2

# hypothesized hyper-prior values for the count emission distribution
manual_prior_emiss <- prior_emiss_count(
                        gen = list(m = m, n_dep = n_dep),
                        emiss_mu0 = list(matrix(c(30, 70, 170), nrow = 1),
                                         matrix(c(7, 8, 18), nrow = 1)),
                        emiss_K0 = list(1, 1),
                        emiss_V =  list(rep(16, m), rep(4, m)),
                        emiss_nu = list(0.1, 0.1))

# to use the informative priors in a model, simulate multivariate count data
n_t     <- 100
n       <- 10

# Specify group-level transition and emission means
gamma   <- matrix(c(0.8, 0.1, 0.1,
                    0.2, 0.7, 0.1,
                    0.2, 0.2, 0.6), ncol = m, byrow = TRUE)
emiss_distr <- list(matrix(log(c( 50,
                              100,
                              150)), nrow = m, byrow = TRUE),
                    matrix(log(c(5,
                             10,
                             20)), nrow = m, byrow = TRUE))
# Simulate data
data_count <- sim_mHMM(n_t = n_t, n = n, data_distr = 'count',
                       gen = list(m = m, n_dep = n_dep),
                       gamma = gamma, emiss_distr = emiss_distr,
                       var_gamma = .1, var_emiss = c(.05, 0.01), log_scale = TRUE)

# Specify starting values
start_gamma <- gamma
start_emiss <- list(matrix(c(50,
                             100,
                             150), nrow = m, byrow = TRUE),
                    matrix(c(5,
                             10,
                             20), nrow = m, byrow = TRUE))


# using the informative hyper-prior in a model
# Note that for reasons of running time, J is set at a ridiculously low value.
# One would typically use a number of iterations J of at least 1000,
# and a burn_in of 200.
out_3st_count_sim_infemiss <- mHMM(s_data = data_count$obs,
                    data_distr = "count",
                    gen = list(m = m, n_dep = n_dep),
                    start_val = c(list(start_gamma), start_emiss),
                    emiss_hyp_prior = manual_prior_emiss,
                    mcmc = list(J = 11, burn_in = 5))

out_3st_count_sim_infemiss
summary(out_3st_count_sim_infemiss)

[Package mHMMbayes version 1.1.0 Index]