mHMM {mHMMbayes} | R Documentation |
Multilevel hidden Markov model using Bayesian estimation
Description
mHMM
fits a multilevel (also known as mixed or random effects) hidden
Markov model (HMM) to intense longitudinal data with categorical, continuous
(i.e., normally distributed), or count (i.e., Poisson distributed)
observations of multiple subjects using Bayesian estimation, and creates an
object of class mHMM
. By using a multilevel framework, we allow for
heterogeneity in the model parameters between subjects, while estimating one
overall HMM. The function includes the possibility to add covariates at level
2 (i.e., at the subject level) and have varying observation lengths over
subjects. For a short description of the package see mHMMbayes. See
vignette("tutorial-mhmm")
for an introduction to multilevel hidden
Markov models and the package, and see vignette("estimation-mhmm")
for
an overview of the used estimation algorithms.
Usage
mHMM(
s_data,
data_distr = "categorical",
gen,
xx = NULL,
start_val,
mcmc,
return_path = FALSE,
show_progress = TRUE,
gamma_hyp_prior = NULL,
emiss_hyp_prior = NULL,
gamma_sampler = NULL,
emiss_sampler = NULL
)
Arguments
s_data |
A matrix containing the observations to be modeled, where the
rows represent the observations over time. In |
data_distr |
String vector with length 1 describing the
observation type of the data. Currently supported are |
gen |
List containing the following elements denoting the general model properties:
|
xx |
An optional list of (level 2) covariates to predict the transition
matrix and/or the emission probabilities. Level 2 covariate(s) means that
there is one observation per subject of each covariate. The first element
in the list If |
start_val |
List containing the start values for the transition
probability matrix gamma and the emission distribution(s). The first
element of the list contains a |
mcmc |
List of Markov chain Monte Carlo (MCMC) arguments, containing the following elements:
|
return_path |
A logical scalar. Should the sampled state sequence
obtained at each iteration and for each subject be returned by the function
( |
show_progress |
A logical scaler. Should the function show a text
progress bar in the |
gamma_hyp_prior |
An optional object of class |
emiss_hyp_prior |
An object of the class |
gamma_sampler |
An optional object of the class |
emiss_sampler |
An optional object of the class |
Details
Covariates specified in xx
can either be dichotomous or continuous
variables. Dichotomous variables have to be coded as 0/1 variables.
Categorical or factor variables can as yet not be used as predictor
covariates. The user can however break up the categorical variable in
multiple dummy variables (i.e., dichotomous variables), which can be used
simultaneously in the analysis. Continuous predictors are automatically
centered. That is, the mean value of the covariate is subtracted from all
values of the covariate such that the new mean equals zero. This is done such
that the presented probabilities in the output (i.e., for the population
transition probability matrix and population emission probabilities)
correspond to the predicted probabilities at the average value of the
covariate(s).
Value
mHMM
returns an object of class mHMM
, which has
print
and summary
methods to see the results.
The object contains always the following components:
PD_subj
A list containing one list per subject with the elements
trans_prob
,cat_emiss
,cont_emiss
, orcount_emiss
in case of categorical, continuous, and count observations, respectively, andlog_likl
, providing the subject parameter estimates over the iterations of the MCMC sampler.trans_prob
relates to the transition probabilities gamma,cat_emiss
to the categorical emission distribution (emission probabilities),cont_emiss
to the continuous emission distributions (subsequently the the emission means and the (fixed over subjects) emission standard deviation),count_emiss
to the count emission distribution means in the natural (real-positive) scale, andlog_likl
to the log likelihood over the MCMC iterations. Iterations are contained in the rows, the parameters in the columns.gamma_prob_bar
A matrix containing the group level parameter estimates of the transition probabilities over the iterations of the hybrid Metropolis within Gibbs sampler. The iterations of the sampler are contained in the rows, and the columns contain the group level parameter estimates. If covariates were included in the analysis, the group level probabilities represent the predicted probability given that the covariate is at the average value for continuous covariates, or given that the covariate equals zero for dichotomous covariates.
gamma_int_bar
A matrix containing the group level intercepts of the Multinomial logistic regression modeling the transition probabilities over the iterations of the hybrid Metropolis within Gibbs sampler. The iterations of the sampler are contained in the rows, and the columns contain the group level intercepts.
gamma_cov_bar
A matrix containing the group level regression coefficients of the Multinomial logistic regression predicting the transition probabilities over the iterations of the hybrid Metropolis within Gibbs sampler. The iterations of the sampler are contained in the rows, and the columns contain the group level regression coefficients.
gamma_V_int_bar
A matrix containing the variance components for the subject-level intercepts (between subject variances) of the multinomial logistic regression modeling the transition probabilities over the iterations of the hybrid Metropolis within Gibbs sampler. The iterations of the sampler are contained in the rows, and the columns contain the variance components for the subject level intercepts. Note that only the intercept variances (and not the co-variances) are returned.
gamma_int_subj
A list containing one matrix per subject denoting the subject level intercepts of the Multinomial logistic regression modeling the transition probabilities over the iterations of the hybrid Metropolis within Gibbs sampler. The iterations of the sampler are contained in the rows, and the columns contain the subject level intercepts.
gamma_naccept
A matrix containing the number of accepted draws at the subject level RW Metropolis step for each set of parameters of the transition probabilities. The subjects are contained in the rows, and the columns contain the sets of parameters.
input
Overview of used input specifications: the distribution type of the observations
data_distr
, the number of statesm
, the number of used dependent variablesn_dep
, (in case of categorical observations) the number of output categories for each of the dependent variablesq_emiss
, the number of iterationsJ
and the specified burn in periodburn_in
of the hybrid Metropolis within Gibbs sampler, the number of subjectsn_subj
, the observation length for each subjectn_vary
, and the column names of the dependent variablesdep_labels
.sample_path
A list containing one matrix per subject with the sampled hidden state sequence over the hybrid Metropolis within Gibbs sampler. The time points of the dataset are contained in the rows, and the sampled paths over the iterations are contained in the columns. Only returned if
return_path = TRUE
.
Additionally, in case of categorical observations, the mHMM
return object
contains:
emiss_prob_bar
A list containing one matrix per dependent variable, denoting the group level emission probabilities of each dependent variable over the iterations of the hybrid Metropolis within Gibbs sampler. The iterations of the sampler are contained in the rows of the matrix, and the columns contain the group level emission probabilities. If covariates were included in the analysis, the group level probabilities represent the predicted probability given that the covariate is at the average value for continuous covariates, or given that the covariate equals zero for dichotomous covariates.
emiss_int_bar
A list containing one matrix per dependent variable, denoting the group level intercepts of each dependent variable of the Multinomial logistic regression modeling the probabilities of the emission distribution over the iterations of the hybrid Metropolis within Gibbs sampler. The iterations of the sampler are contained in the rows of the matrix, and the columns contain the group level intercepts.
emiss_cov_bar
A list containing one matrix per dependent variable, denoting the group level regression coefficients of the Multinomial logistic regression predicting the emission probabilities within each of the dependent variables over the iterations of the hybrid Metropolis within Gibbs sampler. The iterations of the sampler are contained in the rows of the matrix, and the columns contain the group level regression coefficients.
emiss_V_int_bar
A list containing one matrix per dependent variable, denoting the variance components for the subject-level intercepts (between subject variances) of the multinomial logistic regression modeling the categorical emission probabilities over the iterations of the hybrid Metropolis within Gibbs sampler. The iterations of the sampler are contained in the rows, and the columns contain the variance components for the subject level intercepts. Note that only the intercept variances (and not the co-variances) are returned.
emiss_int_subj
A list containing one list per subject denoting the subject level intercepts of each dependent variable of the Multinomial logistic regression modeling the probabilities of the emission distribution over the iterations of the hybrid Metropolis within Gibbs sampler. Each lower level list contains one matrix per dependent variable, in which iterations of the sampler are contained in the rows, and the columns contain the subject level intercepts.
emiss_naccept
A list containing one matrix per dependent variable with the number of accepted draws at the subject level RW Metropolis step for each set of parameters of the emission distribution. The subjects are contained in the rows, and the columns of the matrix contain the sets of parameters.
In case of continuous observations, the mHMM
return object contains:
emiss_mu_bar
A list containing one matrix per dependent variable, denoting the group level means of the Normal emission distribution of each dependent variable over the iterations of the Gibbs sampler. The iterations of the sampler are contained in the rows of the matrix, and the columns contain the group level emission means. If covariates were included in the analysis, the group level means represent the predicted mean given that the covariate is at the average value for continuous covariates, or given that the covariate equals zero for dichotomous covariates.
emiss_varmu_bar
A list containing one matrix per dependent variable, denoting the variance between the subject level means of the Normal emission distributions. over the iterations of the Gibbs sampler. The iterations of the sampler are contained in the rows of the matrix, and the columns contain the group level variance in the mean.
emiss_sd_bar
A list containing one matrix per dependent variable, denoting the (fixed over subjects) standard deviation of the Normal emission distributions over the iterations of the Gibbs sampler. The iterations of the sampler are contained in the rows of the matrix, and the columns contain the group level emission variances.
emiss_cov_bar
A list containing one matrix per dependent variable, denoting the group level regression coefficients predicting the emission means within each of the dependent variables over the iterations of the Gibbs sampler. The iterations of the sampler are contained in the rows of the matrix, and the columns contain the group level regression coefficients.
emiss_naccept
A list containing one matrix per dependent variable with the number of accepted draws at the subject level RW Metropolis step for each set of parameters of the emission distribution. The subjects are contained in the rows, and the columns of the matrix contain the sets of parameters.
input
Overview of used input specifications: the number of states
m
, the number of used dependent variablesn_dep
, the number of iterationsJ
and the specified burn in periodburn_in
of the hybrid Metropolis within Gibbs sampler, the number of subjectsn_subj
, the observation length for each subjectn_vary
, and the column names of the dependent variablesdep_labels
.
In case of count observations, the mHMM
return object contains:
emiss_mu_bar
A list containing one matrix per dependent variable, denoting the group-level means of the Poisson emission distribution of each dependent variable over the MCMC iterations. The iterations of the sampler are contained in the rows of the matrix, and the columns contain the group-level Poisson emission means in the positive scale. If covariates were included in the analysis, the group-level means represent the predicted mean counts given that the covariate is at the average value for continuous covariates, or given that the covariate equals zero for dichotomous covariates.
emiss_varmu_bar
A list containing one matrix per dependent variable, denoting the variance between the subject-level means of the Poisson emission distribution(s) over the iterations of the MCMC sampler. The iterations of the sampler are contained in the rows of the matrix, and the columns contain the group-level variance(s) between subject-specific means (in the positive scale).
emiss_logmu_bar
A list containing one matrix per dependent variable, denoting the group-level logmeans of the Poisson emission distribution of each dependent variable over the MCMC iterations. The iterations of the sampler are contained in the rows of the matrix, and the columns contain the group-level Poisson emission logmeans in the logarithmic scale. If covariates were included in the analysis, the group-level means represent the predicted logmean counts given that the covariate is at the average value for continuous covariates, or given that the covariate equals zero for dichotomous covariates.
emiss_logvarmu_bar
A list containing one matrix per dependent variable, denoting the logvariance between the subject-level logmeans of the Poisson emission distribution(s) over the iterations of the MCMC sampler. The iterations of the sampler are contained in the rows of the matrix, and the columns contain the group-level logvariance(s) between subject-specific means (in the logarithmic scale).
emiss_cov_bar
A list containing one matrix per dependent variable, denoting the group level regression coefficients predicting the emission means in the logarithmic scale within each of the dependent variables over the iterations of the MCMC sampler. The iterations of the sampler are contained in the rows of the matrix, and the columns contain the group-level regression coefficients in the logarithmic scale.
input
Overview of used input specifications: the number of states
m
, the number of used dependent variablesn_dep
, the number of iterationsJ
and the specified burn in periodburn_in
of the hybrid Metropolis within Gibbs sampler, the number of subjectsn_subj
, the observation length for each subjectn_vary
, and the column names of the dependent variablesdep_labels
.
References
Rabiner LR (1989). “A tutorial on hidden Markov models and selected applications in speech recognition.” Proceedings of the IEEE, 77(2), 257–286.
Scott SL (2002). “Bayesian methods for hidden Markov models: Recursive computing in the 21st century.” Journal of the American Statistical Association, 97(457), 337–351.
Altman RM (2007). “Mixed hidden Markov models: an extension of the hidden Markov model to the longitudinal data setting.” Journal of the American Statistical Association, 102(477), 201–210.
Rossi PE, Allenby GM, McCulloch R (2012). Bayesian statistics and marketing. John Wiley & Sons.
Zucchini W, MacDonald IL, Langrock R (2017). Hidden Markov models for time series: an introduction using R. Chapman and Hall/CRC.
See Also
sim_mHMM
for simulating multilevel hidden Markov data,
vit_mHMM
for obtaining the most likely hidden state sequence
for each subject using the Viterbi algorithm, obtain_gamma
and obtain_emiss
for obtaining the transition or emission
distribution probabilities of a fitted model at the group or subject level,
and plot.mHMM
for plotting the posterior densities of a
fitted model.
Examples
###### Example on package (categorical) example data, see ?nonverbal
# specifying general model properties:
m <- 2
n_dep <- 4
q_emiss <- c(3, 2, 3, 2)
# specifying starting values
start_TM <- diag(.8, m)
start_TM[lower.tri(start_TM) | upper.tri(start_TM)] <- .2
start_EM <- list(matrix(c(0.05, 0.90, 0.05,
0.90, 0.05, 0.05), byrow = TRUE,
nrow = m, ncol = q_emiss[1]), # vocalizing patient
matrix(c(0.1, 0.9,
0.1, 0.9), byrow = TRUE, nrow = m,
ncol = q_emiss[2]), # looking patient
matrix(c(0.90, 0.05, 0.05,
0.05, 0.90, 0.05), byrow = TRUE,
nrow = m, ncol = q_emiss[3]), # vocalizing therapist
matrix(c(0.1, 0.9,
0.1, 0.9), byrow = TRUE, nrow = m,
ncol = q_emiss[4])) # looking therapist
# Run a model without covariate(s):
# Note that for reasons of running time, J is set at a ridiculous low value.
# One would typically use a number of iterations J of at least 1000,
# and a burn_in of 200.
out_2st <- mHMM(s_data = nonverbal,
gen = list(m = m, n_dep = n_dep, q_emiss = q_emiss),
start_val = c(list(start_TM), start_EM),
mcmc = list(J = 11, burn_in = 5))
out_2st
summary(out_2st)
# plot the posterior densities for the transition and emission probabilities
plot(out_2st, component = "gamma", col =c("darkslategray3", "goldenrod"))
# Run a model including a covariate (see ?nonverbal_cov) to predict the
# emission distribution for each of the 4 dependent variables:
n_subj <- 10
xx_emiss <- rep(list(matrix(c(rep(1, n_subj),nonverbal_cov$std_CDI_change),
ncol = 2, nrow = n_subj)), n_dep)
xx <- c(list(matrix(1, ncol = 1, nrow = n_subj)), xx_emiss)
out_2st_c <- mHMM(s_data = nonverbal, xx = xx,
gen = list(m = m, n_dep = n_dep, q_emiss = q_emiss),
start_val = c(list(start_TM), start_EM),
mcmc = list(J = 11, burn_in = 5))
###### Example on categorical simulated data
# Simulate data for 10 subjects with each 100 observations:
n_t <- 100
n <- 10
m <- 2
n_dep <- 1
q_emiss <- 3
gamma <- matrix(c(0.8, 0.2,
0.3, 0.7), ncol = m, byrow = TRUE)
emiss_distr <- list(matrix(c(0.5, 0.5, 0.0,
0.1, 0.1, 0.8), nrow = m, ncol = q_emiss, byrow = TRUE))
data1 <- sim_mHMM(n_t = n_t, n = n, gen = list(m = m, n_dep = n_dep, q_emiss = q_emiss),
gamma = gamma, emiss_distr = emiss_distr, var_gamma = .5, var_emiss = .5)
# Specify remaining required analysis input (for the example, we use simulation
# input as starting values):
n_dep <- 1
q_emiss <- 3
# Run the model on the simulated data:
out_2st_sim <- mHMM(s_data = data1$obs,
gen = list(m = m, n_dep = n_dep, q_emiss = q_emiss),
start_val = c(list(gamma), emiss_distr),
mcmc = list(J = 11, burn_in = 5))
###### Example on continuous simulated data
# simulating multivariate continuous data
n_t <- 100
n <- 10
m <- 3
n_dep <- 2
gamma <- matrix(c(0.8, 0.1, 0.1,
0.2, 0.7, 0.1,
0.2, 0.2, 0.6), ncol = m, byrow = TRUE)
emiss_distr <- list(matrix(c( 50, 10,
100, 10,
150, 10), nrow = m, byrow = TRUE),
matrix(c(5, 2,
10, 5,
20, 3), nrow = m, byrow = TRUE))
data_cont <- sim_mHMM(n_t = n_t, n = n, data_distr = 'continuous',
gen = list(m = m, n_dep = n_dep),
gamma = gamma, emiss_distr = emiss_distr,
var_gamma = .1, var_emiss = c(5^2, 0.2^2))
# Specify hyper-prior for the continuous emission distribution
manual_prior_emiss <- prior_emiss_cont(
gen = list(m = m, n_dep = n_dep),
emiss_mu0 = list(matrix(c(30, 70, 170), nrow = 1),
matrix(c(7, 8, 18), nrow = 1)),
emiss_K0 = list(1, 1),
emiss_V = list(rep(5^2, m), rep(0.5^2, m)),
emiss_nu = list(1, 1),
emiss_a0 = list(rep(1.5, m), rep(1, m)),
emiss_b0 = list(rep(20, m), rep(4, m)))
# Run the model on the simulated data:
# Note that for reasons of running time, J is set at a ridiculous low value.
# One would typically use a number of iterations J of at least 1000,
# and a burn_in of 200.
out_3st_cont_sim <- mHMM(s_data = data_cont$obs,
data_distr = 'continuous',
gen = list(m = m, n_dep = n_dep),
start_val = c(list(gamma), emiss_distr),
emiss_hyp_prior = manual_prior_emiss,
mcmc = list(J = 11, burn_in = 5))
###### Example on multivariate count data
# Simulate data with one covariate for each count dependent variable
n_t <- 200 # Number of observations on the dependent variable
m <- 3 # Number of hidden states
n_dep <- 2 # Number of dependent variables
n_subj <- 30 # Number of subjects
gamma <- matrix(c(0.9, 0.05, 0.05,
0.2, 0.7, 0.1,
0.2,0.3, 0.5), ncol = m, byrow = TRUE)
emiss_distr <- list(matrix(c(20,
10,
5), nrow = m, byrow = TRUE),
matrix(c(50,
3,
20), nrow = m, byrow = TRUE))
# Define between subject variance to use on the simulating function:
# here, the variance is varied over states within the dependent variable.
var_emiss <- list(matrix(c(5.0, 3.0, 1.5), nrow = m),
matrix(c(5.0, 5.0, 5.0), nrow = m))
# Simulate count data:
data_count <- sim_mHMM(n_t = n_t,
n = n_subj,
data_distr = "count",
gen = list(m = m, n_dep = n_dep),
gamma = gamma,
emiss_distr = emiss_distr,
var_gamma = 0.1,
var_emiss = var_emiss,
return_ind_par = TRUE)
# Transition probabilities
start_gamma <- diag(0.8, m)
start_gamma[lower.tri(start_gamma) | upper.tri(start_gamma)] <-
(1 - diag(start_gamma)) / (m - 1)
# Emission distribution
start_emiss <- list(matrix(c(20,10, 5), nrow = m, byrow = TRUE),
matrix(c(50, 3,20), nrow = m, byrow = TRUE))
# Specify hyper-prior for the count emission distribution
manual_prior_emiss <- prior_emiss_count(
gen = list(m = m, n_dep = n_dep),
emiss_mu0 = list(matrix(c(20, 10, 5), byrow = TRUE, ncol = m),
matrix(c(50, 3, 20), byrow = TRUE, ncol = m)),
emiss_K0 = rep(list(0.1),n_dep),
emiss_nu = rep(list(0.1),n_dep),
emiss_V = rep(list(rep(10, m)),n_dep)
)
# Run model
# Note that for reasons of running time, J is set at a ridiculous low value.
# One would typically use a number of iterations J of at least 1000,
# and a burn_in of 200.
out_3st_count_sim <- mHMM(s_data = data_count$obs,
data_distr = 'count',
gen = list(m = m, n_dep = n_dep),
start_val = c(list(start_gamma), start_emiss),
emiss_hyp_prior = manual_prior_emiss,
mcmc = list(J = 11, burn_in = 5),
show_progress = TRUE)
summary(out_3st_count_sim)