hopit {hopit} | R Documentation |
Generalized hierarchical ordered threshold models.
Description
The ordered response data classify a measure of interest into ordered categories
collected during a survey. For example, if the dependent variable is a happiness
rating, a respondent typically answers a question such as: “Taking all things
together, would you say you are ... ?” and then selects from response options
along the lines of: "very happy", "pretty happy", "not too happy", and "very unhappy"
(Liao et al. 2005). Similarly, if interviewees are asked to evaluate their
health in general (e.g., “Would you say your health is ... ?”), they can typically choose among
several categories, such as "very good", "good", "fair", "bad", and "very bad"
(King et al. 2004; Jurges 2007; Rebelo and Pereira 2014; Oksuzyan et al. 2019). In political science, a respondent
may be asked for an opinion about recent legislation (e.g. “Rate your feelings about
the proposed legislation.”) and asked to choose among categories like: "strongly
oppose", "mildly oppose", "indifferent", "mildly support", and "strongly support"
(Greene and Hensher 2010). It is easy to imagine other multi-level ordinal
variables that might be used during a survey and to which the methodology described
below could be applied.
In practice, it is assumed that when responding to a survey question about their general
happiness, health, feelings, attitudes, or other status, participants
assess their true value of an unobserved continuous variable and
project it onto the discrete scale provided. The thresholds that individuals
use to categorize their true status by selecting a specific response option
may be affected by the reference group chosen, their earlier life experiences,
and cross-cultural differences in using scales. Thus, the responses of
individuals may differ depending on their gender, age, cultural background,
education, and personality traits, among other factors
(King et al. 2004; Jurges 2007; Oksuzyan et al. 2019).
From the perspective of reporting behavior modeling, one of the main tasks
researchers face is to estimate the continuous, underlying
latent measure of each individual based on several specific characteristics
of the responses considered (e.g., health variables or happiness variables),
and to account for variations in reporting across socio-demographic and
cultural groups. More specifically, to build a latent, underlying measure,
a generalized hierarchical ordered threshold model is fitted that regresses
the reported status/attitude/feeling on two sets of independent variables
(Boes and Winkelmann 2006; Greene et al. 2014). When the dependent reported ordered
variable is self-rated health status, then the first set of variables –
i.e., health variables – assess specific aspects of individuals’ health,
such as measures of chronic conditions, mobility, difficulties with a range
of daily activities, grip strength, anthropometric characteristics, and
lifestyle behaviors. Using the second set of independent variables
(threshold variables), the model also adjusts for differences across
socio-demographic and cultural groups, such as differences in cultural
background, gender, age, and education
(King et al. 2004; Jurges 2007; Oksuzyan et al. 2019).
Ordered threshold models are used to fit ordered categorical dependent variables. The generalized ordered threshold models (Ierza 1985; Boes and Winkelmann 2006; Greene et al. 2014) are an extension of the ordered threshold models (McKelvey and Zavoina 1975). Whereas the thresholds are constant in the latter models, in the generalized models they are allowed to depend on covariates. Greene and Hensher (2010) and Greene et al. (2014) pointed out that for a model to make sense, the thresholds must also be ordered. This observation motivated Greene and coauthors to call these models HOPIT, which stands for hierarchical ordered probit models.
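To make the difference between constant and covariate-dependent thresholds concrete, the following base-R sketch computes category probabilities for a textbook ordered probit. It is only an illustration: the function cat_probs(), the coefficient values, and the linear threshold form are assumptions made here, and the sign conventions used internally by hopit() may differ.

```r
# Textbook ordered probit: P(y = j) = Phi(alpha_j - xb) - Phi(alpha_{j-1} - xb).
# All names and numbers below are hypothetical and not taken from the package.
cat_probs <- function(xb, alpha) {
  # xb: latent linear predictor (scalar); alpha: J - 1 increasing thresholds
  stopifnot(!is.unsorted(alpha, strictly = TRUE))
  cum <- pnorm(c(-Inf, alpha, Inf) - xb)  # P(y* <= each bound)
  diff(cum)                               # probabilities of the J categories
}

xb     <- 0.3               # same latent status for both respondents
lambda <- c(-1, 0, 1)       # threshold intercepts
gamma  <- c(0.2, 0.5, 0.3)  # threshold-specific effects of a binary covariate z

p_ref <- cat_probs(xb, lambda + gamma * 0)  # z = 0: thresholds equal the intercepts
p_grp <- cat_probs(xb, lambda + gamma * 1)  # z = 1: each threshold shifts by its own gamma

round(rbind(reference = p_ref, group = p_grp), 3)
```

Both respondents have the same latent status, yet their response probabilities differ because their thresholds differ. A linear threshold specification like this one does not by itself guarantee ordered thresholds, which is why the sketch checks the ordering explicitly.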
The fitted hopit model is used to analyze heterogeneity in reporting behavior.
See standardizeCoef, latentIndex, getCutPoints, getLevels, and boot_hopit.
Usage
hopit(
latent.formula,
thresh.formula = ~1,
data,
decreasing.levels,
start = NULL,
fit.sigma = FALSE,
design = list(),
weights = NULL,
link = c("probit", "logit"),
control = list(),
na.action = na.fail
)
Arguments
latent.formula |
a formula used to model the latent variable. It should not contain any threshold variable. To specify the interactions between the latent and the threshold variables, see details. |
thresh.formula |
a formula used to model the threshold variable. It should not contain any latent variable. To specify interactions between the latent and the threshold variables, see details. Any dependent variable (left side of "~" in the formula) will be ignored. |
data |
a data frame that includes all modeled variables. |
decreasing.levels |
a logical indicating whether self-reported health classes are ordered in decreasing order. |
start |
a vector with starting coefficient values in the form |
fit.sigma |
a logical indicating whether to fit an additional parameter sigma, which models a standard deviation of the error term (e.g., the standard deviation of the cumulative normal distribution in the probit model). |
design |
an optional survey design. Use the |
weights |
optional model weights. Use design parameter to construct survey weights. |
link |
a link function. The possible values are "probit" and "logit" (see Usage). |
control |
a list with control parameters. See |
na.action |
a function that indicates what should happen when the |
Details
The function fits generalized hierarchical ordered threshold models.
latent.formula models the latent variable.
If the response variable is self-rated health, then the latent measure can depend on different health
conditions and diseases (in this case, the latent variables are called health variables).
Latent variables are modeled with the parallel regression assumption. According to this assumption, the coefficients
that describe the relationship between the lowest response category and all of the higher response categories are the same as the coefficients
that describe the relationship between the next lowest (e.g., adjacent) response category and the remaining higher response categories.
The predicted latent variable is modeled as a linear function of the health variables and the corresponding coefficients.
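The parallel regression assumption can be illustrated with a few lines of base R. This is a minimal sketch under assumed values: the coefficients, thresholds, and the helper latent() are hypothetical, and the probit-scale predictor is written in the textbook form alpha_j - x'beta.

```r
# Hypothetical coefficients for two health variables and three thresholds
beta  <- c(hypertension = 0.4, diabetes = 0.7)
alpha <- c(-1, 0, 1)

# Latent measure as a linear function of the health variables
latent <- function(x) sum(x * beta)

x_a <- c(0, 0)  # respondent without either condition
x_b <- c(1, 1)  # respondent with both conditions

# Probit-scale predictor of every cumulative split j
eta_a <- alpha - latent(x_a)
eta_b <- alpha - latent(x_b)

# Under parallel regression the two respondents differ by the same
# amount at every split, because beta does not vary with j
shift <- eta_a - eta_b
shift
```

The vector shift has the same value at every split, which is exactly what a single set of latent coefficients implies.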
thresh.formula models the threshold variable.
The thresholds (cut-points, alpha) are modeled by the threshold variables (with coefficients gamma) and the intercepts lambda.
The threshold variables are assumed to model the contextual characteristics of the respondent (e.g., country, gender, and age).
The threshold variables are modeled without the parallel regression assumption; thus, each threshold is modeled by
a variable independently (Boes and Winkelmann 2006; Greene et al. 2014).
The hopit() function uses the parameterization of thresholds proposed by Jurges (2007).
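One common way to keep covariate-dependent thresholds ordered is to model the first threshold linearly and each later threshold as the previous one plus an exponentiated linear term. The sketch below assumes this form as a reading of the Jurges-style parameterization; the function jurges_thresholds() and all numbers are hypothetical, and the exact form used by hopit() should be taken from the package vignette.

```r
# Assumed form (not copied from the package source):
#   alpha_1 = lambda_1 + gamma_1' z
#   alpha_j = alpha_{j-1} + exp(lambda_j + gamma_j' z),  j > 1
jurges_thresholds <- function(z, lambda, gamma) {
  # z: vector of K threshold covariates for one respondent
  # lambda: J - 1 intercepts; gamma: (J - 1) x K coefficient matrix
  lin <- lambda + as.vector(gamma %*% z)
  # the first threshold is linear; later ones add a strictly positive increment
  cumsum(c(lin[1], exp(lin[-1])))
}

lambda <- c(-1.0, 0.2, -0.1)                  # hypothetical intercepts
gamma  <- matrix(c(0.3, -0.5, 0.4), ncol = 1) # one binary threshold covariate

alpha_ref <- jurges_thresholds(z = 0, lambda, gamma)
alpha_grp <- jurges_thresholds(z = 1, lambda, gamma)
rbind(reference = alpha_ref, group = alpha_grp)
```

Because every increment is an exponential, the thresholds stay strictly increasing for any values of lambda, gamma, and z, so no ordering constraint has to be imposed during estimation.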
decreasing.levels is a logical that determines the ordering of the levels of the categorical response variable.
It is always advisable to first check the ordering of the levels before starting (see Example 1).
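Checking the level order takes only base R. The sketch below uses a small made-up factor rather than the package data; the variable names are hypothetical.

```r
# Hypothetical self-rated health responses (not the package data)
health <- factor(c("Good", "Poor", "Excellent", "Fair", "Good"),
                 levels = c("Excellent", "Very good", "Good", "Fair", "Poor"))

levels(health)      # the first level is the best state, the last is the worst
as.integer(health)  # integer codes follow the level order

# TRUE here means the levels run from best to worst health
best_first <- levels(health)[1] == "Excellent"
best_first
```

When the levels run from the best to the worst state, as here, the examples on this page set decreasing.levels = TRUE.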
It is possible to model the interactions, including interactions between the latent and the threshold variables. The interactions added to the latent formula
only model the latent measure, and the interactions modeled in the threshold formula only model the thresholds.
The general rule for modeling any kind of interaction is to use "*" to specify interactions within a latent (or threshold) formula and to
use ":" to specify interactions between the latent and the threshold variables. In the latter case, the main effects of an interaction must also be specified;
i.e., the main latent effects must be specified in the latent formula, and the main threshold effects must be specified in the threshold formula.
See also Example 3 below.
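The reason for the "*" versus ":" rule follows from how R expands formula operators, which can be inspected with base R's terms(). The variable names below are only labels borrowed from the examples; no data are needed.

```r
# "*" expands to both main effects plus their interaction
star_terms <- attr(terms(~ sex * ageclass), "term.labels")
star_terms

# ":" adds only the interaction term, with no main effects
colon_terms <- attr(terms(~ sex:depression), "term.labels")
colon_terms
```

Using "*" across the two formulas would therefore place a main effect of the same variable in both formulas, whereas ":" adds only the cross term and leaves each main effect to be declared in its own formula.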
For more details, please see the package vignette, which is also available under this link: vig_hopit.pdf
Value
a hopit object used by other functions and methods. The object is a list with the following components:
control |
a list with control parameters. See |
link |
a link function used. |
hasdisp |
a logical indicating whether fit.sigma was modeled. |
use.weights |
a logical indicating whether any weights were used. |
weights |
a vector with model weights. |
frame |
a model frame. |
latent.formula |
a latent formula used to fit the model. |
latent.mm |
a latent model matrix. |
latent.terms |
latent variables used, and their interactions. |
cross.inter.latent |
a part of the latent formula used for modeling cross-interactions in the latent model |
thresh.formula |
a threshold formula used to fit the model. |
thresh.mm |
a threshold model matrix. |
thresh.extd |
an extended threshold model matrix. |
thresh.terms |
threshold variables used, and their interactions. |
cross.inter.thresh |
a part of the threshold formula used for modeling cross-interactions in the threshold model |
thresh.no.cov |
a logical indicating whether gamma parameters are present. |
parcount |
a 3-element vector with the numbers of parameters for the latent variables (beta), the threshold intercepts (lambda), and the threshold covariates (gamma). |
coef |
a vector with model coefficients. |
coef.ls |
model coefficients as a list. |
start |
a vector with the starting values of the coefficients. |
alpha |
estimated individual-specific thresholds. |
y_i |
a vector with individual responses - the response variable. |
y_latent_i |
a vector with predicted latent measures for each individual. |
Ey_i |
a vector with predicted categorical responses for each individual. |
J |
the number of response levels. |
N |
the number of observations. |
deviance |
a deviance. |
LL |
a log likelihood. |
AIC |
an AIC for models without a survey design. |
vcov |
a variance-covariance matrix. |
vcov.basic |
a variance-covariance matrix that ignores the survey design. |
hessian |
a Hessian matrix. |
estfun |
a gradient (a vector of partial derivatives) of the log likelihood function at the estimated coefficient values. |
YYY1 , YYY2 , YYY3 |
internal objects used for the calculation of gradient and Hessian functions. |
Author(s)
Maciej J. Danko
References
Boes S, Winkelmann R (2006).
“Ordered response models.”
Allgemeines Statistisches Archiv, 90(1), 167–181.
ISSN 1614-0176, doi:10.1007/s10182-006-0228-y.
Greene W, Harris MN, Hollingsworth B, Weterings TA (2014).
“Heterogeneity in Ordered Choice Models: A Review with Applications to Self-Assessed Health.”
Journal of Economic Surveys, 28(1), 109-133.
doi:10.1111/joes.12002.
Greene W, Hensher D (2010).
Modeling Ordered Choices.
Cambridge University Press.
Ierza JV (1985).
“Ordinal probit: A generalization.”
Communications in Statistics - Theory and Methods, 14(1), 1-11.
ISSN 0361-0926, doi:10.1080/03610928508828893.
Jurges H (2007).
“True health vs response styles: exploring cross-country differences in self-reported health.”
Health Economics, 16(2), 163-178.
doi:10.1002/hec.1134.
King G, Murray CJL, Salomon JA, Tandon A (2004).
“Enhancing the Validity and Cross-Cultural Comparability of Measurement in Survey Research.”
American Political Science Review, 98(1), 191–207.
doi:10.1017/S000305540400108X.
Liao P, Fu Y, Yi C (2005).
“Perceived quality of life in Taiwan and Hong Kong: an intra-culture comparison.”
Journal of Happiness Studies, 6(1), 43–67.
ISSN 1573-7780, doi:10.1007/s10902-004-1753-6.
McKelvey RD, Zavoina W (1975).
“A Statistical Model for the Analysis of Ordinal Level Dependent Variables.”
Journal of Mathematical Sociology, 4(1), 103–120.
Oksuzyan A, Danko MJ, Caputo J, Jasilionis D, Shkolnikov VM (2019).
“Is the story about sensitive women and stoical men true? Gender differences in health after adjustment for reporting behavior.”
Social Science & Medicine, 228, 41-50.
doi:10.1016/j.socscimed.2019.03.002.
Rebelo LP, Pereira NS (2014).
“Assessing health endowment, access and choice determinants: Impact on retired Europeans' (in)activity and quality of life.”
Social Indicators Research, 119(3), 1411-1446.
doi:10.1007/s11205-013-0542-1.
See Also
coef.hopit, profile.hopit, hopit.control, anova.hopit, vcov.hopit, logLik.hopit, AIC.hopit, summary.hopit, svydesign.
For heterogeneity in reporting behavior analysis see: standardizeCoef, latentIndex, getCutPoints, getLevels, boot_hopit.
Examples
# DATA
data(healthsurvey)
# first determine the order of the levels of the dependent variable
levels(healthsurvey$health)
# the order of response levels decreases from the best health to
# the worst health; hence the hopit() parameter decreasing.levels
# is set to TRUE
# Example 1 ---------------------
# fitting the model:
model1 <- hopit(latent.formula = health ~ hypertension + high_cholesterol +
                  heart_attack_or_stroke + poor_mobility + very_poor_grip +
                  depression + respiratory_problems +
                  IADL_problems + obese + diabetes + other_diseases,
                thresh.formula = ~ sex + ageclass + country,
                decreasing.levels = TRUE,
                control = list(trace = FALSE),
                data = healthsurvey)
# summarize the fit:
summary(model1)
# extract parameters in the form of a list
cm1 <- coef(model1, aslist = TRUE)
# names of the returned coefficients
names(cm1)
# extract the latent health coefficients
cm1$latent.params
# check the fit
profile(model1)
# Example 2 ---------------------
# incorporate the survey design
design <- svydesign(ids = ~ country + psu, weights = healthsurvey$csw,
                    data = healthsurvey)
model2 <- hopit(latent.formula = health ~ hypertension + high_cholesterol +
                  heart_attack_or_stroke + poor_mobility +
                  very_poor_grip + depression + respiratory_problems +
                  IADL_problems + obese + diabetes + other_diseases,
                thresh.formula = ~ sex + ageclass + country,
                decreasing.levels = TRUE,
                design = design,
                control = list(trace = FALSE),
                data = healthsurvey)
# compare the latent coefficients of the two models
cbind('No survey design' = coef(model1, aslist = TRUE)$latent.params,
      'Has survey design' = coef(model2, aslist = TRUE)$latent.params)
# Example 3 ---------------------
# defining the interactions between the threshold and the latent variables
# correctly defined interactions:
model3 <- hopit(latent.formula = health ~ hypertension + high_cholesterol +
                  heart_attack_or_stroke + poor_mobility * very_poor_grip +
                  depression + respiratory_problems +
                  IADL_problems + obese + diabetes + other_diseases +
                  sex : depression + sex : diabetes + ageclass : obese,
                thresh.formula = ~ sex * ageclass + country + sex : obese,
                decreasing.levels = TRUE,
                control = list(trace = FALSE),
                data = healthsurvey)
## Not run:
# badly defined interactions:
# 1) lack of a main effect of "other_diseases" in any formula
# it can be solved by adding " + other_diseases" to the latent formula
model3a <- hopit(latent.formula = health ~ hypertension + high_cholesterol +
                   heart_attack_or_stroke + poor_mobility + very_poor_grip +
                   depression + respiratory_problems +
                   IADL_problems + obese + diabetes + other_diseases : sex,
                 thresh.formula = ~ sex + ageclass + country,
                 decreasing.levels = TRUE,
                 control = list(trace = FALSE),
                 data = healthsurvey)
# 2) the main effect of sex is present in both formulas.
# it can be solved by replacing "*" with ":" in "other_diseases * sex"
model3b <- hopit(latent.formula = health ~ hypertension + high_cholesterol +
                   heart_attack_or_stroke + poor_mobility + very_poor_grip +
                   depression + respiratory_problems +
                   IADL_problems + obese + diabetes + other_diseases * sex,
                 thresh.formula = ~ sex + ageclass + country,
                 decreasing.levels = TRUE,
                 control = list(trace = FALSE),
                 data = healthsurvey)
## End(Not run)
# Example 4 ---------------------
# construct a naive continuous variable:
hs <- healthsurvey
hs$cont_var <- sample(5000:5020, nrow(hs), replace = TRUE)
latent.formula <- health ~ hypertension + high_cholesterol +
  heart_attack_or_stroke + poor_mobility + very_poor_grip +
  depression + respiratory_problems +
  IADL_problems + obese + diabetes + other_diseases
# in some cases, when continuous variables are used, the hopit:::get.hopit.start() function
# does not find starting parameters (R version 3.4.4 (2018-03-15)):
## Not run:
model4 <- hopit(latent.formula = latent.formula,
                thresh.formula = ~ sex + cont_var,
                decreasing.levels = TRUE,
                data = hs)
## End(Not run)
# one of the solutions is to transform one or more continuous variables:
hs$cont_var_t <- hs$cont_var - min(hs$cont_var)
model4b <- hopit(latent.formula = latent.formula,
                 thresh.formula = ~ sex + cont_var_t,
                 decreasing.levels = TRUE,
                 data = hs)
# this can also be done automatically using the control parameter
model4c <- hopit(latent.formula = latent.formula,
                 thresh.formula = ~ sex + cont_var,
                 decreasing.levels = TRUE,
                 control = list(transform.thresh = 'min',
                                transform.latent = 'none'),
                 data = hs)
model4d <- hopit(latent.formula = latent.formula,
                 thresh.formula = ~ sex + cont_var,
                 decreasing.levels = TRUE,
                 control = list(transform.thresh = 'scale_01',
                                transform.latent = 'none'),
                 data = hs)
model4e <- hopit(latent.formula = latent.formula,
                 thresh.formula = ~ sex + cont_var,
                 decreasing.levels = TRUE,
                 control = list(transform.thresh = 'standardize',
                                transform.latent = 'none'),
                 data = hs)
model4f <- hopit(latent.formula = latent.formula,
                 thresh.formula = ~ sex + cont_var,
                 decreasing.levels = TRUE,
                 control = list(transform.thresh = 'standardize_trunc',
                                transform.latent = 'none'),
                 data = hs)
# compare the coefficients of the transformed models side by side
round(t(rbind(coef(model4b),
              coef(model4c),
              coef(model4d),
              coef(model4e),
              coef(model4f))), 4)
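The effect of such transformations on a badly scaled covariate can be previewed in base R before fitting. In the sketch below, only the minimum shift is confirmed by Example 4 (it matches cont_var_t); the formulas for rescaling to [0, 1] and for standardizing are assumptions inferred from the option names 'scale_01' and 'standardize', not taken from the package source, and 'standardize_trunc' is left out because its definition is not given here.

```r
# A small made-up covariate on an awkward scale (hypothetical values)
cont_var <- c(5000, 5003, 5010, 5020, 5007)

# Shift so that the minimum is zero (as done manually in Example 4)
shift_min <- cont_var - min(cont_var)

# Assumed meaning of 'scale_01': rescale to the unit interval
scale_01 <- (cont_var - min(cont_var)) / diff(range(cont_var))

# Assumed meaning of 'standardize': zero mean and unit standard deviation
standardized <- (cont_var - mean(cont_var)) / sd(cont_var)

round(cbind(cont_var, shift_min, scale_01, standardized), 3)
```

All three versions carry the same information as the original variable but have values near zero, which is the property that helps the search for starting parameters in Example 4.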