tpm {twopartm}    R Documentation

Fit Two-part Regression Models for Zero-inflated Data

Description

Fit two-part regression models for zero-inflated data. The first-part model is a binomial regression model for the indicator of a non-zero response. The second-part model is a generalized linear regression model for the non-zero response values.

Usage

tpm(formula_part1, formula_part2 = NULL, data,
    link_part1 = c("logit", "probit", "cloglog", "cauchit", "log"),
    family_part2 = gaussian(), weights = NULL, ...)

## S4 method for signature 'twopartm'
summary(object, ...)

Arguments

formula_part1

formula specifying the dependent variable and the regressors used for the first-part model, i.e., the binomial model for probabilities of non-zero responses. If formula_part2 is NULL, the same regressors specified here are employed in both parts.

formula_part2

formula specifying the dependent variable and the regressors used for the second-part model, i.e., the glm model for non-zero responses. If it's NULL, the same regressors specified in formula_part1 are employed in the second-part model.

data

a data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the models for both parts.

link_part1

character string specifying the link function of the first-part model, i.e., the binomial model for probabilities of non-zero responses. It can be "logit", "probit" or "cauchit" (corresponding to the logistic, normal and Cauchy CDFs respectively), "log", or "cloglog" (complementary log-log).

family_part2

a description of the error distribution and link function to be used in the second-part model, i.e., the glm model for non-zero responses. This can be a character string naming a family function, a family function or the result of a call to a family function.

weights

an optional numeric vector of weights to be used in the fitting process for both parts. If NULL (the default), no weights are used.

object

a fitted two-part model object of class twopartm as returned by tpm.

...

further arguments passed to glm (by tpm) or to summary.glm (by summary).
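
The five link_part1 choices listed above are the links accepted by binomial() in base R. As a rough illustration of how they differ, the sketch below evaluates each inverse link at a linear predictor of zero. It uses only the stats package; the names links and p_at_zero are introduced here for illustration and are not part of twopartm.

```r
# Inverse link functions map the linear predictor to a probability.
# Evaluate each one at a linear predictor of 0 (illustrative names only).
links <- c("logit", "probit", "cloglog", "cauchit", "log")
p_at_zero <- sapply(links, function(l) binomial(link = l)$linkinv(0))

# The symmetric links (logit, probit, cauchit) give 0.5 here; cloglog and
# log are asymmetric and give different values.
print(round(p_at_zero, 3))
```

Note that the log link does not by itself keep fitted probabilities below 1, so it may need more care than the other choices.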

Details

Two-part models are two-component models for zero-inflated data: one component models the indicator of a non-zero response, and the other models the non-zero response values. Zeros and non-zeros are thus treated as arising from two separate processes. For instance, when explaining individual annual health expenditure, the event is the occurrence of a specific disease. If the illness occurs, some treatment that is not free will be needed, and a positive expense will be observed. In such situations, a two-part model allows the censoring mechanism and the outcome to be modeled by separate processes. In other words, it permits the zeros and non-zeros to be generated by different densities, as a special type of mixture model.
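
The mixture structure described above can be written compactly. The notation here is introduced for illustration and does not come from the package: \pi(x) is the probability of a non-zero response given regressors x, g(y | x) is the density of the positive responses, and \mu(x) is their conditional mean.

```latex
f(y \mid x) =
\begin{cases}
  1 - \pi(x),            & y = 0, \\
  \pi(x)\, g(y \mid x),  & y > 0,
\end{cases}
\qquad
\mathrm{E}[Y \mid x] = \pi(x)\, \mu(x),
\quad \text{where } \pi(x) = \Pr(Y > 0 \mid x),\;
\mu(x) = \mathrm{E}[Y \mid Y > 0, x].
```

The expected response is therefore the product of the two parts: the probability of any non-zero response times the mean of the non-zero responses.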

In tpm, the zeros are handled by the first-part model, a glm with binomial family and the specified link function for the probability of a non-zero outcome. The second-part model is a glm with the specified family and link, fitted to the non-zero values. The regressors for the two parts may differ and can be specified separately. The two components are estimated separately through calls to glm, which uses iteratively reweighted least squares (IRLS).
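
The idea of estimating the two components separately can be sketched with plain glm calls. This is a minimal illustration on simulated data, assuming a logit first part and a Gamma second part with log link. It is not necessarily how tpm is implemented internally, and every name in it (sim, any_use, part1, part2, expected_y) is hypothetical.

```r
# Simulated zero-inflated data (hypothetical, not shipped with twopartm)
set.seed(42)
n <- 500
x <- rnorm(n)
any_use <- rbinom(n, 1, plogis(0.5 + x))                     # non-zero indicator
amount <- rgamma(n, shape = 2, rate = 2 / exp(1 + 0.3 * x))  # positive amounts
sim <- data.frame(y = any_use * amount, any_use = any_use, x = x)

# Part 1: binomial glm for the probability of a non-zero response
part1 <- glm(any_use ~ x, family = binomial(link = "logit"), data = sim)

# Part 2: Gamma glm with log link, fitted to the non-zero responses only
part2 <- glm(y ~ x, family = Gamma(link = "log"), data = subset(sim, y > 0))

# Combine the parts: P(y > 0 | x) * E[y | y > 0, x]
p_nonzero <- predict(part1, newdata = sim, type = "response")
mean_pos <- predict(part2, newdata = sim, type = "response")
expected_y <- p_nonzero * mean_pos
```

Because the two fits share no parameters, each part can use its own regressors, family and link, which is what the separate formula_part1 and formula_part2 arguments allow.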

The returned fitted model object is of class twopartm. A set of standard extractor functions is available for objects of this class, including methods for the generic functions print, summary, plot, coef, logLik, residuals, and predict. See predict-methods for more details on the prediction method.

The summary method reports the summaries of the two fitted glm models, one for each part.

Value

tpm returns an object of class twopartm.

summary returns a list with two objects of class summary.glm, for the first-part model and the second-part model respectively.

Author(s)

Yajie Duan, Birol Emir, Griffith Bell and Javier Cabrera

References

Belotti, F., Deb, P., Manning, W. G. and Norton, E. C. (2015). twopm: Two-part models. The Stata Journal, 15(1), 3–20.

Hay, J. W. and Olsen, R. J. (1984). Let them eat cake: A note on comparing alternative models of the demand for medical care. Journal of Business and Economic Statistics, 2, 279–282.

Leung, S. F. and Yu, S. (1996). On the choice between sample selection and two-part models. Journal of Econometrics, 72, 197–229.

Mihaylova, B., Briggs, A., O’Hagan, A. and Thompson, S. G. (2011). Review of statistical methods for analyzing healthcare resources and costs. Health Economics, 20, 897–916.

See Also

twopartm-class, glm, summary.glm, predict-methods

Examples


## data about health expenditures, i.e., a non-negative continuous response
data(meps, package = "twopartm")


## fit a two-part model with the same regressors in both parts: a logistic
## regression model for the first part and a Gamma glm with log link for
## the second part
tpmodel <- tpm(exp_tot ~ female + age, data = meps, link_part1 = "logit",
               family_part2 = Gamma(link = "log"))

tpmodel

summary(tpmodel)


## fit a two-part model with different regressors in the two parts: a probit
## regression model for the first part and a Gamma glm with log link for
## the second part
tpmodel <- tpm(formula_part1 = exp_tot ~ female + age,
               formula_part2 = exp_tot ~ female + age + ed_colplus,
               data = meps, link_part1 = "probit",
               family_part2 = Gamma(link = "log"))

tpmodel

summary(tpmodel)

## fit a two-part model with transformed regressors and randomly assigned
## weights; the seed is fixed only so that the example is reproducible
set.seed(1)
meps$weights <- sample(1:30, nrow(meps), replace = TRUE)

tpmodel <- tpm(formula_part1 = exp_tot ~ female + age,
               formula_part2 = exp_tot ~ female + I(age^2) + ed_colplus,
               data = meps, link_part1 = "logit",
               family_part2 = Gamma(link = "log"), weights = meps$weights)

tpmodel

summary(tpmodel)


## data with a count response
data("bioChemists")

## fit a two-part model with the same regressors in both parts: a logistic
## regression model for the first part and a Poisson regression model with
## the default log link for the second part
tpmodel <- tpm(art ~ ., data = bioChemists, link_part1 = "logit",
               family_part2 = poisson)

tpmodel

summary(tpmodel)



[Package twopartm version 0.1.0 Index]