EMSS {EMSS}    R Documentation

EM-Type Estimation Methods for the Heckman Sample Selection Model

Description

Three algorithms, ECM, ECMnr and ECME, can be used to estimate the parameters of the Heckman selection model. They retain the advantages of the EM algorithm: easy implementation and numerical stability. "ECM" stands for Expectation/Conditional Maximization, "ECMnr" for Expectation/Conditional Maximization with Newton-Raphson, and "ECME" for Expectation/Conditional Maximization Either.

Usage

EMSS(
  response,
  selection,
  data,
  method = "ECM",
  initial.param = NULL,
  eps = 10^(-10)
)

Arguments

response

a formula for the response equation.

selection

a formula for the selection equation.

data

a data frame. The data must be supplied as an object of class data.frame.

method

a character string indicating which method to use: "ECM" (Expectation Conditional Maximization), "ECMnr" (Expectation Conditional Maximization with Newton-Raphson) or "ECME" (Expectation Conditional Maximization Either).

initial.param

a vector of initial parameter values for the estimation. Its length must equal the number of parameters to be estimated.

eps

a numerical tolerance that ends the iteration loop. It is a small threshold, set by the user, at which the function stops iterating and returns the current parameter estimates.
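
The role of eps can be illustrated with a generic iterative loop. The sketch below is not the package's internal code. It assumes, purely for illustration, that iteration stops once the largest absolute change between successive estimates falls below eps. The helper iterate_until() and the toy update rule are hypothetical.

```r
# Hypothetical helper (not part of EMSS): repeat `update` until the largest
# absolute change between successive estimates drops below `eps`.
iterate_until <- function(update, start, eps = 10^(-10), max_iter = 1000) {
  current <- start
  for (iter in seq_len(max_iter)) {
    proposal <- update(current)
    change <- max(abs(proposal - current))
    current <- proposal
    if (change < eps) {
      return(list(estimate = current, iterations = iter, converged = TRUE))
    }
  }
  list(estimate = current, iterations = max_iter, converged = FALSE)
}

# Toy update rule: Newton's iteration for the square root of 2.
res <- iterate_until(function(x) 0.5 * (x + 2 / x), start = 1)
res$estimate
res$converged
```

A smaller eps demands more iterations before the loop stops; a larger one stops earlier at the cost of precision.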

Details

The dependent variable of the selection equation (specified by the argument selection) must have exactly two levels (e.g., 'FALSE' and 'TRUE', or '0' and '1'). The default method is "ECM". When initial.param is NULL (the default), starting values are obtained by two-step estimation of the model with the function selection from the package sampleSelection. NAs are allowed in the data. They are ignored if the corresponding outcome is unobserved; otherwise, observations that contain NA (in either the selection or the outcome) are changed to 0.
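
Before fitting, it can help to confirm that the selection variable really has two levels and to see where missing values sit. The base-R sketch below uses a made-up data frame; check_selection() is a hypothetical helper, not a function of EMSS.

```r
# Made-up data: `sel` is the selection indicator, `y` the outcome.
toy <- data.frame(
  sel = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  y   = c(1.2, NA, 0.7, NA, NA),
  x   = c(0.1, 0.4, 0.3, 0.9, 0.5)
)

# Hypothetical helper: TRUE if the non-missing values take exactly two levels.
check_selection <- function(s) {
  length(unique(s[!is.na(s)])) == 2
}

check_selection(toy$sel)

# Count missing outcomes separately for unselected rows (ignored, per Details)
# and selected rows (changed to 0, per Details).
na_unselected <- sum(is.na(toy$y) & !toy$sel)
na_selected   <- sum(is.na(toy$y) & toy$sel)
c(unselected = na_unselected, selected = na_selected)
```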

Value

EMSS returns an object of class "ECM", which is a list containing the following components.

call

a matched call.

estimate_response

estimated regression coefficients for the response formula.

estimate_selection

estimated regression coefficients for the sample selection formula.

estimate_sigma

an estimated scale parameter for the bivariate normal distribution.

estimate_rho

an estimated correlation coefficient for the bivariate normal distribution.

hessian_mat

the Hessian matrix for the parameters.

resp_leng

the number of coefficients for the response formula.

select_leng

the number of coefficients for the selection formula.

Q_value

the value of the Q function for EM-type algorithms.

names_response

names of regression coefficients for the response formula.

names_selection

names of regression coefficients for the selection formula.
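
The components can be pulled out of the returned list by name. The sketch below uses only component names listed above and the Smoke data from the Examples; it is guarded so that the fit runs only when EMSS is installed.

```r
# Component names as documented in the Value section.
components <- c("estimate_response", "estimate_selection",
                "estimate_sigma", "estimate_rho")

if (requireNamespace("EMSS", quietly = TRUE)) {
  data(Smoke, package = "EMSS")
  fit <- EMSS::EMSS(response = cigs_intervals ~ educ,
                    selection = smoker ~ educ + age,
                    data = Smoke)
  # Print each documented component in turn.
  for (nm in components) {
    cat(nm, ":\n")
    print(fit[[nm]])
  }
}
```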

Background

The Heckman selection model is a classic approach to data in which the outcome is only partially observed and the missing part is not missing at random. Heckman (1979) developed two-step and maximum likelihood estimation (MLE) for this model; both methods are available in the R package sampleSelection, described by Toomet and Henningsen (2008). Zhelonkin et al. (2016) developed a robust two-stage method, available in the package ssmrob, which performs more robustly than the two-step method when the data contain outlying observations. Zhao et al. (2020) extended the EM algorithm to more general cases, resulting in three algorithms: ECM, ECM(NR), and ECME. These retain the main advantages of the EM algorithm, namely stability and ease of implementation.
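
For orientation, the model in its standard textbook form can be written as below. The symbols are chosen here for illustration and need not match the notation of the cited papers; \sigma and \rho correspond to the returned components estimate_sigma and estimate_rho.

```latex
\begin{aligned}
s_i^* &= w_i^\top \gamma + u_i, & s_i &= \mathbf{1}\{s_i^* > 0\}, \\
y_i^* &= x_i^\top \beta + \varepsilon_i, & y_i &= y_i^* \ \text{observed only if } s_i = 1, \\
\begin{pmatrix} \varepsilon_i \\ u_i \end{pmatrix}
&\sim N_2\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma^2 & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix} \right).
\end{aligned}
```

The first line is the selection equation and the second the response equation; a nonzero \rho is what makes the unobserved outcomes missing not at random.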

References

Heckman, J. (1979) Sample selection bias as a specification error. Econometrica, 47, 153-161.

Toomet, O. and Henningsen, A. (2008) Sample selection models in R: Package sampleSelection. Journal of Statistical Software, 27, 1-23.

Zhao, J., Kim, H.-J. and Kim, H.-M. (2020) New EM-type algorithms for the Heckman selection model. Computational Statistics and Data Analysis, 146, https://doi.org/10.1016/j.csda.2020.106930.

Zhelonkin, M., Genton, M.G. and Ronchetti, E. (2016) Robust inference in sample selection models. Journal of the Royal Statistical Society Series B, 78, 805-827.

Examples

## default method ("ECM")
data(Smoke, package = "EMSS")
ex1 <- EMSS(response = cigs_intervals ~ educ,
            selection = smoker ~ educ + age,
            data = Smoke)
print(ex1)

## same model, fitted with method "ECMnr"
data(Smoke, package = "EMSS")
ex2 <- EMSS(response = cigs_intervals ~ educ,
            selection = smoker ~ educ + age,
            data = Smoke, method = "ECMnr")
print(ex2)

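## example using random numbers with exclusion restriction

N <- 1000
# bivariate normal errors with unit variances and correlation 0.5
errps <- mvtnorm::rmvnorm(N, c(0, 0), matrix(c(1, 0.5, 0.5, 1), 2, 2))
# selection equation: covariate xs and binary selection indicator ys
xs <- runif(N)
ys <- xs + errps[, 1] > 0
# response equation: covariate xo; the outcome yo is set to 0 where unselected
xo <- runif(N)
yo <- (xo + errps[, 2]) * (ys > 0)
# initial.param supplies six values, one per parameter to be estimated
ex3 <- EMSS(response = yo ~ xo,
            selection = ys ~ xs,
            initial.param = c(rep(0, 4), 0.3, 0.6), method = "ECMnr")
print(ex3)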


[Package EMSS version 1.1.1 Index]