biprobit_latent {endogeneity} R Documentation

Recursive Bivariate Probit Model with Latent First Stage

Description

Estimate two probit models with bivariate normally distributed error terms, in which the dependent variable of the first stage model is unobserved.

First stage (Probit, m_i^* is unobserved):

m_i^* = 1(\boldsymbol{\alpha}'\mathbf{w_i} + u_i > 0)

Second stage (Probit):

y_i = 1(\boldsymbol{\beta}'\mathbf{x_i} + \gamma m_i^* + \sigma v_i > 0)

Endogeneity structure: u_i and v_i are bivariate normally distributed with a correlation of \rho.

w and x can be the same set of variables. Identification of this model is generally weak, especially if w does not predict m^* well. \gamma is assumed to be positive to ensure that the model estimates are unique.
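
Because m_i^* is never observed, the likelihood has to be written in terms of y_i alone. The following is a sketch of the implied outcome probability, obtained by summing over the two possible values of m_i^*. It assumes that u_i and v_i each have unit variance and that \sigma > 0. The symbol \Phi_2(\cdot, \cdot; \rho), the standard bivariate normal CDF with correlation \rho, is notation introduced here and is not taken from the package documentation.

```latex
\begin{aligned}
P(y_i = 1 \mid \mathbf{w_i}, \mathbf{x_i})
  &= P(m_i^* = 1,\, y_i = 1) + P(m_i^* = 0,\, y_i = 1) \\
  &= \Phi_2\!\left(\boldsymbol{\alpha}'\mathbf{w_i},\;
       \frac{\boldsymbol{\beta}'\mathbf{x_i} + \gamma}{\sigma};\; \rho\right)
   + \Phi_2\!\left(-\boldsymbol{\alpha}'\mathbf{w_i},\;
       \frac{\boldsymbol{\beta}'\mathbf{x_i}}{\sigma};\; -\rho\right)
\end{aligned}
```

Under these assumptions, and if x contains an intercept, the expression is unchanged when m_i^* is relabeled as 1 - m_i^*: flipping the signs of \boldsymbol{\alpha}, \gamma and \rho and absorbing \gamma into the intercept of \boldsymbol{\beta} simply swaps the two terms. This is consistent with the sign restriction on \gamma stated above.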

Usage

biprobit_latent(
  form1,
  form2,
  data = NULL,
  EM = FALSE,
  par = NULL,
  method = "BFGS",
  verbose = 0,
  maxIter = 500,
  tol = 1e-05,
  tol_LL = 1e-06
)

Arguments

form1

Formula for the first probit model, in which the dependent variable is unobserved. Use a formula like ~w to avoid specifying the dependent variable.

form2

Formula for the second probit model. The latent dependent variable of the first stage is automatically added as a regressor in this model.

data

Input data, a data frame

EM

Whether to maximize the likelihood using the Expectation-Maximization (EM) algorithm, which is slower but more robust. Defaults to FALSE, but should be changed to TRUE if the model has convergence issues.

par

Starting values for estimates

method

Optimization algorithm. Default is BFGS

verbose

An integer indicating how much output to display during the estimation process.

  • <0 - No output

  • 0 - Basic output (model estimates)

  • 1 - Moderate output, basic output + parameter and likelihood in each iteration

  • 2 - Extensive output, moderate output + gradient values on each call

maxIter

Maximum number of iterations for the EM algorithm

tol

Tolerance for convergence of the EM algorithm

tol_LL

Tolerance for convergence of the likelihood

Value

A list containing the results of the estimated model. The list inherits all the components in the output of maxLik; see the documentation of maxLik for more details.

References

Peng, Jing. (2023) Identification of Causal Mechanisms from Randomized Experiments: A Framework for Endogenous Mediation Analysis. Information Systems Research, 34(1):67-84. Available at https://doi.org/10.1287/isre.2022.1113

See Also

Other endogeneity: bilinear(), biprobit_partial(), biprobit(), linear_probit(), pln_linear(), pln_probit(), probit_linearRE(), probit_linear_latent(), probit_linear_partial(), probit_linear()

Examples


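library(MASS)        # mvrnorm()
library(endogeneity) # biprobit(), biprobit_latent()

N = 2000
rho = -0.5
set.seed(1)

# Exogenous covariates
x = rbinom(N, 1, 0.5)
z = rnorm(N)

# Bivariate normal errors with correlation rho
e = mvrnorm(N, mu=c(0,0), Sigma=matrix(c(1,rho,rho,1), nrow=2))
e1 = e[,1]
e2 = e[,2]

# First-stage outcome m and second-stage outcome y (true gamma = 1)
m = as.numeric(1 + x + z + e1 > 0)
y = as.numeric(1 + x + z + m + e2 > 0)

# Benchmark: recursive bivariate probit with m observed
est = biprobit(m~x+z, y~x+z+m)
print(est$estimates, digits=3)

# Same model with m treated as unobserved
est_latent = biprobit_latent(~x+z, y~x+z)
print(est_latent$estimates, digits=3)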
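
# A hedged sketch, not part of the original example: if the default
# optimizer has trouble converging, the EM argument listed under Usage
# can be switched on. The maxIter and verbose values are illustrative
# choices (verbose < 0 suppresses output), not recommended settings.
est_latent_em = biprobit_latent(~x+z, y~x+z, EM=TRUE, maxIter=200, verbose=-1)
print(est_latent_em$estimates, digits=3)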


[Package endogeneity version 2.1.3 Index]