biprobit_partial {endogeneity} | R Documentation |
Recursive Bivariate Probit Model with Partially Observed First Stage
Description
Estimate two probit models with bivariate normally distributed error terms, in which the dependent variable of the first stage model is partially observed (or unobserved).
First stage (Probit, m_i is partially observed):
m_i = 1(\boldsymbol{\alpha}'\mathbf{w_i} + u_i > 0)
Second stage (Probit):
y_i = 1(\boldsymbol{\beta}'\mathbf{x_i} + \gamma m_i + \sigma v_i > 0)
Endogeneity structure: u_i and v_i are bivariate normally distributed with a correlation of \rho.
Unobserved m_i should be coded as NA. w and x can be the same set of variables. Identification can be weak if the variables in w are not good predictors of m. Observing m_i for 10% to 20% of observations can significantly improve the identification of the model.
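The sketch below shows how the likelihood of a model of this kind is typically built; it is not taken from the package source. It assumes the second-stage error scale is normalized (\sigma = 1). The sign indicators q_{1i} and q_{2i} and the standard bivariate normal CDF \Phi_2 are notation introduced here for illustration.

```latex
% Sign indicators (notation introduced for this sketch)
q_{1i} = 2 m_i - 1, \qquad q_{2i} = 2 y_i - 1

% Contribution of an observation with m_i observed
P(m_i, y_i) = \Phi_2\big( q_{1i}\, \boldsymbol{\alpha}'\mathbf{w_i},\;
                          q_{2i}\, (\boldsymbol{\beta}'\mathbf{x_i} + \gamma m_i);\;
                          q_{1i} q_{2i}\, \rho \big)

% Contribution of an observation with m_i unobserved (NA):
% sum the joint probability over both possible values of m
P(y_i) = \sum_{m \in \{0, 1\}}
         \Phi_2\big( (2m - 1)\, \boldsymbol{\alpha}'\mathbf{w_i},\;
                     q_{2i}\, (\boldsymbol{\beta}'\mathbf{x_i} + \gamma m);\;
                     (2m - 1)\, q_{2i}\, \rho \big)
```

Under these assumptions, observations with m_i coded as NA still enter the likelihood through P(y_i), which depends on m only through \boldsymbol{\alpha}'\mathbf{w_i}. This is consistent with the remark that identification weakens when w predicts m poorly.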
Usage
biprobit_partial(
  form1,
  form2,
  data = NULL,
  EM = FALSE,
  par = NULL,
  method = "BFGS",
  verbose = 0,
  maxIter = 500,
  tol = 1e-05,
  tol_LL = 1e-06
)
Arguments
form1 |
Formula for the first probit model, in which the dependent variable is partially observed. |
form2 |
Formula for the second probit model. The partially observed dependent variable of the first stage is automatically added as a regressor in this model (do not add it manually). |
data |
Input data, a data frame |
EM |
Whether to maximize the likelihood using the Expectation-Maximization (EM) algorithm, which is slower but more robust. Defaults to FALSE, but should be changed to TRUE if the model has convergence issues. |
par |
Starting values for estimates |
method |
Optimization algorithm. Defaults to "BFGS". |
verbose |
An integer indicating how much output to display during the estimation process. |
maxIter |
Maximum number of iterations for the EM algorithm |
tol |
Tolerance for convergence of the EM algorithm |
tol_LL |
Tolerance for convergence of the likelihood |
Value
A list containing the results of the estimated model, some of which are inherited from the return of maxLik
estimates: Model estimates with 95% confidence intervals. Prefix "1" means first stage variables.
estimate or par: Point estimates
variance_type: Covariance matrix used to calculate standard errors. Either BHHH or Hessian.
var: Covariance matrix
se: Standard errors
gradient: Gradient function at maximum
hessian: Hessian matrix at maximum
gtHg: g'H^-1g, where H^-1 is simply the covariance matrix. A value close to zero (e.g., <1e-3 or 1e-6) indicates good convergence.
LL or maximum: Likelihood
AIC: AIC
BIC: BIC
n_obs: Number of observations
n_par: Number of parameters
iterations: Number of iterations taken to converge
message: Message regarding convergence status.
Note that the list inherits all the components in the output of maxLik. See the documentation of maxLik for more details.
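To make the gtHg diagnostic concrete, here is a minimal base R sketch of the quadratic form g'H^-1g, taking H^-1 to be the covariance matrix as stated above. The gradient g and covariance matrix V are made-up numbers for illustration, not output from biprobit_partial, and the 1e-6 cutoff is one of the example thresholds mentioned above rather than a rule enforced by the package.

```r
# Hypothetical gradient at the optimum for a two-parameter model
g <- c(1e-4, -2e-4)

# Hypothetical covariance matrix of the estimates (stands in for H^-1)
V <- matrix(c(0.04, 0.01,
              0.01, 0.09), nrow = 2, byrow = TRUE)

# Quadratic form g' V g, reduced from a 1x1 matrix to a scalar
gtHg <- drop(t(g) %*% V %*% g)

# Compare against an illustrative threshold
converged <- gtHg < 1e-6

print(gtHg)
print(converged)
```

With a fitted model, the same check would presumably use the gradient and var components of the returned list, or simply read the reported gtHg component directly.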
References
Peng, Jing. (2023) Identification of Causal Mechanisms from Randomized Experiments: A Framework for Endogenous Mediation Analysis. Information Systems Research, 34(1):67-84. Available at https://doi.org/10.1287/isre.2022.1113
See Also
Other endogeneity: bilinear(), biprobit_latent(), biprobit(), linear_probit(), pln_linear(), pln_probit(), probit_linearRE(), probit_linear_latent(), probit_linear_partial(), probit_linear()
Examples
library(MASS)        # provides mvrnorm()
library(endogeneity) # provides biprobit() and biprobit_partial()
N = 5000
rho = -0.5
set.seed(1)
x = rbinom(N, 1, 0.5)
z = rnorm(N)
e = mvrnorm(N, mu=c(0,0), Sigma=matrix(c(1,rho,rho,1), nrow=2))
e1 = e[,1]
e2 = e[,2]
m = as.numeric(1 + x + 3*z + e1 > 0)
y = as.numeric(1 + x + z + m + e2 > 0)
est = biprobit(m~x+z, y~x+z+m)
print(est$estimates, digits=3)
# partially observed version of m
observed_pct = 0.2
m_p = m
m_p[sample(N, N*(1-observed_pct))] = NA
est_partial = biprobit_partial(m_p~x+z, y~x+z)
print(est_partial$estimates, digits=3)
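If the default fit shows convergence problems, the EM argument documented under Arguments can be switched on. The following is a minimal sketch of that call. It repeats the simulation design from the example with a smaller, arbitrarily chosen N to keep the EM run short, and it assumes the MASS and endogeneity packages are installed.

```r
library(MASS)        # provides mvrnorm()
library(endogeneity) # provides biprobit_partial()

set.seed(1)
N <- 2000            # smaller than in the example above, chosen only for speed
rho <- -0.5

# Simulate regressors and correlated errors
x <- rbinom(N, 1, 0.5)
z <- rnorm(N)
e <- mvrnorm(N, mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), nrow = 2))

# First-stage and second-stage binary outcomes
m <- as.numeric(1 + x + 3 * z + e[, 1] > 0)
y <- as.numeric(1 + x + z + m + e[, 2] > 0)

# Keep m for about 20% of observations and code the rest as NA
m_p <- m
m_p[sample(N, round(N * 0.8))] <- NA

# Fit with the EM algorithm instead of the default direct maximization
est_em <- biprobit_partial(m_p ~ x + z, y ~ x + z, EM = TRUE)
print(est_em$estimates, digits = 3)
```

The EM route is described as slower but more robust, so it is likely most useful as a fallback after the default optimizer has been tried.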