mvoprobit {switchSelection} | R Documentation |
Multivariate ordered probit model with heteroscedasticity and (non-random) sample selection.
Description
This function allows to estimate parameters of multivariate ordered probit model and its extensions. It is possible to account for heteroscedastic variances, non-normal marginal distributions of random errors (under Gaussian copula) and (non-random) sample selection i.e. when some categories of particular dependent variables are observable only under some specific values of other dependent variables. Also it is possible to include continuous equations to get multivariate generalization of endogenous switching model. In this case both maximum-likelihood and two-step (similar to Heckman's method) estimation procedures are implemented.
Usage
mvoprobit(
formula,
formula2 = NULL,
data = NULL,
groups = NULL,
groups2 = NULL,
marginal = list(),
opt_type = "optim",
opt_args = NULL,
start = NULL,
estimator = "ml",
cov_type = ifelse(estimator == "ml", "sandwich", "parametric"),
degrees = NULL,
n_sim = 1000,
n_cores = 1,
control = NULL,
regularization = NULL
)
Arguments
formula |
list which i-th element is an object of class "formula" describing the form of the linear index for the i-th ordered equation. Mean and variance equations should be separated by '|' symbol. |
formula2 |
list which i-th element is an object of class "formula" describing the form of the linear index for the i-th continuous equation. |
data |
data frame containing the variables in the model. |
groups |
matrix which (i, j)-th element is j-th ordered category
(value starting from 0) of i-th dependent ordered variable. Each row of this
matrix describes observable (in data) combination of categories i.e. values
of dependent variables. Special category |
groups2 |
the same as |
marginal |
list such that |
opt_type |
character representing optimization function to be used.
If |
opt_args |
a list of input arguments for the optimization function
selected via |
start |
numeric vector of initial parameters' values. It will be used
as a starting point for optimization purposes. It is also possible to
provide an object of class |
estimator |
character determining estimation method.
If |
cov_type |
character determining the type of covariance matrix to be
returned and used for summary.
First, suppose that |
degrees |
vector of non-negative integers such that |
n_sim |
integer representing the number of GHK draws when there are more than 3 ordered equations. Otherwise alternative (much more efficient) algorithms will be used to calculate multivariate normal probabilities. |
n_cores |
positive integer representing the number of CPU cores used for parallel computing. If possible it is highly recommend to set it equal to the number of available physical cores especially when the system of ordered equations has 2 or 3 equations. |
control |
a list of control parameters. See 'Details'. |
regularization |
a list of control parameters for regularization.
Element |
Details
Multivariate ordered probit model with heteroscedastic random errors has the following form:
z_{ji}^{*} = w_{ji}\gamma_{j} + \sigma_{ji}u_{ji},
\sigma_{ji} = \exp(w_{ji}^{*}\gamma_{j}^{*}), \quad
u_{i}\sim N\left(\begin{bmatrix}0\\ \vdots\\ 0\end{bmatrix},
\Sigma\right), i.i.d.,
\Sigma = \begin{bmatrix}
1 & \rho_{12} & \rho_{13} & ... & \rho_{1J}\\
\rho_{12} & 1 & \rho_{23} & ... & \rho_{2J}\\
\rho_{13} & \rho_{23} & 1 & ... & \rho_{3J}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
\rho_{1J} & \rho_{2J} & \rho_{3J} & ... & 1
\end{bmatrix},
z_{ji}=\begin{cases}
0\text{, if }z_{ji}^{*}\leq c_{j1}\\
1\text{, if }c_{j1}<z_{ji}^{*}\leq c_{j2}\\
2\text{, if }c_{j2}<z_{ji}^{*}\leq c_{j3}\\
\vdots\\
m_{j}\text{, if }z_{ji}^{*}>c_{jm_{j}}\\
\end{cases},
z_{i}=(z_{1i},...,z_{Ji})^{T},\quad
u_{i} = (u_{1i},u_{2i},...,u_{Ji})^{T},
i\in\{1,2,...,n\},\quad j\in\{1,2,...,J\}.
Where:
-
n
- the number of observations. If there are no omitted observations thenn
equals tonrow(data)
. -
J
- the number of equations i.e.length(formula)
. -
z_{ji}^{*}
- unobservable (latent) value of thej
-th dependent variable. -
z_{ji}
- observable (ordered) value of thej
-th dependent variable. -
(m_{j} + 1)
- the number of categories ofz_{ji}\in\{0,1,...,m_{j}\}
. -
c_{jk}
-k
-th cut of thej
-th dependent variable. -
w_{ji}
- regressors of thej
-th mean equation which should be described informula[[j]]
. -
\gamma_{j}
- regression coefficients of thej
-th mean equation. -
w_{ji}\gamma_{j}
- linear index of thej
-th mean equation. -
u_{i}
- multivariate normal random vector which elements are standard normal random variables. -
\Sigma
- correlation matrix ofu_{i}
i.e.\Sigma_{t_{1}t_{2}}=\rho_{\min(t_{1},t_{2}),\max(t_{1}t_{2})}= \text{corr}\left(u_{it_{1}}, u_{it_{2}}\right)
. -
\sigma_{ji}
- heteroscedastic standard deviation. -
\sigma_{ji}u_{ji}
- heteroscedastic random errors. -
w_{ji}^{*}
- regressors of thej
-th variance equation which should be described informula[[j]]
after '|' symbol. -
\gamma_{j}^{*}
- regression coefficients of thej
-th variance equation. -
w_{ji}^{*}\gamma_{j}^{*}
- linear index of thej
-th variance equation.
Parameters of this model are estimated via maximum-likelihood method
using numeric optimization approach provided through opt_type
argument. The type of covariance matrix estimator may be provided
through cov_type
argument.
To account for (non-random) sample selection unobservable values of
dependent variables
should be coded as -1. For example if z_{1}
is a
binary variable for employment status (0 - unemployed, 1 - employed) and
z_{2}
is ordered variable (ranging from 0 to 2) for job satisfaction
(0 - unsatisfied, 1- satisfied, 2 - highly satisfied)
then z_{2}
is observable
only when z_{1}
equals 1 since job satisfaction observable only for
working individuals. Consequently z_{2}
should be equal
to -1 (minus one) whenever z_{1}
equals to 0. If variables are coded
in this way then groups
matrix will be created automatically.
Otherwise user may provide manual structure of selection mechanism by
mentioning all possible combinations of z_{1}
and z_{2}
values as
a rows of groups
matrix. In this particular example matrix
groups
will have the following form (no need to provide it manually):
\text{groups} =
\begin{bmatrix}
1 & 2\\
1 & 1\\
1 & 0\\
0 & -1
\end{bmatrix}.
Again, please, insure that all z_{2}
equal to -1 for all
z_{1}
which equal to 0 in your data
. Then matrix groups
will automatically have the aforementioned structure
(accounting for non-random sample selection).
If some variables z_{ji}
are missing i.e. take NA
value then
contribution of other dependent variables (for the i-th observation)
still may be included into the
likelihood function by manually substituting NA
with -1 in your
data
. However insure that this particular (missing) z_{ji}
is
not a regressor for other dependent variable
(that may happen in hierarchical systems).
Constant terms (intercepts) are excluded from the model for identification
purposes. If z_{ji}
is a binary variable then -c_{j1}
may be
interpreted as a constant term of the j
-th equation.
If all z_{ji}
are binary variables then the model becomes
multivariate probit.
It is possible to estimate sample-selection and endogenous switching models
with continuous dependent variables by providing the form of
corresponding equations via formula2
argument. Selection (switching)
mechanism will be determined by the aforementioned multivariate ordered
probit model.
First, consider sample selection model with one continuous
equation (y
- wage) and two ordered equations from the previous example
(z_{1}
- employment, z_{2}
- job satisfaction):
y_{i}^{*}=x_{i}\beta+\varepsilon_{i},
y_{i} =
\begin{cases}
y_{i}^{*}\text{, if }z_{1i}=1\text{ and }z_{2i}\geq 1\\
\text{unobservable, otherwise}
\end{cases},
where y_{i}
and x_{i}
are continuous dependent variable
and the vector of exogenous variables correspondingly. These variables should
be described in formula2
. Random errors \varepsilon_{i}
and
u_{i}
are multivariate normal. In this example it is assumed that
information on wages y_{i}
available only for employed z_{1i} = 1
individuals with high enough job satisfaction z_{2i} \geq 1
(suppose
that unsatisfied workers where not asked wage question or surely refuse
to answer).
To estimate this model it is also
necessarily to set unobservable values of y_{i}
in data
to
Inf
to distinguish them from
NA
values representing observations
omitted by random. Finally one needs to manually specify the structure
of equations via groups
and groups2
arguments by providing
all possible combinations of ordered and continuous equations values:
\text{groups} =
\begin{bmatrix}
1 & 2\\
1 & 1\\
1 & 0\\
0 & -1
\end{bmatrix}, \text{groups2} =
\begin{bmatrix}
0\\
0\\
-1\\
-1
\end{bmatrix}.
Where 0
category in group2
indicates that continuous
dependent variable y_{i}
is observable. For example
groups2[2] = 0
indicates that y_{i}
is observable
when groups[2, ] = c(1, 1)
i.e. z_{1i} = 1
and z_{2i} = 1
.
Further suppose that we assume that wage equations are different for
satisfied (z_{2i} = 1
) and highly satisfied (z_{2i} = 2
)
workers:
y_{0i}^{*} = x_{i}\beta_{0} + \varepsilon_{0i},
y_{1i}^{*} = x_{i}\beta_{1} + \varepsilon_{1i},
y_{i} =
\begin{cases}
y_{0i}^{*}\text{, if }z_{1i}=1\text{ and }z_{2i} = 1\\
y_{1i}^{*}\text{, if }z_{1i}=1\text{ and }z_{2i} = 2\\
\text{unobservable, otherwise}
\end{cases}.
To estimate this endogenous switching model arguments groups
and
groups2
should be specified as follows:
\text{groups} =
\begin{bmatrix}
1 & 2\\
1 & 1\\
1 & 0\\
0 & -1
\end{bmatrix}, \text{groups2} =
\begin{bmatrix}
1\\
0\\
-1\\
-1
\end{bmatrix}.
For example
groups2[1] = 1
indicates that when groups[1, ] = c(1, 2)
i.e. z_{1i} = 1
and z_{2i} = 2
we observe y_{i}
in
regime 1
corresponding to the wage of highly satisfied workers.
Similarly groups2[2] = 0
indicates that when
groups[2, ] = c(1, 1)
i.e. z_{1i} = 1
and z_{2i} = 1
we observe y_{i}
in
regime 0
corresponding to the wage of satisfied workers.
Therefore by specifying groups
and groups2
arguments in the
aforementioned way it is possible to estimate various sample selection
and endogenous switching models. Furthermore one may specify several
continuous equations. Indeed, consider additional continuous equation
(y^{h}
- working hours):
y_{i}^{h*}=x_{i}^{h}\beta^{h}+\varepsilon_{i}^{h},
y_{i}^{h} =
\begin{cases}
y_{i}^{h*}\text{, if }z_{1i}=1\\
\text{unobservable, otherwise}
\end{cases}.
Then to estimate the system accounting for this additional continuous
equation simply substitute all y_{i}^{h*}
(such that z_{1i}=0
) in data
with Inf
and specify:
\text{groups} =
\begin{bmatrix}
1 & 2\\
1 & 1\\
1 & 0\\
0 & -1
\end{bmatrix}, \text{groups2} =
\begin{bmatrix}
1 & 0\\
0 & 0\\
-1 & 0\\
-1 & -1
\end{bmatrix},
where groups2[, 1]
describes regimes of the wage equation y_{i}
while groups[, 2]
contains regimes of the hours
equation y_{i}^{h}
. Note that formula of the first equation (wage)
should be specified in formula2[[1]]
and formula of the second
equation should be provided via formula2[[2]]
i.e. as the
first and the second elements in a formula2
list correspondingly.
By default all the models are estimated via maximum likelihood method.
However if estimator = "2step"
then models with one continuous
equation (but voluntary number of regimes of this equation)
will be estimated via two-step procedure proposed
by E. Kossova and B. Potanin (2018). The idea is similar to classical
Heckman's method i.e. to substitute conditional expectation of random error
into continuous equation with it's consistent estimator.
For simplicity suppose that there is only one regime.
Then regression equation may be represented in the following form:
y_{i}=
x_{i}\beta+
\sum\limits_{j=1}^{J} \rho_{j}\sigma\lambda_{ji}+
\varepsilon_{i}^{*},
where:
\varepsilon_{i}^{*} = \varepsilon_{i} - E(\varepsilon_{i}|z_{1i},...z_{Ji})=
\varepsilon_{i} - \sum\limits_{j=1}^{J} \rho_{j}\sigma\lambda_{ji},
\lambda_{ji}=
\lambda_{ji}^{(1)} + \lambda_{ji}^{(2)},
\lambda_{ji}^{(1)}=
\begin{cases}
0\text{, if }z_{ji} = 0\\
-\frac{\partial \ln P_{i}^{*}}{\partial a_{ji}}\text{, otherwise }
\end{cases},
\qquad
\lambda_{ji}^{(2)}=
\begin{cases}
0\text{, if }z_{ji} = m_{j}\\
-\frac{\partial \ln P_{i}^{*}}{\partial b_{ji}}\text{, otherwise }
\end{cases},
P_{i}^{*}(a_{1i},...,a_{Ji};b_{1i},...,b_{Ji})=
P(a_{1i}\leq u_{1i}\leq b_{1i},....,a_{Ji}\leq u_{Ji}\leq b_{Ji}),
a_{ji}=
\begin{cases}
-\infty\text{, if }z_{ji} = 0\\
\frac{c_{jz_{ji}}-w_{ji}\gamma_{j}}{\sigma_{ji}}\text{, otherwise }
\end{cases},
\qquad
b_{ji}=
\begin{cases}
\infty\text{, if }z_{ji} = m_{j}\\
\frac{c_{j(z_{ji}+1)}-w_{ji}\gamma_{j}}{\sigma_{ji}}\text{, otherwise }
\end{cases}.
On the first step \hat{\lambda}_{ji}
are calculated using estimates
of multivariate ordered probit model.
On the second step \hat{\lambda}_{ji}
are used as the regressors
instead of \lambda_{ji}
in a least squares estimation of
y_{i}
equation. If cov_type = "parametric"
then
asymptotic covariance matrix estimator proposed by E. Kossova and B. Potanin (2018)
is used. If cov_type = "nonparametric"
then robust covariance matrix
estimator of Newey (2009) is applied. To relax normality assumption
of \varepsilon_{i}
it is possible to use multivariate generalization of Newey (2009) method
described in E. Kossova and B. Potanin (2020).
The idea is to use polynomials of \lambda_{ji}
on the second step:
y_{i}=
x_{i}\beta+
\sum\limits_{j=1}^{J} \sum\limits_{t=1}^{d_{j}}\tau_{jt}\lambda_{ji}^{t}+
\varepsilon_{i}^{*},
where \tau_{jt}
are polynomial coefficients.
Polynomial order d_{j}
is determined by degrees[j]
value.
If there are more than one regime then degrees
should be a matrix
such that degrees[r, j]
is d_{j}
corresponding to the
r
-th regime. However if there are more than one regime and
degrees
is a vector it will be transformed into a matrix which rows
are the same as degrees
.
If estimator = "2step"
then it is possible to precalculate first
step model with mvoprobit
function
(setting formula2 = NULL
)
and pass it through the formula
argument. It allows to experiment
with various formula2
and degrees
specifications without
extra computational burden associated with the first step estimation.
Function pmnorm
is used internally for calculation
of multivariate normal probabilities, densities and their derivatives.
Marginal distribution of u_{i}
may be determined with
marginal
argument that is similar to the same argument
in pmnorm
. Note that joint distribution
of u
(random errors of ordered equations) and
\varepsilon
(random errors of continuous equations) will be
determined by Gaussian copula.
Optimization always starts with optim
. If
opt_type = "gena"
or opt_type = "pso"
then
gena
or pso
is used
to proceed optimization starting
from initial point provided by optim
. Manual arguments to
optim
should be provided in a form of a list through opt_args$optim
.
Similarly opt_args$gena
and opt_args$pso
provide manual
arguments to gena
and pso
.
For example to provide Nelder-Mead optimizer to
optim
and
restrict the number of genetic algorithm iterations to 10
make
opt_args = list(optim = list(method = "Nelder-Mead"),
gena = list(maxiter = 10))
.
For more information on multivariate sample selection and endogenous switching models see E. Kossova and B. Potanin (2018), E. Kossova, L. Kupriianova, and B. Potanin (2020) and E. Kossova and B. Potanin (2022).
Currently control
has no input arguments intended for the users.
This argument is used for some internal purposes of the package.
Value
This function returns an object of class 'mvoprobit' which is a list containing the following elements:
-
par
- vector of parameters' estimates. -
coef
- list which j-the elementcoef[[j]]
is a vector of regression coefficients estimates of the j-th ordered equation i.e.\hat{\gamma}_{j}
. -
coef_var
- list which j-the elementcoef_var[[j]]
is a vector of regression coefficients estimates of the variance part of the j-th ordered equation i.e.\hat{\gamma}_{j}^{*}
. -
coef2
- list which j-the elementcoef2[[j]]
is a matrix of regression coefficients estimates of the j-th continuous equation. Wherein i-th row of this matrix contains estimates of regression coefficients corresponding to the i-th regime of j-th continuous variable. -
sigma
- estimate of the covariance matrix of random errors of ordered equations i.e.\widehat{\Sigma}
. -
var2
- estimates of the variances of random errors of continuous equations. -
sigma2
- estimates of covariances between random errors of continuous equations. -
cov2
- list which j-th elementcov_y[[j]]
contains estimates of covariances between random errors of j-th continuous equation in different regimes. -
cuts
- list which j-the elementcuts[[j]]
is a vector of cuts estimates of the j-th equation i.e.\hat{c}_{j}
. -
ind
- list containing some indexes partition of the model (not intended for users). -
logLik
- log-likelihood value. -
regressors
- numeric matrix which j-th elementregressors[[j]]
is a regressors matrix of the j-th equation i.e.w_{j}
. -
regressors2
- list which j-th elementregressors2[[j]]
is a regressors matrix of the j-th variance equation i.e.w_{j}^{*}
. -
dependent
- numeric matrix which j-th columndependent[, j]
is a vector of dependent variablez_{j}
values. -
control_lnL
- some additional variables to be passed to likelihood function (not intended for users). -
formula
- the same asformula
input argument but all elements are coerced to formula type. -
lambda
- matrix such thatlambda[i, j]
corresponds to\hat{\lambda}_{ij}
.predict.mvoprobit
for more information. -
data_list
- list which j-th element data_list[[j]] is a dataframe containing regressors and dependent variable of the j-th equation. -
data
- the same asdata
input argument but without missing values. -
cov
- estimate of the covariance matrix of parameters' estimator. -
cov_type
- type of the asymptotic covariance matrix estimator. -
cov_2step
- estimate of the covariance matrix of parameters' estimator associated with the second step parameters i.e. whenestimator = "2step"
. -
sd
- standard errors of the estimates. -
p_value
- p-values of the tests on significance of the parameters where null hypothesis is that corresponding parameter equals zero. -
tbl
- special table used to create a summary (not intended for users). -
groups
- the same asgroups
input argument or automatically generated matrix representing the structure of the system of equations. Please, see 'Details' section above for more information. -
groups2
- the same asgroups2
input argument or automatically generated matrix representing the structure of the system of equations. Please, see 'Details' section above for more information. -
marginal
- the same asmarginal
input argument. -
degrees
- the same asdegrees
input argument. -
model1
- first step estimation results whenestimator = "2step"
. -
coef_lambda
- estimates of coefficients of lambdas. -
other
- list of additional variables not intended for the user.
It is highly recommended to get estimates via
coef.mvoprobit
function.
References
W. K. Newey (2009). Two-step series estimation of sample selection models. The Econometrics Journal, vol. 12(1), pages 217-229.
E. Kossova, B. Potanin (2018). Heckman method and switching regression model multivariate generalization. Applied Econometrics, vol. 50, pages 114-143.
E. Kossova, L. Kupriianova, B. Potanin (2020). Parametric and semiparametric multivariate sample selection models estimators' accuracy: Comparative analysis on simulated data. Applied Econometrics, vol. 57, pages 119-139.
E. Kossova, B. Potanin (2022). Estimation of Gaussian multinomial endogenous switching model. Applied Econometrics, vol. 67, pages 121-143.
Examples
# -------------------------------
# CPS data example
# -------------------------------
# Set seed for reproducibility
set.seed(123)
# Upload data
data(cps)
# Prepare ordered variable for education
cps$educ <- NA
cps$educ[cps$basic == 1] <- 0
cps$educ[cps$bachelor == 1] <- 1
cps$educ[cps$master == 1] <- 2
# Labor supply (probit) model
f_work <- work ~ age + I(age ^ 2) + bachelor + master + health +
slwage + nchild
model1 <- mvoprobit(f_work, data = cps)
summary(model1)
# Education choice (ordered probit) model
f_educ <- educ ~ age + I(age ^ 2) + sbachelor + smaster
model2 <- mvoprobit(f_educ, data = cps)
summary(model2)
# Labor supply with endogenous education
# treatment (recursive or hierarchical ordered probit) model
model3 <- mvoprobit(list(f_work, f_educ), data = cps)
summary(model3)
# Sample selection (on employment) Heckman's model
f_lwage <- lwage ~ age + I(age ^ 2) + bachelor + master + health
cps$lwage[cps$work == 0] <- Inf
model4 <- mvoprobit(f_work, f_lwage, data = cps)
summary(model4)
# Endogenous education treatment with non-random sample selection
model5 <- mvoprobit(list(f_work, f_educ), f_lwage, data = cps)
summary(model5)
# Endogenous switching model with non-random sample selection
groups <- cbind(c(1, 1, 1, 0, 0, 0),
c(0, 1, 2, 0, 1, 2))
groups2 <- matrix(c(0, 1, 2, -1, -1, -1), ncol = 1)
f_lwage2 <- lwage ~ age + I(age ^ 2) + health
model6 <- mvoprobit(list(f_work, f_educ), f_lwage2,
groups = groups, groups2 = groups2,
data = cps)
summary(model6)
# -------------------------------
# Simulated data example 1
# Ordered probit model
# -------------------------------
# ---
# Step 1
# Simulation of data
# ---
# Load required package
library("mnorm")
# Set seed for reproducibility
set.seed(123)
# The number of observations
n <- 1000
# Regressors (covariates)
w1 <- runif(n = n, min = -1, max = 1)
w2 <- runif(n = n, min = -1, max = 1)
# Random errors
u <- rnorm(n = n, mean = 0, sd = 1)
# Coefficients
gamma <- c(-1, 2)
# Linear index
li <- gamma[1] * w1 + gamma[2] * w2
# Latent variable
z_star <- li + u
# Cuts
cuts <- c(-1, 0.5, 2)
# Observable ordered outcome
z <- rep(0, n)
z[(z_star > cuts[1]) & (z_star <= cuts[2])] <- 1
z[(z_star > cuts[2]) & (z_star <= cuts[3])] <- 2
z[z_star > cuts[3]] <- 3
table(z)
# Data
data <- data.frame(w1 = w1, w2 = w2, z = z)
# ---
# Step 2
# Estimation of parameters
# ---
# Estimation
model <- mvoprobit(z ~ w1 + w2,
data = data)
summary(model)
# Compare estimates and true values of parameters
# regression coefficients
cbind(true = gamma, estimate = model$coef[[1]])
# cuts
cbind(true = cuts, estimate = model$cuts[[1]])
# ---
# Step 3
# Estimation of probabilities and marginal effects
# ---
# Predict probability of dependent variable
# equals to 2 for every observation in a sample.
# P(z = 2)
prob <- predict(model, group = 2, type = "prob")
head(prob)
# Calculate mean marginal effect of w2 on P(z = 1)
mean(predict(model, group = 1, type = "prob", me = "w2"))
# Calculate probabilities and marginal effects
# for manually provided observations.
# new data
newdata <- data.frame(z = c(1, 1),
w1 = c(0.5, 0.2),
w2 = c(-0.3, 0.8))
# probability P(z = 2)
predict(model, group = 2, type = "prob", newdata = newdata)
# linear index
predict(model, type = "li", newdata = newdata)
# marginal effect of w1 on P(z = 2)
predict(model, group = 2, type = "prob", newdata = newdata, me = "w1")
# marginal effect of w1 and w2 on P(z = 3)
predict(model, group = 3, type = "prob",
newdata = newdata, me = c("w1", "w2"))
# marginal effect of w2 on the linear index
predict(model, group = 2, type = "li", newdata = newdata, me = "w2")
# discrete marginal effect i.e. P(z = 2 | w1 = 0.5) - P(z = 2 | w1 = 0.2)
predict(model, group = 2, type = "prob", newdata = newdata,
me = "w2", eps = c(0.2, 0.5))
# adjusted conditional expectation for endogenous switching and
# sample selection models with continuous outcome with random error 'e'
# E(e | z == 2) / cov(e, u)
# where joint distribution of 'e' and 'u' determined by
# Gaussian copula and 'e' is normally distributed
predict(model, group = 2, type = "lambda", newdata = newdata)
# ---
# Step 4
# Ordered logit model
# ---
# Estimate ordered logit model with a unit variance
# that is just a matter of reparametrization i.e.
# do not affect signs and significance of coefficients
# and dot not affect at all marginal effects
logit <- mvoprobit(z ~ w1 + w2,
data = data,
marginal = "logistic")
summary(logit)
# Compare ordered probit and ordered logit models
# using Akaike and Bayesian information criteria
# AIC
c(probit = AIC(model), logit = AIC(logit))
# BIC
c(probit = BIC(model), logit = BIC(logit))
# Calculation of probabilities and marginal effects is identical
# to the previous example
# probability P(z = 1)
predict(logit, group = 1, type = "prob", newdata = newdata)
# marginal effect of w2 on P(z = 1)
predict(logit, group = 1, type = "prob", newdata = newdata, me = "w2")
# E(e | z == 1) / cov(e, u)
predict(logit, group = 1, type = "lambda", newdata = newdata)
# ---
# Step 5
# Semiparametric model with Gallant and Nychka distribution
# ---
pgn <- mvoprobit(z ~ w1 + w2,
data = data,
marginal = list("PGN" = 3))
summary(pgn)
# Calculation of probabilities and marginal effects is identical
# to the previous example
# probability P(z = 3)
predict(pgn, group = 3, type = "prob", newdata = newdata)
# marginal effect of w2 on P(z = 3)
predict(pgn, group = 3, type = "prob", newdata = newdata, me = "w2")
# E(e | z == 3) / cov(e, u)
predict(pgn, group = 3, type = "lambda", newdata = newdata)
# Test normality assumption via likelihood ratio test
lrtest(model, pgn)
# -------------------------------
# Simulated data example 2
# Heteroscedastic ordered
# probit model
# -------------------------------
# Load required package
library("mnorm")
# ---
# Step 1
# Simulation of data
# ---
# Set seed for reproducibility
set.seed(123)
# The number of observations
n <- 1000
# Regressors (covariates)
w1 <- runif(n = n, min = -1, max = 1)
w2 <- runif(n = n, min = -1, max = 1)
w3 <- runif(n = n, min = -1, max = 1)
# Random errors
u <- rnorm(n, mean = 0, sd = 1)
# Coefficients of mean equation
gamma <- c(-1, 2)
# Coefficients of variance equation
gamma_het <- c(0.5, -1)
# Linear index of mean equation
li <- gamma[1] * w1 + gamma[2] * w2
# Linear index of variance equation
li_het <- gamma_het[1] * w2 + gamma_het[2] * w3
# Heteroscedastic stdandard deviation
# i.e. value of variance equation
sd_het <- exp(li_het)
# Latent variable
z_star <- li + u * sd_het
# Cuts
cuts <- c(-1, 0.5, 2)
# Observable ordered outcome
z <- rep(0, n)
z[(z_star > cuts[1]) & (z_star <= cuts[2])] <- 1
z[(z_star > cuts[2]) & (z_star <= cuts[3])] <- 2
z[z_star > cuts[3]] <- 3
table(z)
# Data
data <- data.frame(w1 = w1, w2 = w2, w3 = w3, z = z)
# ---
# Step 2
# Estimation of parameters
# ---
# Estimation
model <- mvoprobit(z ~ w1 + w2 | w2 + w3,
data = data)
summary(model)
# Compare estimates and true values of parameters
# regression coefficients of mean equation
cbind(true = gamma, estimate = model$coef[[1]])
# regression coefficients of variance equation
cbind(true = gamma_het, estimate = model$coef_var[[1]])
# cuts
cbind(true = cuts, estimate = model$cuts[[1]])
# Test for homoscedasticity
model0 <- mvoprobit(z ~ w1 + w2, data = data)
lrtest(model, model0)
# ---
# Step 3
# Estimation of probabilities and marginal effects
# ---
# Predict probability of dependent variable
# equals to 2 for every observation in a sample.
# P(z = 2)
prob <- predict(model, group = 2, type = "prob")
head(prob)
# Calculate mean marginal effect of w2 on P(z = 1)
mean(predict(model, group = 1, type = "prob", me = "w2"))
# Calculate corresponding probability, linear
# index and heteroscedastic standard deviations for
# manually provided observations.
# new data
newdata <- data.frame(z = c(1, 1),
w1 = c(0.5, 0.2),
w2 = c(-0.3, 0.8),
w3 = c(0.6, 0.1))
# probability P(z = 2)
predict(model, group = 2, type = "prob", newdata = newdata)
# linear index
predict(model, type = "li", newdata = newdata)
# standard deviation
predict(model, type = "sd", newdata = newdata)
# marginal effect of w3 on P(Z = 3)
predict(model, group = 3, type = "prob", newdata = newdata, me = "w3")
# marginal effect of w2 on the standard error
predict(model, group = 2, type = "sd", newdata = newdata, me = "w2")
# discrete marginal effect i.e. P(Z = 2 | w1 = 0.5) - P(Z = 2 | w1 = 0.2)
predict(model, group = 2, type = "prob", newdata = newdata,
me = "w2", eps = c(0.2, 0.5))
# -------------------------------
# Simulated data example 3
# Bivariate ordered probit model
# with heteroscedastic second
# equation
# -------------------------------
# Load required package
library("mnorm")
# ---
# Step 1
# Simulation of data
# ---
# Set seed for reproducibility
set.seed(123)
# The number of observations
n <- 1000
# Regressors (covariates)
w1 <- runif(n = n, min = -1, max = 1)
w2 <- runif(n = n, min = -1, max = 1)
w3 <- runif(n = n, min = -1, max = 1)
w4 <- runif(n = n, min = -1, max = 1)
# Covariance matrix of random errors
rho <- 0.5
sigma <- matrix(c(1, rho,
rho, 1),
nrow = 2)
# Random errors
u <- mnorm::rmnorm(n = n, mean = c(0, 0), sigma = sigma)
# Coefficients
gamma1 <- c(-1, 2)
gamma2 <- c(1, 1.5)
# Coefficients of variance equation
gamma2_het <- c(0.5, -1)
# Linear indexes
li1 <- gamma1[1] * w1 + gamma1[2] * w2
li2 <- gamma2[1] * w2 + gamma2[2] * w3
# Linear index of variance equation
li2_het <- gamma2_het[1] * w2 + gamma2_het[2] * w4
# Heteroscedastic stdandard deviation
# i.e. value of variance equation
sd2_het <- exp(li2_het)
# Latent variables
z1_star <- li1 + u[, 1]
z2_star <- li2 + u[, 2] * sd2_het
# Cuts
cuts1 <- c(-1, 0.5, 2)
cuts2 <- c(-2, 0)
# Observable ordered outcome
# first outcome
z1 <- rep(0, n)
z1[(z1_star > cuts1[1]) & (z1_star <= cuts1[2])] <- 1
z1[(z1_star > cuts1[2]) & (z1_star <= cuts1[3])] <- 2
z1[z1_star > cuts1[3]] <- 3
# second outcome
z2 <- rep(0, n)
z2[(z2_star > cuts2[1]) & (z2_star <= cuts2[2])] <- 1
z2[z2_star > cuts2[2]] <- 2
# distribution
table(z1, z2)
# Data
data <- data.frame(w1 = w1, w2 = w2,
w3 = w3, w4 = w4,
z1 = z1, z2 = z2)
# ---
# Step 2
# Estimation of parameters
# ---
# Estimation
model <- mvoprobit(list(z1 ~ w1 + w2,
z2 ~ w2 + w3 | w2 + w4),
data = data)
summary(model)
# Compare estimates and true values of parameters
# regression coefficients of the first equation
cbind(true = gamma1, estimate = model$coef[[1]])
# regression coefficients of the second equation
cbind(true = gamma2, estimate = model$coef[[2]])
# cuts of the first equation
cbind(true = cuts1, estimate = model$cuts[[1]])
# cuts of the second equation
cbind(true = cuts2, estimate = model$cuts[[2]])
# correlation coefficients
cbind(true = rho, estimate = model$sigma[1, 2])
# regression coefficients of variance equation
cbind(true = gamma2_het, estimate = model$coef_var[[2]])
# ---
# Step 3
# Estimation of probabilities and linear indexes
# ---
# Predict probability P(z1 = 2, z2 = 0)
prob <- predict(model, group = c(2, 0), type = "prob")
head(prob)
# Calculate mean marginal effect of w2 on:
# P(z1 = 1)
mean(predict(model, group = c(1, -1), type = "prob", me = "w2"))
# P(z1 = 1, z2 = 0)
mean(predict(model, group = c(1, 0), type = "prob", me = "w2"))
# Calculate corresponding probability and linear
# index for manually provided observations.
# new data
newdata <- data.frame(z1 = c(1, 1),
z2 = c(1, 1),
w1 = c(0.5, 0.2),
w2 = c(-0.3, 0.8),
w3 = c(0.6, 0.1),
w4 = c(0.3, -0.5))
# probability P(z1 = 2, z2 = 0)
predict(model, group = c(2, 0), type = "prob", newdata = newdata)
# linear index
predict(model, type = "li", newdata = newdata)
# marginal probability P(z2 = 1)
predict(model, group = c(-1, 1), type = "prob", newdata = newdata)
# marginal probability P(z1 = 3)
predict(model, group = c(3, -1), type = "prob", newdata = newdata)
# conditional probability P(z1 = 2 | z2 = 0)
predict(model, group = c(2, 0), given_ind = 2,
type = "prob", newdata = newdata)
# conditional probability P(z2 = 1 | z1 = 3)
predict(model, group = c(3, 1), given_ind = 1,
type = "prob", newdata = newdata)
# marginal effect of w4 on P(Z2 = 2)
predict(model, group = c(-1, 2),
type = "prob", newdata = newdata, me = "w4")
# marginal effect of w4 on P(z1 = 3, Z2 = 2)
predict(model, group = c(3, 2),
type = "prob", newdata = newdata, me = "w4")
# marginal effect of w4 on P(z1 = 3 | z2 = 2)
predict(model, group = c(3, 2), given_ind = 2,
type = "prob", newdata = newdata, me = "w4")
# ---
# Step 4
# Replication under non-random sample selection
# ---
# Suppose that z2 is unobservable when z1 = 1 or z1 = 3
z2[(z1 == 1) | (z1 == 3)] <- -1
data$z2 <- z2
# Replicate estimation procedure
model <- mvoprobit(list(z1 ~ w1 + w2,
z2 ~ w2 + w3 | w2 + w4),
cov_type = "GOP",
data = data)
summary(model)
# Compare estimates and true values of parameters
# regression coefficients of the first equation
cbind(true = gamma1, estimate = model$coef[[1]])
# regression coefficients of the second equation
cbind(true = gamma2, estimate = model$coef[[2]])
# cuts of the first equation
cbind(true = cuts1, estimate = model$cuts[[1]])
# cuts of the second equation
cbind(true = cuts2, estimate = model$cuts[[2]])
# correlation coefficients
cbind(true = rho, estimate = model$sigma[1, 2])
# regression coefficients of variance equation
cbind(true = gamma2_het, estimate = model$coef_var[[2]])
# ---
# Step 5
# Semiparametric model with marginal logistic and PGN distributions
# ---
# Estimate the model
model <- mvoprobit(list(z1 ~ w1 + w2,
z2 ~ w2 + w3 | w2 + w4),
data = data,
marginal = list(PGN = 3, logistic = NULL))
summary(model)
# -------------------------------
# Simulated data example 4
# Heckman model with
# ordered selection mechanism
# -------------------------------
# Load required package
library("mnorm")
# ---
# Step 1
# Simulation of data
# ---
# Set seed for reproducibility
set.seed(123)
# The number of observations
n <- 1000
# Regressors (covariates)
w1 <- runif(n = n, min = -1, max = 1)
w2 <- runif(n = n, min = -1, max = 1)
w3 <- runif(n = n, min = -1, max = 1)
# Random errors
rho <- 0.5
var.y <- 0.3
sd.y <- sqrt(var.y)
sigma <- matrix(c(1, rho * sd.y,
rho * sd.y, var.y),
nrow = 2)
errors <- mnorm::rmnorm(n = n, mean = c(0, 0), sigma = sigma)
u <- errors[, 1]
eps <- errors[, 2]
# Coefficients
gamma <- c(-1, 2)
beta <- c(1, -1, 1)
# Linear index
li <- gamma[1] * w1 + gamma[2] * w2
li.y <- beta[1] + beta[2] * w1 + beta[3] * w3
# Latent variable
z_star <- li + u
y_star <- li.y + eps
# Cuts
cuts <- c(-1, 0.5, 2)
# Observable ordered outcome
z <- rep(0, n)
z[(z_star > cuts[1]) & (z_star <= cuts[2])] <- 1
z[(z_star > cuts[2]) & (z_star <= cuts[3])] <- 2
z[z_star > cuts[3]] <- 3
table(z)
# Observable continuous outcome such
# that outcome 'y' is observable only
# when 'z > 1' and unobservable otherwise
# i.e. when 'z <= 1' we code 'y' as 'Inf'
y <- y_star
y[z <= 1] <- Inf
# Data
data <- data.frame(w1 = w1, w2 = w2, w3 = w3,
z = z, y = y)
# ---
# Step 2
# Estimation of parameters
# ---
# Estimation
model <- mvoprobit(z ~ w1 + w2,
y ~ w1 + w3,
data = data)
summary(model)
# Compare estimates and true values of parameters
# regression coefficients of ordered equation
cbind(true = gamma, estimate = model$coef[[1]])
# cuts
cbind(true = cuts, estimate = model$cuts[[1]])
# regression coefficients of continuous equation
cbind(true = beta, estimate = as.numeric(model$coef2[[1]]))
# variance
cbind(true = var.y, estimate = as.numeric(model$var2[[1]]))
# covariance
cbind(true = rho * sd.y, estimate = as.numeric(model$cov2[[1]]))
# ---
# Step 3
# Estimation of expectations and marginal effects
# ---
# New data
newdata <- data.frame(z = 1,
y = 1,
w1 = 0.1,
w2 = 0.2,
w3 = 0.3)
# Predict unconditional expectation of the dependent variable
predict(model, group2 = 0, newdata = newdata)
# Predict conditional expectations of the dependent variable
# E(y | z == 2)
predict(model, group = 2, group2 = 0, newdata = newdata)
# E(y | z == 0)
predict(model, group = 0, group2 = 0, newdata = newdata)
# ---
# Step 4
# Classical Heckman's two-step estimation procedure
# ---
# Predict adjusted conditional expectations
lambda2 <- predict(model, group = 2, type = "lambda")
lambda3 <- predict(model, group = 3, type = "lambda")
# Construct variable responsible for adjusted
# conditional expectation in linear regression equation
data$lambda <- NA
data$lambda[model$data$z == 2] <- lambda2[model$data$z == 2]
data$lambda[model$data$z == 3] <- lambda3[model$data$z == 3]
# Alternatively simply get this variable from the estimation output
# of a selection part of the model
model_probit <- mvoprobit(z ~ w1 + w2, data = data)
data$lambda <- model_probit$lambda
# Estimate model via classical least squares
model_lm <- lm(y ~ w1 + w3, data = data[!is.infinite(data$y), ])
summary(model_lm)
# Estimate model via two-step procedure
model_ts <- lm(y ~ w1 + w3 + lambda, data = data[!is.infinite(data$y), ])
summary(model_ts)
# Automatic estimation of two-step model with robust standard errors
model_ts <- mvoprobit(z ~ w1 + w2,
y ~ w1 + w3,
data = data,
estimator = "2step")
summary(model_ts)
# Check estimates accuracy
tbl <- cbind(true = beta,
ls = coef(model_lm),
ml = model$coef2[[1]][1, ],
twostep = model_ts$coef2[[1]][1, ])
print(tbl)
# ---
# Step 5
# Semiparametric estimation procedure
# ---
# Estimate the model using Lee's method
# assuming logistic distribution of
# random errors of the selection equation
model_Lee <- mvoprobit(z ~ w1 + w2,
y ~ w1 + w3,
data = data,
marginal = list(logistic = NULL),
estimator = "2step")
summary(model_Lee)
# One step estimation is also available as well
# as more complex marginal distributions.
# Consider random errors in selection equation
# following PGN distribution with three parameters.
model_sp <- mvoprobit(z ~ w1 + w2,
y ~ w1 + w3,
data = data,
marginal = list(PGN = 3))
summary(model_sp)
# To additionally relax normality assumption of
# random error of continuous equation it is possible
# to use Newey's two-step procedure.
model_Newey <- mvoprobit(z ~ w1 + w2,
y ~ w1 + w3,
data = data,
marginal = list(PGN = 3),
estimator = "2step",
degrees = 2,
cov_type = "nonparametric")
summary(model_Newey)
# -------------------------------
# Simulated data example 5
# Endogenous switching model
# with heteroscedastic
# ordered selection mechanism
# -------------------------------
# Load required package
library("mnorm")
# ---
# Step 1
# Simulation of data
# ---
# Set seed for reproducibility
set.seed(123)
# The number of observations
n <- 1000
# Regressors (covariates)
w1 <- runif(n = n, min = -1, max = 1)
w2 <- runif(n = n, min = -1, max = 1)
w3 <- runif(n = n, min = -1, max = 1)
# Random errors
rho_0 <- -0.8
rho_1 <- -0.7
var2_0 <- 0.9
var2_1 <- 1
sd_y_0 <- sqrt(var2_0)
sd_y_1 <- sqrt(var2_1)
cor_y_01 <- 0.7
cov2_01 <- sd_y_0 * sd_y_1 * cor_y_01
cov2_z_0 <- rho_0 * sd_y_0
cov2_z_1 <- rho_1 * sd_y_1
sigma <- matrix(c(1, cov2_z_0, cov2_z_1,
cov2_z_0, var2_0, cov2_01,
cov2_z_1, cov2_01, var2_1),
nrow = 3)
errors <- mnorm::rmnorm(n = n, mean = c(0, 0, 0), sigma = sigma)
u <- errors[, 1]
eps_0 <- errors[, 2]
eps_1 <- errors[, 3]
# Coefficients
gamma <- c(-1, 2)
gamma_het <- c(0.5, -1)
beta_0 <- c(1, -1, 1)
beta_1 <- c(2, -1.5, 0.5)
# Linear index of ordered equation
# mean
li <- gamma[1] * w1 + gamma[2] * w2
# variance
li_het <- gamma_het[1] * w2 + gamma_het[2] * w3
# Linear index of continuous equation
# regime 0
li_y_0 <- beta_0[1] + beta_0[2] * w1 + beta_0[3] * w3
# regime 1
li_y_1 <- beta_1[1] + beta_1[2] * w1 + beta_1[3] * w3
# Latent variable
z_star <- li + u * exp(li_het)
y_0_star <- li_y_0 + eps_0
y_1_star <- li_y_1 + eps_1
# Cuts
cuts <- c(-1, 0.5, 2)
# Observable ordered outcome
z <- rep(0, n)
z[(z_star > cuts[1]) & (z_star <= cuts[2])] <- 1
z[(z_star > cuts[2]) & (z_star <= cuts[3])] <- 2
z[z_star > cuts[3]] <- 3
table(z)
# Observable continuous outcome such
# that outcome 'y' is
# in regime 1 when 'z == 1',
# in regime 0 when 'z <= 1',
# unobservable when 'z == 0'
y <- rep(NA, n)
y[z == 0] <- Inf
y[z == 1] <- y_0_star[z == 1]
y[z > 1] <- y_1_star[z > 1]
# Data
data <- data.frame(w1 = w1, w2 = w2, w3 = w3,
z = z, y = y)
# ---
# Step 2
# Estimation of parameters
# ---
# Assign groups
groups <- matrix(0:3, ncol = 1)
groups2 <- matrix(c(-1, 0, 1, 1), ncol = 1)
# Estimation
model <- mvoprobit(list(z ~ w1 + w2 | w2 + w3),
list(y ~ w1 + w3),
groups = groups, groups2 = groups2,
data = data)
summary(model)
# Compare estimates and true values of parameters
# regression coefficients of ordered equation
cbind(true = gamma, estimate = model$coef[[1]])
cbind(true = gamma_het, estimate = model$coef_var[[1]])
# cuts
cbind(true = cuts, estimate = model$cuts[[1]])
# regression coefficients of continuous equation
cbind(true = beta_0, estimate = model$coef2[[1]][1, ])
cbind(true = beta_1, estimate = model$coef2[[1]][2, ])
# variances
cbind(true = c(var2_0, var2_1), estimate = model$var2[[1]])
# covariances
cbind(true = c(cov2_z_0, cov2_z_1), estimate = model$cov2[[1]])
# ---
# Step 3
# Estimation of expectations and marginal effects
# ---
# New data
newdata <- data.frame(z = 1,
y = 1,
w1 = 0.1,
w2 = 0.2,
w3 = 0.3)
# Predict unconditional expectation of the dependent variable
# regime 0
predict(model, group2 = 0, newdata = newdata)
# regime 1
predict(model, group2 = 1, newdata = newdata)
# Predict conditional expectations of the dependent variable
# given condition 'z == 0' for regime 1 i.e. E(y1 | z = 0)
predict(model, group = 0, group2 = 1, newdata = newdata)
# -------------------------------
# Simulated data example 6
# Endogenous switching model with
# multivariate heteroscedastic
# ordered selection mechanism
# -------------------------------
# Load required package
library("mnorm")
# ---
# Step 1
# Simulation of data
# ---
# Set seed for reproducibility
set.seed(123)
# The number of observations
n <- 1000
# Regressors (covariates)
w1 <- runif(n = n, min = -1, max = 1)
w2 <- runif(n = n, min = -1, max = 1)
w3 <- runif(n = n, min = -1, max = 1)
w4 <- runif(n = n, min = -1, max = 1)
# Random errors
rho_z1_z2 <- 0.5
rho_y0_z1 <- 0.6
rho_y0_z2 <- 0.7
rho_y1_z1 <- 0.65
rho_y1_z2 <- 0.75
var20 <- 0.9
var21 <- 1
sd_y0 <- sqrt(var20)
sd_y1 <- sqrt(var21)
cor_y01 <- 0.7
cov201 <- sd_y0 * sd_y1 * cor_y01
cov20_z1 <- rho_y0_z1 * sd_y0
cov21_z1 <- rho_y1_z1 * sd_y1
cov20_z2 <- rho_y0_z2 * sd_y0
cov21_z2 <- rho_y1_z2 * sd_y1
sigma <- matrix(c(1, rho_z1_z2, cov20_z1, cov21_z1,
rho_z1_z2, 1, cov20_z2, cov21_z2,
cov20_z1, cov20_z2, var20, cov201,
cov21_z1, cov21_z2, cov201, var21),
nrow = 4)
errors <- mnorm::rmnorm(n = n, mean = c(0, 0, 0, 0), sigma = sigma)
u1 <- errors[, 1]
u2 <- errors[, 2]
eps0 <- errors[, 3]
eps1 <- errors[, 4]
# Coefficients
gamma1 <- c(-1, 2)
gamma1_het <- c(0.5, -1)
gamma2 <- c(1, 1)
beta0 <- c(1, -1, 1, -1.2)
beta1 <- c(2, -1.5, 0.5, 1.2)
# Linear index of ordered equation
# mean
li1 <- gamma1[1] * w1 + gamma1[2] * w2
li2 <- gamma2[1] * w1 + gamma2[2] * w3
# variance
li1_het <- gamma1_het[1] * w2 + gamma1_het[2] * w3
# Linear index of continuous equation
# regime 0
li_y0 <- beta0[1] + beta0[2] * w1 + beta0[3] * w3 + beta0[4] * w4
# regime 1
li_y1 <- beta1[1] + beta1[2] * w1 + beta1[3] * w3 + beta1[4] * w4
# Latent variables
z1_star <- li1 + u1 * exp(li1_het)
z2_star <- li2 + u2
y0_star <- li_y0 + eps0
y1_star <- li_y1 + eps1
# Cuts
cuts1 <- c(-1, 1)
cuts2 <- c(0)
# Observable ordered outcome
# first
z1 <- rep(0, n)
z1[(z1_star > cuts1[1]) & (z1_star <= cuts1[2])] <- 1
z1[z1_star > cuts1[2]] <- 2
# second
z2 <- rep(0, n)
z2[z2_star > cuts2[1]] <- 1
table(z1, z2)
# Observable continuous outcome such
# that outcome 'y' is
# in regime 0 when 'z1 == 1',
# in regime 1 when 'z1 == 0' or 'z1 == 2',
# unobservable when 'z2 == 0'
y <- rep(NA, n)
y[z1 == 1] <- y0_star[z1 == 1]
y[z1 != 1] <- y1_star[z1 != 1]
y[z2 == 0] <- Inf
# Data
data <- data.frame(w1 = w1, w2 = w2, w3 = w3, w4 = w4,
z1 = z1, z2 = z2, y = y)
# ---
# Step 2
# Estimation of parameters
# ---
# Assign groups
groups <- matrix(c(0, 0,
0, 1,
1, 0,
1, 1,
2, 0,
2, 1),
byrow = TRUE, ncol = 2)
groups2 <- matrix(c(-1, 1, -1, 0, -1, 1), ncol = 1)
# Estimation
model <- mvoprobit(list(z1 ~ w1 + w2 | w2 + w3,
z2 ~ w1 + w3),
list(y ~ w1 + w3 + w4),
groups = groups, groups2 = groups2,
data = data)
summary(model)
# Compare estimates and true values of parameters
# regression coefficients of the first ordered equation
cbind(true = gamma1, estimate = model$coef[[1]])
cbind(true = gamma1_het, estimate = model$coef_var[[1]])
# regression coefficients of the second ordered equation
cbind(true = gamma2, estimate = model$coef[[2]])
# cuts
cbind(true = cuts1, estimate = model$cuts[[1]])
cbind(true = cuts2, estimate = model$cuts[[2]])
# regression coefficients of continuous equation
cbind(true = beta0, estimate = model$coef2[[1]][1, ])
cbind(true = beta1, estimate = model$coef2[[1]][2, ])
# variances
cbind(true = c(var20, var21), estimate = model$var2[[1]])
# covariances
cbind(true = c(cov20_z1, cov20_z2), estimate = model$cov2[[1]][1, ])
cbind(true = c(cov21_z1, cov21_z2), estimate = model$cov2[[1]][2, ])
# ---
# Step 3
# Estimation of expectations and marginal effects
# ---
# New data
newdata <- data.frame(z1 = 1,
z2 = 1,
y = 1,
w1 = 0.1,
w2 = 0.2,
w3 = 0.3,
w4 = 0.4)
# Predict unconditional expectation of the dependent variable
# regime 0
predict(model, group2 = 0, newdata = newdata)
# regime 1
predict(model, group2 = 1, newdata = newdata)
# Predict conditional expectations of the dependent variable
# E(y1 | z1 = 2, z2 = 1)
predict(model, group = c(2, 1), group2 = 1, newdata = newdata)
# Marginal effect of w3 on E(y1 | z1 = 2, z2 = 1)
predict(model, group = c(2, 1), group2 = 1, newdata = newdata, me = "w3")
# ---
# Step 4
# Two-step estimation procedure
# ---
# For comparison reasons let's estimate the model
# via least squres
# Estimate model via classical least squares for a benchmark
model.ls.0 <- lm(y ~ w1 + w3 + w4,
data = data[!is.infinite(data$y) & (data$z1 == 1), ])
model.ls.1 <- lm(y ~ w1 + w3 + w4,
data = data[!is.infinite(data$y) & (data$z1 != 1), ])
# Apply two-step procedure
model_ts <- mvoprobit(list(z1 ~ w1 + w2 | w2 + w3,
z2 ~ w1 + w3),
y ~ w1 + w3 + w4,
groups = groups, groups2 = groups2,
estimator = "2step",
data = data)
summary(model_ts)
# Use two-step procedure with logistic marginal distributions
# that is multivariate generalization of Lee's method
model_Lee <- mvoprobit(list(z1 ~ w1 + w2 | w2 + w3,
z2 ~ w1 + w3),
y ~ w1 + w3 + w4,
marginal = list(logistic = NULL, logistic = NULL),
groups = groups, groups2 = groups2,
estimator = "2step",
data = data)
# Apply multivariate generalization of Newey's method
model_Newey <- mvoprobit(list(z1 ~ w1 + w2 | w2 + w3,
z2 ~ w1 + w3),
y ~ w1 + w3 + w4,
marginal = list(logistic = NULL, logistic = NULL),
degrees = c(2, 3),
groups = groups, groups2 = groups2,
estimator = "2step",
data = data)
# Compare accuracy of the methods
# beta0
tbl <- cbind(true = beta0,
ls = coef(model.ls.0),
ml = model$coef2[[1]][1, ],
twostep = model_ts$coef2[[1]][1, ],
Lee = model_Lee$coef2[[1]][1, ],
Newey = model_Newey$coef2[[1]][1, ])
print(tbl)
# beta1
tbl <- cbind(true = beta1,
ls = coef(model.ls.1),
ml = model$coef2[[1]][2, ],
twostep = model_ts$coef2[[1]][2, ],
Lee = model_Lee$coef2[[1]][2, ],
Newey = model_Newey$coef2[[1]][2, ])
print(tbl)
# -------------------------------
# Simulated data example 7
# Endogenous multivariate
# switching model with
# multivariate heteroscedastic
# ordered selection mechanism
# -------------------------------
# Load required package
library("mnorm")
# ---
# Step 1
# Simulation of data
# ---
# Set seed for reproducibility
set.seed(123)
# The number of observations
n <- 1000
# Regressors (covariates)
w1 <- runif(n = n, min = -1, max = 1)
w2 <- runif(n = n, min = -1, max = 1)
w3 <- runif(n = n, min = -1, max = 1)
w4 <- runif(n = n, min = -1, max = 1)
w5 <- runif(n = n, min = -1, max = 1)
# Random errors
var20 <- 0.9
var21 <- 1
var_g0 <- 1.1
var_g1 <- 1.2
var_g2 <- 1.3
A <- rWishart(1, 7, diag(7))[, , 1]
B <- diag(sqrt(c(1, 1, var20, var21,
var_g0, var_g1, var_g2)))
sigma <- B %*% cov2cor(A) %*% B
errors <- mnorm::rmnorm(n = n, mean = rep(0, nrow(sigma)), sigma = sigma)
u1 <- errors[, 1]
u2 <- errors[, 2]
eps0_y <- errors[, 3]
eps1_y <- errors[, 4]
eps0_g <- errors[, 5]
eps1_g <- errors[, 6]
eps2_g <- errors[, 7]
# Coefficients
gamma1 <- c(-1, 2)
gamma1_het <- c(0.5, -1)
gamma2 <- c(1, 1)
beta0_y <- c(1, -1, 1, -1.2)
beta1_y <- c(2, -1.5, 0.5, 1.2)
beta0_g <- c(-1, 1, 1, 1)
beta1_g <- c(1, -1, 1, 1)
beta2_g <- c(1, 1, -1, 1)
# Linear index of ordered equation
# mean
li1 <- gamma1[1] * w1 + gamma1[2] * w2
li2 <- gamma2[1] * w1 + gamma2[2] * w3
# variance
li1_het <- gamma1_het[1] * w2 + gamma1_het[2] * w3
# Linear index of the first continuous equation
# regime 0
li_y0 <- beta0_y[1] + beta0_y[2] * w1 + beta0_y[3] * w3 + beta0_y[4] * w4
# regime 1
li_y1 <- beta1_y[1] + beta1_y[2] * w1 + beta1_y[3] * w3 + beta1_y[4] * w4
# Linear index of the second continuous equation
# regime 0
li_g0 <- beta0_g[1] + beta0_g[2] * w2 + beta0_g[3] * w3 + beta0_g[4] * w5
# regime 1
li_g1 <- beta1_g[1] + beta1_g[2] * w2 + beta1_g[3] * w3 + beta1_g[4] * w5
# regime 2
li_g2 <- beta2_g[1] + beta2_g[2] * w2 + beta2_g[3] * w3 + beta2_g[4] * w5
# Latent variables
z1_star <- li1 + u1 * exp(li1_het)
z2_star <- li2 + u2
y0_star <- li_y0 + eps0_y
y1_star <- li_y1 + eps1_y
g0_star <- li_g0 + eps0_g
g1_star <- li_g1 + eps1_g
g2_star <- li_g2 + eps2_g
# Cuts
cuts1 <- c(-1, 1)
cuts2 <- c(0)
# Observable ordered outcome
# first
z1 <- rep(0, n)
z1[(z1_star > cuts1[1]) & (z1_star <= cuts1[2])] <- 1
z1[z1_star > cuts1[2]] <- 2
# second
z2 <- rep(0, n)
z2[z2_star > cuts2[1]] <- 1
table(z1, z2)
# Observable continuous outcome such
# that outcome 'y' is
# in regime 0 when 'z1 == 1',
# in regime 1 when 'z1 == 0' or 'z1 == 2',
# unobservable when 'z2 == 0'
y <- rep(NA, n)
y[z1 == 1] <- y0_star[z1 == 1]
y[z1 != 1] <- y1_star[z1 != 1]
y[z2 == 0] <- Inf
#' # Observable continuous outcome such
# that outcome 'g' is
# in regime 0 when 'z1 == z2',
# in regime 1 when 'z1 > z2',
# in regime 2 when 'z1 < z2',
g <- rep(NA, n)
g[z1 == z2] <- g0_star[z1 == z2]
g[z1 > z2] <- g1_star[z1 > z2]
g[z1 < z2] <- g2_star[z1 < z2]
# Data
data <- data.frame(w1 = w1, w2 = w2, w3 = w3, w4 = w4, w5 = w5,
z1 = z1, z2 = z2, y = y, g = g)
# ---
# Step 2
# Estimation of parameters
# ---
# Assign groups
groups <- matrix(c(0, 0,
0, 1,
1, 0,
1, 1,
2, 0,
2, 1),
byrow = TRUE, ncol = 2)
groups2 <- matrix(NA, nrow = nrow(groups), ncol = 2)
groups2[groups[, 1] == 1, 1] <- 0
groups2[(groups[, 1] == 0) | (groups[, 1] == 2), 1] <- 1
groups2[groups[, 2] == 0, 1] <- -1
groups2[groups[, 1] == groups[, 2], 2] <- 0
groups2[groups[, 1] > groups[, 2], 2] <- 1
groups2[groups[, 1] < groups[, 2], 2] <- 2
cbind(groups, groups2)
# Estimation
model <- mvoprobit(list(z1 ~ w1 + w2 | w2 + w3,
z2 ~ w1 + w3),
list(y ~ w1 + w3 + w4,
g ~ w2 + w3 + w5),
groups = groups, groups2 = groups2,
data = data)
summary(model)
# Compare estimates and true values of parameters
# regression coefficients of the first ordered equation
cbind(true = gamma1, estimate = model$coef[[1]])
cbind(true = gamma1_het, estimate = model$coef_var[[1]])
# regression coefficients of the second ordered equation
cbind(true = gamma2, estimate = model$coef[[2]])
# cuts
cbind(true = cuts1, estimate = model$cuts[[1]])
cbind(true = cuts2, estimate = model$cuts[[2]])
# regression coefficients of the first continuous equation
cbind(true = beta0_y, estimate = model$coef2[[1]][1, ])
cbind(true = beta1_y, estimate = model$coef2[[1]][2, ])
# regression coefficients of the second continuous equation
cbind(true = beta0_g, estimate = model$coef2[[2]][1, ])
cbind(true = beta1_g, estimate = model$coef2[[2]][2, ])
cbind(true = beta2_g, estimate = model$coef2[[2]][3, ])
# variances
cbind(true = c(var20, var21), estimate = model$var2[[1]])
cbind(true = c(var_g0, var_g1, var_g2), estimate = model$var2[[2]])
# correlation between ordered equations
cbind(true = c(sigma[1, 2]), estimate = model$sigma[1, 2])
# covariances between continious and ordered equations
cbind(true = sigma[1:2, 3], estimate = model$cov2[[1]][1, ])
cbind(true = sigma[1:2, 4], estimate = model$cov2[[1]][2, ])
cbind(true = sigma[1:2, 5], estimate = model$cov2[[2]][1, ])
cbind(true = sigma[1:2, 6], estimate = model$cov2[[2]][2, ])
cbind(true = sigma[1:2, 7], estimate = model$cov2[[2]][3, ])
# covariances between continuous equations
cbind(true = c(sigma[4, 7], sigma[3, 5], sigma[4, 6]),
estimate = model$sigma2[[1]])
# ---
# Step 3
# Estimation of expectations and marginal effects
# ---
# New data
newdata <- data.frame(z1 = 1,
z2 = 1,
y = 1,
g = 1,
w1 = 0.1,
w2 = 0.2,
w3 = 0.3,
w4 = 0.4,
w5 = 0.5)
# Predict unconditional expectation of the dependent variable
# regime 0 for 'y' and regime 1 for 'g' i.e. E(y0), E(g1)
predict(model, group2 = c(0, 1), newdata = newdata)
# Predict conditional expectations of the dependent variable
# E(y0 | z1 = 2, z2 = 1), E(g1 | z1 = 2, z2 = 1)
predict(model, group = c(2, 1), group2 = c(0, 1), newdata = newdata)
# Marginal effect of w3 on E(y1 | z1 = 2, z2 = 1) and E(g1 | z1 = 2, z2 = 1)
predict(model, group = c(2, 1), group2 = c(0, 1),
newdata = newdata, me = "w3")