mnprobit {switchSelection}R Documentation

Multinomial probit model

Description

This function estimates parameters of multinomial probit model and sample selection model with continuous outcome and multinomial probit selection mechanism.

Usage

mnprobit(
  formula,
  formula2 = NULL,
  data,
  regimes = NULL,
  opt_type = "optim",
  opt_args = NULL,
  start = NULL,
  estimator = "ml",
  cov_type = "sandwich",
  degrees = NULL,
  n_sim = 1000,
  n_cores = 1,
  control = NULL,
  regularization = NULL
)

Arguments

formula

an object of class "formula" corresponding to multinomial (selection) equation.

formula2

an object of class "formula" corresponding to continuous (outcome) equation.

data

data frame containing the variables in the model.

regimes

numeric vector such that regimes[i] is a regime of continuous equation when i-th alternative is observable. It should start with 0 and special value -1 undermines that continuous (outcome) equation is unobservable.

opt_type

character representing optimization function to be used. If opt_type = "optim" then optim will be used. If opt_type = "gena" then gena will be applied i.e. genetic algorithm. If opt_type = "pso" then pso will be used i.e. particle swarm optimization.

opt_args

a list of input arguments for the optimization function selected via opt_type argument. See 'Details' for information.

start

numeric vector of initial parameters' values. It will be used as a starting point for optimization purposes. It is also possible to provide an object of class 'mnprobit' then its 'par' element will be used as a starting point.

estimator

character determining estimation method. If estimator = "ml" then maximum-likelihood will be used. If estimator = "2step" then two-step estimation procedure similar to Heckman's method will be applied.

cov_type

character determining the type of covariance matrix to be returned and used for summary. If cov_type = "hessian" then negative inverse of Hessian matrix will be applied. If cov_type = "gop" then inverse of Jacobian outer products will be used. If cov_type = "sandwich" (default) then sandwich covariance matrix estimator will be applied.

degrees

vector of non-negative integers such that degrees[i] represents degree of the polynomial which elements are selectivity correction terms associated with the i-th ordered equation. See 'Details' for additional information.

n_sim

integer representing the number of GHK draws when there are more than 3 ordered equations. Otherwise alternative (much more efficient) algorithms will be used to calculate multivariate normal probabilities.

n_cores

positive integer representing the number of CPU cores used for parallel computing. If possible it is highly recommend to set it equal to the number of available physical cores especially when the system of ordered equations has 2 or 3 equations.

control

a list of control parameters. See 'Details'.

regularization

a list of control parameters for regularization. Element ridge_ind is a vector of indexes of parameters subject to regularization according to quadratic (ridge) penalty function. These indexes correspond to parameters from par output element. Set show_ind argument of summary.mnprobit to TRUE to see these indexes. Element ridge_scale is a numeric vector of weights of ridge penalty function. Element ridge_location is a numeric vector of values to be subtracted from parameters before they pass into penalty function. Elements lasso_ind, lasso_scale and lasso_location are the same but for the absolute value (lasso) penalty term.

Details

For identification purposes the following parametrization of the multinomial probit model is used:

z_{ji}^{*} = w_{i}\gamma_{j} + u_{ji}, \quad z_{Ji}^{*} = 0,

i\in\{1,2,...,n\},\quad j\in\{1,2,...,J-1\},

z_{i}=\underset{t\in\{1,...,J\}}{\text{argmax} }\text{ }z_{ti}^{*}, \qquad u_{i} = (u_{1i},u_{2i},...,u_{(J-1)i})^{T},

u_{i}\sim N\left(\begin{bmatrix}0\\ \vdots\\ 0\end{bmatrix}, \Sigma\right), i.i.d.,

\Sigma = \begin{bmatrix} 1 & \sigma_{12}& \sigma_{13} & ... & \sigma_{1(J-1)}\\ \sigma_{12} & \sigma_{2}^2 & \sigma_{23} & ... & \sigma_{2(J-1)}\\ \sigma_{13} & \sigma_{23} & \sigma_{3}^2 & ... & \sigma_{3(J-1)}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ \sigma_{1(J-1)} & \sigma_{2(J-1)} & \sigma_{3(J-1)} & ... & \sigma_{J-1}^2 \end{bmatrix}.

Where:

Note that alternatives z_{i} should be represented in data as integers starting from 1 (not 0).

It is also possible to account for multinomial sample selection and endogenous switching. Consider a simple example. Suppose that there is a sample containing information on wages of individuals. Let's denote wages of people who are working in information technologies (IT) sector and of those who are not by y_{1i} and y_{0i} correspondingly. The effect of various characteristics x_{i} on y_{0i} and y_{1i} may differ. For example programming skills probably have a greater impact on y_{1i} than on y_{0i}. So there is different equations (regimes) for these wages:

y_{0i} = x_{i}\beta_{0} + \varepsilon_{0i}\\ y_{1i} = x_{i}\beta_{1} + \varepsilon_{1i}\\ \varepsilon_{i}=\left(\varepsilon_{0i}, \varepsilon_{1i}\right) \text{, i.i.d.}

where \beta_{0} is a vector of regression coefficients representing the effect of individual characteristics x_{i} on wage y_{0i}. Similarly for \beta_{1}.

Herewith there is non-random selection into IT sector. Suppose that z_{i}=1 if individual is working in IT sector, z_{i}=2 if individual is employed in non-IT sector, and z_{i}=3 if individual is unemployed. So observable wage is:

y_{i} = \begin{cases} y_{1i}\text{, if }z_{i} = 1\\ y_{0i}\text{, if }z_{i} = 2\\ \text{unobservable, if }z_{i} = 3 \end{cases}

It is assumed that joint distribution of u_{i} and \varepsilon_{i} is multivariate normal. To estimate parameters of this model it is necessarily to assign regimes to alternatives. Note that regime[k] corresponds to the regime of y_{i} when z_{i} = k. Therefore set regimes = c(1, 0, -1) where -1 is a special regime for endogenously omitted observations. Dependent variable y_{i} and regressors x_{i} should be provided via formula2 argument.

Note that in some applications several alternatives may have the same regime.The number of regimes will have a moderate impact on computational burden. However the function may work extremely slow when there are more than 4 alternatives.

By default the model is estimated via maximum likelihood method. However if estimator = "2step" then two-step procedure proposed by E. Kossova and B. Potanin (2022) will be used. The idea is similar to Heckman's method i.e. to estimate the following regression equation:

y_{i} = x_{i}\beta + \sum\limits_{t=1}^{J-1} \rho_{t}\sigma_{t}\sigma_{\varepsilon} \tilde{\lambda}^{(z_{i})}_{ti},

where:

\tilde{\lambda}^{(j)}_{i}=A^{(j)}\lambda^{(t)}_{i}\\ A^{(j)}_{t_{1}t_{2}} = \begin{cases} 1\text{, if } t_{1}=j\\ -1\text{, if } t_{1}<j \text{ and } t_{1}=t_{2}\\ -1\text{, if } t_{1}>j \text{ and } t_{1}=t_{2} + 1\\ 0\text{, otherwise} \end{cases}, t_{1},t_{2}\leq J-1

\lambda^{(j)}_{i}=\lambda^{(j)}_{i} \left(\tilde{z}_{1}^{(ji)},..., \tilde{z}_{J-1}^{(ji)}\right)= \nabla \ln P\left(z_{i}\right) = \nabla \ln F_{\tilde{u}^{(ji)}}\left(\tilde{z}_{1}^{(ji)},..., \tilde{z}_{J-1}^{(ji)}\right)= \left(\lambda_{1i}^{(j)},...,\lambda_{(J-1)i}^{(j)} \right),\\ \tilde{u}^{(ji)} = \left(u_{1i}-u_{ji},u_{2i}-u_{ji},..., u_{(j-1)i}-u_{ji},u_{(j+1)i}-u_{ji},..., u_{Ji}-u_{ji}\right),\\ \tilde{z}^{(ji)} = \left(w_{i}\left(\gamma_{j}-\gamma_{1}\right), w_{i}\left(\gamma_{j}-\gamma_{2}\right),..., w_{i}\left(\gamma_{j}-\gamma_{j-1}\right), w_{i}\left(\gamma_{j}-\gamma_{j+1}\right),..., w_{i}\left(\gamma_{j}-\gamma_{J}\right)\right),\\ u_{Ji}=0,\quad \gamma_{J}=\left(0,...,0\right).

Note that \tilde{\lambda} are estimated on the first step using estimates from multinomial probit model. On the second step these estimates are used as the regressors (covariates) where \beta and \rho_{t}\sigma_{t}\sigma_{\varepsilon} are estimated via least squares method. Standard errors are estimated by approach proposed by Newey (2009).

Argument degrees is similar to the same argument of mvoprobit.

Optimization always starts with optim. If opt_type = "gena" or opt_type = "pso" then gena or pso is used to proceed optimization starting from initial point provided by optim. Manual arguments to optim should be provided in a form of a list through opt_args$optim. Similarly opt_args$gena and opt_args$pso provide manual arguments to gena and pso. For example to provide Nelder-Mead optimizer to optim and restrict the number of genetic algorithm iterations to 10 make opt_args = list(optim = list(method = "Nelder-Mead"), gena = list(maxiter = 10)).

For more information on multivariate sample selection and endogenous switching models see E. Kossova and B. Potanin (2018), E. Kossova, L. Kupriianova, and B. Potanin (2020) and E. Kossova and B. Potanin (2022).

Function pmnorm is used internally for calculation of multivariate normal probabilities, densities and their derivatives.

Currently control has no input arguments intended for the users. This argument is used for some internal purposes of the package.

Value

This function returns an object of class 'mnprobit' which is a list containing the following elements:

It is highly recommended to get estimates via coef.mnprobit function.

References

W. K. Newey (2009). Two-step series estimation of sample selection models. The Econometrics Journal, vol. 12(1), pages 217-229.

E. Kossova, B. Potanin (2018). Heckman method and switching regression model multivariate generalization. Applied Econometrics, vol. 50, pages 114-143.

E. Kossova, L. Kupriianova, B. Potanin (2020). Parametric and semiparametric multivariate sample selection models estimators' accuracy: Comparative analysis on simulated data. Applied Econometrics, vol. 57, pages 119-139.

E. Kossova, B. Potanin (2022). Estimation of Gaussian multinomial endogenous switching model. Applied Econometrics, vol. 67, pages 121-143.

Examples


# -------------------------------
# CPS data example
# -------------------------------

# Set seed for reproducibility
set.seed(123)

# Upload data
data(cps)

# Prepare multinomial variable for education
cps$educ <- NA
cps$educ[cps$basic == 1] <- 1
cps$educ[cps$bachelor == 1] <- 2
cps$educ[cps$master == 1] <- 3

# Multinomial probit model for education
f_educ <- educ ~ age + I(age ^ 2) + sbachelor + smaster
model1 <- mnprobit(f_educ, data = cps)
summary(model1)

# Endogenous education treatment model
f_lwage <- lwage ~ age + I(age ^ 2) + bachelor + master + health
model2 <- mnprobit(f_educ, f_lwage, data = cps, cov_type = "gop")
summary(model2)

# Endogenous education switching model
f_lwage2 <- lwage ~ age + I(age ^ 2) + health
model3 <- mnprobit(f_educ, f_lwage2, data = cps, 
                   regimes = c(0, 1, 2), cov_type = "gop")
summary(model3)

 
# -------------------------------
# Simulated data example 1
# Multinomial probit model
# -------------------------------

# Load required package
library("mnorm")

# ---
# Step 1
# Simulation of data
# ---

# Set seed for reproducibility
set.seed(123)

# The number of observations
n <- 1000

# Regressors (covariates)
w1 <- runif(n = n, min = -1, max = 1)
w2 <- runif(n = n, min = -1, max = 1)

# Random errors
sigma.1 <- 1
sigma.2 <- 0.9
rho <- 0.7
sigma <- matrix(c(sigma.1 ^ 2,             sigma.1 * sigma.2 * rho,
                  sigma.1 * sigma.2 * rho, sigma.2 ^ 2), 
                ncol = 2, byrow = TRUE)
u <- rmnorm(n = n, mean = c(0, 0), sigma  = sigma)

# Coefficients
gamma.1 <- c(0.1, 2, 3)
gamma.2 <- c(-0.1, 3, -2)

# Linear index
z1.li <- gamma.1[1] + gamma.1[2] * w1 + gamma.1[3] * w2
z2.li <- gamma.2[1] + gamma.2[2] * w1 + gamma.2[3] * w2

# Latent variable
z1.star <- z1.li + u[, 1]
z2.star <- z2.li + u[, 2]
z3.star <- rep(0, n)

# Observable ordered outcome
z <- rep(3, n)
z[(z1.star > z2.star) & (z1.star > z3.star)] <- 1
z[(z2.star > z1.star) & (z2.star > z3.star)] <- 2
table(z)

# Data
data <- data.frame(w1 = w1, w2 = w2, z = z)

# ---
# Step 2
# Estimation of parameters
# ---

# Estimation
model <- mnprobit(z ~ w1 + w2,
                  data = data)
summary(model)

# Compare estimates and true values of parameters
  # regression coefficients
cbind(true = gamma.1, estimate = model$coef[, 1])
cbind(true = gamma.2, estimate = model$coef[, 2])
  # covariances
cbind(true = c(sigma[1, 2], sigma[2, 2]), 
      estimate = c(model$sigma[1, 2], model$sigma[2, 2]))   

# ---
# Step 3
# Estimation of probabilities and marginal effects
# ---

# For every observation in a sample predict
  # probability of 2-nd alternative i.e. P(z = 2)
prob <- predict(model, alt = 2, type = "prob")
head(prob)
  # probability of each alternative
prob <- predict(model, alt = NULL, type = "prob")
head(prob)

# Calculate mean marginal effect of w2 on P(z = 1)
mean(predict(model, alt = 1, type = "prob", me = "w2"))

# Calculate probabilities and marginal effects
# for manually provided observations.
  # new data
newdata <- data.frame(z = c(1, 1), 
                      w1 = c(0.5, 0.2), 
                      w2 = c(-0.3, 0.8))
  # probability P(z = 2)
predict(model, alt = 2, type = "prob", newdata = newdata)
  # linear index
predict(model, type = "li", newdata = newdata)  
  # marginal effect of w1 on P(z = 2)
predict(model, alt = 2, type = "prob", newdata = newdata, me = "w1")
  # marginal effect of w1 and w2 on P(z = 3)
predict(model, alt = 3, type = "prob", 
        newdata = newdata, me = c("w1", "w2"))
  # marginal effect of w2 on the linear index
predict(model, alt = 2, type = "li", newdata = newdata, me = "w2")
  # discrete marginal effect i.e. P(z = 2 | w1 = 0.5) - P(z = 2 | w1 = 0.2)
predict(model, alt = 2, type = "prob", newdata = newdata, 
        me = "w2", eps = c(0.2, 0.5))
  # adjusted conditional expectation for endogenous switching and 
  # sample selection models with continuous outcome with random error 'e'
  # E(e | z = 2) / cov(e, u)
  # where joint distribution of 'e' and 'u' determined by
  # Gaussian copula and 'e' is normally distributed
predict(model, alt = 2, type = "lambda", newdata = newdata) 
       


# -------------------------------
# Simulated data example 2
# Multinomial selection model
# -------------------------------

# Load required package
library("mnorm")

# ---
# Step 1
# Simulation of data
# ---

# Set seed for reproducibility
set.seed(123)

# The number of observations
n <- 1000

# Random errors
sd.z2 <- sqrt(0.9)
cor.z <- 0.3
sd.y0 <- sqrt(2)
cor.z1y0 <- 0.4
cor.z2y0 <- 0.7
sd.y1 <- sqrt(1.8)
cor.z1y1 <- 0.3
cor.z2y1 <- 0.6
cor.y <- 0.8
sigma <- matrix(c(
1,                cor.z * sd.z2,            cor.z1y0 * sd.y0,         cor.z1y1 * sd.y1,
cor.z * sd.z2,    sd.z2 ^ 2,                cor.z2y0 * sd.z2 * sd.y0, cor.z2y1 * sd.z2 * sd.y1,
cor.z1y0 * sd.y0, cor.z2y0 * sd.z2 * sd.y0, sd.y0 ^ 2,                cor.y * sd.y0 * sd.y1,
cor.z1y1 * sd.y1, cor.z2y1 * sd.z2 * sd.y1, cor.y * sd.y0 * sd.y1,    sd.y1 ^ 2),
                ncol = 4, byrow = TRUE)
colnames(sigma) <- c("z1", "z2", "y0", "y1")
rownames(sigma) <- colnames(sigma)

# Simulate random errors
errors <- rmnorm(n, c(0, 0, 0, 0), sigma)
u <- errors[, 1:2]
eps <- errors[, 3:4]

# Regressors (covariates)
x1 <- runif(n, -1, 1)
x2 <- runif(n, -1, 1)
x3 <- (x2 + runif(n, -1, 1)) / 2
W <- cbind(1, x1, x2)
X <- cbind(1, x1, x3)

# Coefficients
gamma <- cbind(c(0.1, 1, 1), 
               c(0.2, -1.2, 0.8))
beta <- cbind(c(1, -1, 2), 
              c(1, -2, 1))

# Linear indexes
z1.li <- W %*% gamma[, 1]
z2.li <- W %*% gamma[, 2]
y0.li <- X %*% beta[, 1]
y1.li <- X %*% beta[, 2]

# Latent variables
z1.star <- z1.li + u[, 1]
z2.star <- z2.li + u[, 2]
y0.star <- y0.li + eps[, 1]
y1.star <- y1.li + eps[, 2]

# Obvservable variable as a dummy
z1 <- (z1.star > z2.star) & (z1.star > 0)
z2 <- (z2.star > z1.star) & (z2.star > 0)
z3 <- (z1 != 1) & (z2 != 1)

# Aggregate observable variable
z <- rep(0, n)
z[z1] <- 1
z[z2] <- 2
z[z3] <- 3
table(z)

# Make unobservable values of continuous variable
y <-  rep(Inf, n)
y[z == 1] <- y0.star[z == 1]
y[z == 2] <- y1.star[z == 2]

# Data
data <- data.frame(z = z, y = y, 
                   x1 = x1, x2 = x2, x3 = x3)
            
# ---
# Step 2
# Estimation of parameters
# ---

# Maximum likelihood method
model.ml <- mnprobit(z ~ x1 + x2, 
                  y ~ x1 + x3, regimes = c(0, 1, -1),
                  data = data, cov_type = "gop")
summary(model.ml)  

# Two-step method
model.2step <- mnprobit(z ~ x1 + x2, 
                        y ~ x1 + x3, regimes = c(0, 1, -1),
                        data = data, estimator = "2step")
summary(model.2step)
       
# Semiparametric estimator using 2-nd and 3-rd level polynomials
model.snp <- mnprobit(z ~ x1 + x2, 
                      y ~ x1 + x3, regimes = c(0, 1, -1),
                      data = data, estimator = "2step",
                      degrees = c(2, 3))  
summary(model.snp)

# Simple least squares as a benchmark
model.lm0 <- lm(y ~ x1 + x3, data = data[z == 1, ]) 
model.lm1 <- lm(y ~ x1 + x3, data = data[z == 2, ])

# Compare coefficients of continuous equations
  # y0
cbind(true = beta[, 1],
      ml = model.ml$coef2[, 1],
      twostep = model.2step$coef2[, 1],
      semiparametric = model.snp$coef2[, 1],
      ls = coef(model.lm0)) 
  # y1  
cbind(true = beta[, 2],
      ml = model.ml$coef2[, 2],
      twostep = model.2step$coef2[, 2],
      semiparametric = model.snp$coef2[, 2],
      ls = coef(model.lm1)) 
      
# Compare coefficients of multinomial equations
  # 1-nd alternative        
cbind(true = gamma[, 1],
      ml = model.ml$coef[, 1],
      twostep = model.2step$coef[, 1]) 
  # 2-nd alternative        
cbind(true = gamma[, 2],
      ml = model.ml$coef[, 2],
      twostep = model.2step$coef[, 2])  
      
# Compare variances of random errors associated with
  # z2
cbind(true = sigma[2, 2], ml = model.ml$sigma[2, 2])
  # y0
cbind(true = sd.y0 ^ 2, ml = model.ml$var2[1])
  # y1
cbind(true = sd.y1 ^ 2, ml = model.ml$var2[2])

# compare covariances between
  # z1 and z2
cbind(true = cor.z * sd.z2, 
      ml = model.ml$sigma[1, 2],
      twostep = model.2step$sigma[1, 2])
  # z1 and y0
cbind(true = cor.z1y0 * sd.y0, 
      ml = model.ml$cov2[1, 1],
      twostep = model.2step$cov2[1, 1]) 
  # z2 and y0
cbind(true = cor.z2y0 * sd.y0, ml = model.ml$cov2[2, 1])   
  # z1 and y1
cbind(true = cor.z1y1 * sd.y1, ml = model.ml$cov2[1, 2]) 
  # z2 and y1
cbind(true = cor.z2y1 * sd.y1, ml = model.ml$cov2[2, 2]) 
            
# ---
# Step 3
# Predictions and marginal effects
# ---  

# Unconditional expectation E(y1) for every observation in a sample
predict(model.ml, type = "val", regime = 1, alt = NULL) 

# Marginal effect of x1 on conditional expectation E(y0|z = 2) 
# for every observation in a sample
predict(model.ml, type = "val", regime = 0, alt = 2, me = "x1")    

# Calculate predictions and marginal effects
# for manually provided observations
# using abovementioned models.
newdata <- data.frame(z = c(1, 1),
                      y = c(1, 1), 
                      x1 = c(0.5, 0.2), 
                      x2 = c(-0.3, 0.8),
                      x3 = c(0.6, -0.7))
                      
# Unconditional expectation E(y0)
predict(model.ml, type = "val", regime = 0, alt = NULL, newdata = newdata)
predict(model.2step, type = "val", regime = 0, alt = NULL, newdata = newdata)
predict(model.snp, type = "val", regime = 0, alt = NULL, newdata = newdata)

# Conditional expectation E(y1|z=3)
predict(model.ml, type = "val", regime = 1, alt = 3, newdata = newdata)
predict(model.2step, type = "val", regime = 1, alt = 3, newdata = newdata)
predict(model.snp, type = "val", regime = 1, alt = 3, newdata = newdata)

# Marginal effect of x2 on E(y0|z = 1)
predict(model.ml, type = "val", regime = 0, 
        alt = 1, me = "x2", newdata = newdata)
predict(model.2step, type = "val", regime = 0, 
        alt = 1, me = "x2", newdata = newdata)
predict(model.snp, type = "val", regime = 0, 
        alt = 1, me = "x2", newdata = newdata)

             

[Package switchSelection version 1.1.2 Index]