miive {MIIVsem} | R Documentation |
Model-implied instrumental variable (MIIV) estimation
Description
Estimate structural equation models using model-implied instrumental variables (MIIVs).
Usage
miive(
model = model,
data = NULL,
instruments = NULL,
sample.cov = NULL,
sample.mean = NULL,
sample.nobs = NULL,
sample.cov.rescale = TRUE,
estimator = "2SLS",
se = "standard",
bootstrap = 1000L,
boot.ci = "norm",
missing = "listwise",
est.only = FALSE,
var.cov = FALSE,
miiv.check = TRUE,
ordered = NULL,
sarg.adjust = "none",
overid.degree = NULL,
overid.method = "stepwise.R2"
)
Arguments
model |
A model specified using lavaan model syntax or a
|
data |
A data frame, list or environment or an object coercible
by |
instruments |
Allows the user to specify the instruments for
each equation. See Details and the |
sample.cov |
Numeric matrix. A sample variance-covariance matrix. The rownames and colnames attributes must contain all the observed variable names indicated in the model syntax. |
sample.mean |
A sample mean vector. If |
sample.nobs |
Number of observations in the full data frame. |
sample.cov.rescale |
If |
estimator |
Options |
se |
If "standard", asymptotic standard errors are
computed. If |
bootstrap |
Number of bootstrap draws, if bootstrapping is used. The
default is 1000. |
boot.ci |
Method for calculating bootstrap confidence intervals.
Options are normal approximation ( |
missing |
Default is "listwise". |
est.only |
If |
var.cov |
If |
miiv.check |
Default is TRUE. |
ordered |
A vector of variable names to be treated as ordered factors
in generating the polychoric correlation matrix and subsequent PIV
estimates. See details on |
sarg.adjust |
Adjustment method used to adjust the p-values associated
with the Sargan test due to multiple comparisons. Default is
"none". For other options see p.adjust. |
overid.degree |
A numeric value indicating the degree of overidentification to be used in estimation. |
overid.method |
The method by which excess MIIVs should
be pruned to satisfy the |
Details
model
The following model syntax operators are currently supported: =~, ~, ~~ and *. See below for details on default behavior, descriptions of how to specify the scaling indicator in latent variable models, and how to impose equality constraints on the parameter estimates.
Example using Syntax Operators
In the model below, 'L1 =~ Z1 + Z2 + Z3' indicates the latent variable L1 is measured by 3 indicators, Z1, Z2, and Z3. Likewise, L2 is measured by 3 indicators, Z4, Z5, and Z6. The statement 'L1 ~ L2' specifies that latent variable L1 is regressed on latent variable L2. 'Z2 ~~ Z3' indicates the error of Z2 is allowed to covary with the error of Z3. The label LA3 prepended to Z3 and Z6 in the measurement model constrains the factor loadings for Z3 and Z6 to equality. For additional details on constraints see Equality Constraints and Parameter Restrictions.
model <- '
  L1 =~ Z1 + Z2 + LA3*Z3
  L2 =~ Z4 + Z5 + LA3*Z6
  L1 ~ L2
  Z2 ~~ Z3
'
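As a minimal sketch of how a model string like the one above might be passed to miive, the following simulates data with the same structure and fits it. The simulated values (sample size, loadings, error variances) and the object names dat and fit are illustrative assumptions, not part of the package documentation, and the fit step is skipped if MIIVsem is not installed.

```r
# Simulate six indicators of two latent variables (illustrative values only).
set.seed(123)
n  <- 500
L2 <- rnorm(n)
L1 <- 0.6 * L2 + rnorm(n, sd = 0.8)
dat <- data.frame(
  Z1 = L1 + rnorm(n, sd = 0.5),        # scaling indicator of L1
  Z2 = 0.8 * L1 + rnorm(n, sd = 0.5),
  Z3 = 0.7 * L1 + rnorm(n, sd = 0.5),
  Z4 = L2 + rnorm(n, sd = 0.5),        # scaling indicator of L2
  Z5 = 0.9 * L2 + rnorm(n, sd = 0.5),
  Z6 = 0.7 * L2 + rnorm(n, sd = 0.5)   # same loading as Z3, matching the LA3 label
)

# One statement per line, as lavaan model syntax expects.
model <- '
  L1 =~ Z1 + Z2 + LA3*Z3
  L2 =~ Z4 + Z5 + LA3*Z6
  L1 ~ L2
  Z2 ~~ Z3
'

# Fit with the default estimator ("2SLS") if the package is available.
if (requireNamespace("MIIVsem", quietly = TRUE)) {
  fit <- MIIVsem::miive(model = model, data = dat)
  print(fit)
}
```

Writing each statement on its own line matters: the statements in a lavaan model string are separated by newlines (or semicolons).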
Scaling Indicators
Following the lavaan model syntax, latent variables are defined using the =~ operator. For first-order factors, the scaling indicator chosen is the first observed variable on the RHS of an equation. For the model below, Z1 would be chosen as the scaling indicator for L1 and Z4 would be chosen as the scaling indicator for L2.
model <- '
  L1 =~ Z1 + Z2 + Z3
  L2 =~ Z4 + Z5 + Z6
'
Equality Constraints and Parameter Restrictions
Within- and across-equation equality constraints on the factor loading and regression coefficients can be imposed directly in the model syntax. To specify equality constraints between different parameters, equivalent labels should be prepended to the variable name using the * operator. For example, we could constrain the factor loadings for the two non-scaling indicators of L1 to equality using the following model syntax.
model <- '
  L1 =~ Z1 + LA2*Z2 + LA2*Z3
  L2 =~ Z4 + Z5 + Z6
'
Researchers can also constrain the factor loading and regression coefficients to specific numeric values in a similar fashion. Below we constrain the regression coefficient of L1 on L2 to 1.
model <- '
  L1 =~ Z1 + Z2 + Z3
  L2 =~ Z4 + Z5 + Z6
  L3 =~ Z7 + Z8 + Z9
  L1 ~ 1*L2 + L3
'
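The two kinds of restriction described above can be combined in one specification. The sketch below is illustrative only: the data are simulated so that the restrictions hold in the population, the helper make_indicators is hypothetical, and the fit is attempted only if MIIVsem is installed.

```r
set.seed(2024)
n  <- 500
L2 <- rnorm(n)
L3 <- rnorm(n)
L1 <- 1 * L2 + 0.5 * L3 + rnorm(n, sd = 0.7)  # true coefficient on L2 is 1

# Hypothetical helper: three indicators of a latent variable; the first
# loading is 1 so that the first indicator can serve as scaling indicator.
make_indicators <- function(lv, loadings) {
  sapply(loadings, function(l) l * lv + rnorm(length(lv), sd = 0.5))
}

dat <- as.data.frame(cbind(
  make_indicators(L1, c(1, 0.8, 0.8)),  # Z2 and Z3 share a loading
  make_indicators(L2, c(1, 0.9, 0.7)),
  make_indicators(L3, c(1, 0.9, 0.7))
))
names(dat) <- paste0("Z", 1:9)

# LA2 imposes equality on two loadings; 1*L2 fixes a regression coefficient.
model <- '
  L1 =~ Z1 + LA2*Z2 + LA2*Z3
  L2 =~ Z4 + Z5 + Z6
  L3 =~ Z7 + Z8 + Z9
  L1 ~ 1*L2 + L3
'

if (requireNamespace("MIIVsem", quietly = TRUE)) {
  fit <- MIIVsem::miive(model = model, data = dat)
  print(fit)
}
```

Repeating a label (here LA2) requests equality, while a numeric prefix fixes the coefficient to that value.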
Higher-order Factor Models
For example, in the model below, the scaling indicator for the higher-order factor H1 is taken to be Z1, the scaling indicator that would have been assigned to the first lower-order factor L1. The intercepts for lower-order latent variables are set to zero by default.
model <- '
  H1 =~ L1 + L2 + L3
  L1 =~ Z1 + Z2 + Z3
  L2 =~ Z4 + Z5 + Z6
  L3 =~ Z7 + Z8 + Z9
'
Model Defaults
In addition to those relationships specified in the model syntax, MIIVsem will automatically include the intercepts of any observed or latent endogenous variable. The intercepts for any scaling indicators and lower-order latent variables are set to zero by default. Covariances among exogenous latent and observed variables are included when var.cov = TRUE. Where appropriate, the covariances of the errors of latent and observed dependent variables are automatically included in the model specification. These defaults correspond to those used by lavaan and auto = TRUE, except that endogenous latent variable intercepts are estimated by default, and the intercepts of scaling indicators are fixed to zero.
Invalid Specifications
Certain model specifications are not currently supported. For example, the scaling indicator of a latent variable is not permitted to cross-load on another latent variable. In the model below, Z1, the scaling indicator for L1, cross-loads on the latent variable L2. Executing a search on the model below will result in the warning: miivs: scaling indicators with a factor complexity greater than 1 are not currently supported.
model <- '
  L1 =~ Z1 + Z2 + Z3
  L2 =~ Z4 + Z5 + Z6 + Z1
'
In addition, MIIVsem does not currently support relations where the scaling indicator of a latent variable is also the dependent variable in a regression equation. The model below would not be valid under the current algorithm.
model <- '
  L1 =~ Z1 + Z2 + Z3
  Z1 ~ Z4
  Z4 ~ Z5 + Z6
'
instruments
To use this option you must first define a list of instruments using the syntax displayed below. Here, the dependent variable for each equation is listed on the LHS of the ~ operator. In the case of latent variable equations, the dependent variable is the scaling indicator associated with that variable. The instruments are then given on the RHS, separated by + signs. The instrument syntax is then enclosed in single quotes. For example,
customIVs <- '
  y1 ~ z1 + z2 + z3
  y2 ~ z4 + z5
'
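The sketch below shows one way such an instrument string might be used with a concrete model. The data are simulated, the variable names and the particular instruments chosen are illustrative assumptions, and the calls to miivs and miive run only if MIIVsem is installed. miivs is used here to list the model-implied instruments so that the custom choices can be compared against them.

```r
set.seed(7)
n  <- 500
L1 <- rnorm(n)
L2 <- 0.5 * L1 + rnorm(n)
dat <- data.frame(
  Z1 = L1 + rnorm(n, sd = 0.5),
  Z2 = 0.8 * L1 + rnorm(n, sd = 0.5),
  Z3 = 0.7 * L1 + rnorm(n, sd = 0.5),
  Z4 = L2 + rnorm(n, sd = 0.5),
  Z5 = 0.9 * L2 + rnorm(n, sd = 0.5),
  Z6 = 0.7 * L2 + rnorm(n, sd = 0.5)
)

model <- '
  L1 =~ Z1 + Z2 + Z3
  L2 =~ Z4 + Z5 + Z6
'

# Instruments for the Z2 and Z3 equations only; each equation is identified
# by its dependent variable on the LHS. Equations not listed are not estimated.
customIVs <- '
  Z2 ~ Z4 + Z5
  Z3 ~ Z4 + Z5 + Z6
'

if (requireNamespace("MIIVsem", quietly = TRUE)) {
  print(MIIVsem::miivs(model))  # model-implied instruments for every equation
  fit <- MIIVsem::miive(model = model, data = dat, instruments = customIVs)
  print(fit)
}
```

In this sketch every custom instrument is an indicator of the other latent variable, so each should also appear among the model-implied instruments printed by miivs.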
After this list is defined, set the instruments argument equal to the name of the list of instruments (e.g. customIVs). Note that instruments are specified for an equation, and not for a specific endogenous variable. If only a subset of dependent variables are listed in the instruments argument, only those equations listed will be estimated. If external or auxiliary instruments (instruments not otherwise included in the model) are included, the miiv.check argument should be set to FALSE.
sample.cov
The user may provide a sample covariance matrix in lieu of raw data. The rownames and colnames must contain the observed variable names indicated in the model syntax. If sample.cov is not NULL, the user must also supply a vector of sample means (sample.mean) and the number of sample observations (sample.nobs) from which the means and covariances were calculated. If no vector of sample means is provided, intercepts will not be estimated. MIIVsem does not support bootstrap standard errors or polychoric instrumental variable estimation when the sample moments, rather than raw data, are used as input.
sample.mean
A vector of length corresponding to the row and column dimensions of the sample.cov matrix. The names of sample.mean must match those in sample.cov. If the user supplies a covariance matrix but no vector of sample means, intercepts will not be estimated.
sample.cov.rescale
Default is TRUE. If TRUE, the sample covariance matrix provided by the user is internally rescaled by multiplying it by a factor of (N-1)/N.
estimator
The default estimator is 2SLS. For equations with continuous variables only and no restrictions, the estimates are identical to those described in Bollen (1996, 2001). If restrictions are present, a restricted MIIV-2SLS estimator is implemented using methods similar to those described by Greene (2003) but adapted for moment-based estimation. 2SLS coefficients and overidentification tests are constructed using the sample moments for increased computational efficiency.
If an equation contains ordered categorical variables, declared in the ordered argument, the PIV estimator described by Bollen and Maydeu-Olivares (2007) is implemented. The PIV estimator does not currently support exogenous observed predictors of endogenous categorical variables. See details of the ordered argument for more information about the PIV estimator.
se
When se is set to "boot" or "bootstrap", standard errors are computed using a nonparametric bootstrap assuming an independent random sample. If var.cov = TRUE, nonconvergence may occur and any datasets with improper solutions will be recorded as such and discarded. Bootstrapping is implemented using the boot package by resampling the observations in data and refitting the model with the resampled data. The number of bootstrap replications is set using the bootstrap argument; the default is 1000. Here, the standard errors are based on the standard deviation of successful bootstrap replications. Note that the Sargan test statistic is calculated from the original sample and is not a bootstrap-based estimate.
When se is set to "standard", standard errors for the MIIV-2SLS coefficients are calculated using analytic expressions. For equations with categorical endogenous variables, the asymptotic distribution of the coefficients is obtained via a first-order expansion where the matrix of partial derivatives is evaluated at the sample polychoric correlations. For some details on these standard errors see Bollen & Maydeu-Olivares (2007, p. 315). If var.cov = TRUE, only point estimates for the variance and covariance parameters are calculated. To obtain standard errors for the variance and covariance parameters we recommend setting se = "bootstrap". Analytic standard errors for the variance and covariance parameters accounting for the first-stage estimation have been derived and will be available in future releases.
missing
There are two ways to handle missing data in MIIVsem. First, missing data may be handled by listwise deletion (missing = "listwise"). In this case any row of data containing a missing observation is excluded from the analysis and the sample moments are adjusted accordingly. Estimation then proceeds normally. The second option for handling missing data is a two-stage procedure (missing = "twostage") where consistent estimates of the saturated population means and covariances are obtained in the first stage. These quantities are often referred to as the "EM means" and "EM covariance matrix." In the second stage the saturated estimates are used to calculate the MIIV-2SLS structural coefficients. Bootstrap standard errors are recommended but will be computationally burdensome due to the cost of calculating the EM-based moments at each bootstrap replication.
ordered
For equations containing ordered categorical variables, MIIV-2SLS coefficients are estimated using the approach outlined in Bollen & Maydeu-Olivares (2007). The asymptotic distribution of these coefficients is obtained via a first-order expansion where the matrix of partial derivatives is evaluated at the sample polychoric correlations. For some details on these standard errors see Bollen & Maydeu-Olivares (2007, p. 315). If var.cov = TRUE, only point estimates for the variance and covariance parameters are calculated, using the DWLS estimator in lavaan. To obtain standard errors for the variance and covariance parameters we recommend the bootstrap approach. Analytic standard errors for the variance and covariance parameters in the presence of endogenous categorical variables will be available in future releases. Currently MIIVsem does not support exogenous variables in equations with categorical endogenous variables.
Sargan's Test of Overidentification
An essential ingredient in the MIIV-2SLS approach is the application of overidentification tests when a given model specification leads to an excess of instruments. Empirically, overidentification tests are used to evaluate the assumption of orthogonality between the instruments and equation residuals. Rejection of the null hypothesis implies a deficit in the logic leading to the instrument selection. In the context of MIIV-2SLS this is the model specification itself. By default, MIIVsem provides Sargan's overidentification test (Sargan, 1958) for each overidentified equation in the system. When cross-equation restrictions or missing data are present the properties of the test are not known. When the system contains many equations the sarg.adjust option provides methods to adjust the p-values associated with the Sargan test due to multiple comparisons. The default is none. For other options see p.adjust.
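To make the logic of the test concrete, the following base R sketch computes a 2SLS estimate and the textbook form of Sargan's statistic (the sample size times the share of residual variation explained by the instruments) for a single simulated equation, then adjusts a set of p-values with p.adjust. This is an illustration under stated assumptions, not MIIVsem's internal code: the data-generating values and the three p-values are made up, and the use of p.adjust for the adjustment step is inferred from the reference to p.adjust above.

```r
set.seed(42)
n <- 400

# Instruments (with a constant), one endogenous regressor, valid exclusion.
Z <- cbind(1, matrix(rnorm(n * 3), n, 3))
u <- rnorm(n)                                   # equation error
x <- drop(Z[, -1] %*% c(0.8, 0.5, 0.3)) + 0.6 * u + rnorm(n)
y <- 1 + 0.5 * x + u
X <- cbind(1, x)

# First stage: project the regressors onto the instrument space.
Xhat <- Z %*% solve(crossprod(Z), crossprod(Z, X))
# Second stage: 2SLS coefficients; residuals use the original regressors.
b   <- solve(crossprod(Xhat, X), crossprod(Xhat, y))
res <- drop(y - X %*% b)

# Sargan statistic: n times the share of residual variation explained by Z.
res_fit <- Z %*% solve(crossprod(Z), crossprod(Z, res))
sargan  <- n * sum(res_fit^2) / sum(res^2)
df      <- ncol(Z) - ncol(X)                    # degree of overidentification
p_value <- pchisq(sargan, df = df, lower.tail = FALSE)

# Multiple-comparison adjustment across equations (illustrative p-values).
p_raw <- c(eq1 = 0.012, eq2 = 0.20, eq3 = 0.04)
p_adj <- p.adjust(p_raw, method = "holm")
print(p_adj)
```

With three instruments and one endogenous regressor the equation is overidentified by two, which is the degrees of freedom of the reference chi-square distribution.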
References
Bollen, K. A. (1996). An Alternative 2SLS Estimator for Latent Variable Models. Psychometrika, 61, 109-121.
Bollen, K. A. (2001). Two-stage Least Squares and Latent Variable Models: Simultaneous Estimation and Robustness to Misspecifications. In R. Cudeck, S. Du Toit, and D. Sorbom (Eds.), Structural Equation Modeling: Present and Future, A Festschrift in Honor of Karl Joreskog (pp. 119-138). Lincoln, IL: Scientific Software.
Bollen, K. A., & Maydeu-Olivares, A. (2007). A Polychoric Instrumental Variable (PIV) Estimator for Structural Equation Models with Categorical Variables. Psychometrika, 72(3), 309.
Freedman, D. (1984). On Bootstrapping Two-Stage Least-Squares Estimates in Stationary Linear Models. The Annals of Statistics, 12(3), 827–842.
Greene, W. H. (2000). Econometric analysis. Upper Saddle River, N.J: Prentice Hall.
Hayashi, F. (2000). Econometrics. Princeton, NJ: Princeton University Press.
Sargan, J. D. (1958). The Estimation of Economic Relationships using Instrumental Variables. Econometrica, 26(3), 393–415.
Savalei, V. (2010). Expected versus Observed Information in SEM with Incomplete Normal and Nonnormal Data. Psychological Methods, 15(4), 352–367.
Savalei, V., & Falk, C. F. (2014). Robust Two-Stage Approach Outperforms Robust Full Information Maximum Likelihood With Incomplete Nonnormal Data. Structural Equation Modeling: A Multidisciplinary Journal, 21(2), 280–302.
See Also
MIIVsem, miivs