eivreg {eivtools}

R Documentation

Errors-in-variables (EIV) linear regression

Description

Fits errors-in-variables (EIV) linear regression given specified reliabilities, or a specified variance/covariance matrix for the measurement errors. For either case, it computes robust standard error estimates that allow for weighting and/or clustering.

Usage

eivreg(formula, data, subset, weights, na.action, method = "qr",
model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = FALSE,
contrasts = NULL, reliability = NULL, Sigma_error = NULL,
cluster_varname = NULL, df_adj = FALSE, stderr = TRUE, offset,
...)

Arguments

`formula`, `data`, `subset`, `weights`, `na.action`, `method`, `model`, `x`, `y`, `qr`	See documentation for `lm`.
`singular.ok`, `contrasts`, `offset`, `...`	See documentation for `lm`.
`reliability`	Named numeric vector giving the reliability for each error-prone covariate. If left `NULL`, `Sigma_error` must be specified.
`Sigma_error`	Named numeric matrix giving the variance/covariance matrix of the measurement errors for the error-prone covariate(s). If left `NULL`, `reliability` must be specified.
`cluster_varname`	A character string giving the name of a variable in `data` that will be used as a clustering variable for robust standard error computation.
`df_adj`	Logical (default FALSE); if TRUE, the estimated variance/covariance matrix of the regression parameters is multiplied by `N/(N-p)`, where `N` is the number of observations used in the model fit and `p` is the number of regression parameters (including an intercept, if any).
`stderr`	Logical (default TRUE); if FALSE, does not compute estimated variance/covariance matrix of the regression parameters.

Details

Theory

The EIV estimator applies when one wishes to estimate the parameters of a linear regression of Y on (X,Z), but the covariates (W,Z) are observed instead, where W = X + U for mean-zero measurement error U. Additional assumptions are required about U for consistent estimation; see references for details.

The standard EIV estimator of the regression coefficients is (Q'Q - S)^{-1}Q'Y, where Q is the design matrix formed from (W,Z) and S is a matrix that adjusts Q'Q to account for elements that are distorted due to measurement error. The value of S depends on whether reliability or Sigma_error is specified. When Sigma_error is specified, S is known. When reliability is specified, S must be estimated using the marginal variances of the observed error-prone covariates.
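The correction can be illustrated with a few lines of base R. This is only a sketch and not the eivtools implementation: it assumes a single error-prone covariate with known error variance, and it assumes S equals N times the measurement-error covariance in the rows and columns of the error-prone covariate and zero elsewhere. All object names are hypothetical.

```r
## Minimal base-R sketch of (Q'Q - S)^{-1} Q'Y with known error variance.
## Illustration only; not the code used inside eivreg().
set.seed(1)
n <- 5000
x <- rnorm(n)                           # latent covariate X
z <- rbinom(n, 1, 0.5)                  # error-free covariate Z
y <- 1 + 2 * x - z + rnorm(n)           # outcome from the latent regression
sigma2_u <- 0.25                        # known measurement-error variance
w <- x + rnorm(n, sd = sqrt(sigma2_u))  # observed W = X + U

Q <- cbind(1, w, z)                     # design matrix formed from (W, Z)
colnames(Q) <- c("(Intercept)", "w", "z")

## Assumed form of S: N * error variance in the cell for w, zero elsewhere
S <- matrix(0, ncol(Q), ncol(Q), dimnames = list(colnames(Q), colnames(Q)))
S["w", "w"] <- n * sigma2_u

beta_naive <- drop(solve(crossprod(Q), crossprod(Q, y)))      # ignores error
beta_eiv   <- drop(solve(crossprod(Q) - S, crossprod(Q, y)))  # EIV-corrected
round(rbind(naive = beta_naive, eiv = beta_eiv), 3)
```

In this simulation the uncorrected slope on w is attenuated toward zero, while the corrected slope should land near the true value of 2.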

The estimated regression coefficients are solutions to a system of estimating equations, and both the system of equations and the solutions depend on whether reliability or Sigma_error is specified. For each of these two cases, standard errors for the estimated regression coefficients are computed using standard results from M-estimation; see references. For either case, adjustments for clustering are provided if specified.

Syntax Details

Exactly one of reliability or Sigma_error must be specified in the call. Sigma_error need not be diagonal in the case of correlated measurement error across multiple error-prone covariates.
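A non-diagonal Sigma_error can be built in base R as sketched below. The error variances and the correlation of 0.4 are hypothetical values chosen for illustration, and the use of the covariate names w1 and w2 as row and column names follows the convention shown in the Examples section.

```r
## Hypothetical error SDs and error correlation for two error-prone covariates
sd_u  <- c(w1 = sqrt(0.20), w2 = sqrt(0.30))
rho_u <- 0.4                          # assumed correlation between the errors

Sigma_error <- diag(sd_u^2)           # start from the error variances
Sigma_error[1, 2] <- Sigma_error[2, 1] <- rho_u * prod(sd_u)
dimnames(Sigma_error) <- list(names(sd_u), names(sd_u))

## Sanity checks: a covariance matrix is symmetric and positive definite
stopifnot(isSymmetric(Sigma_error), all(eigen(Sigma_error)$values > 0))
Sigma_error
```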

Error-prone variables must be included as linear main effects only; the current version of the code does not allow interactions among error-prone covariates, interactions of error-prone covariates with error-free covariates, or nonlinear functions of error-prone covariates. The error-prone covariates cannot be specified with any construction involving I().

The current version does not allow singular.ok=TRUE.

Users are strongly encouraged to pass a data frame containing all variables used in the regression via the data argument, rather than using a matrix on the right-hand side of the regression formula. In addition, if cluster_varname is specified, all variables, including the clustering variable, must be passed via data.

If weights is specified, a weighted version of the EIV estimator is computed using operations analogous to weighted least squares in linear regression, and a standard error for this weighted estimator is computed. Weights must be positive and are normalized inside the function to sum to the number of observations used to fit the model. Cases with missing weights are dropped, just like cases with missing covariates.
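The normalization described above amounts to a simple rescaling. The following base-R sketch uses hypothetical weights and object names; it mirrors the description rather than reproducing the function's internal code.

```r
w_raw <- c(2, 0.5, NA, 1.5, 4)        # hypothetical raw weights, one missing
keep  <- !is.na(w_raw)                # cases with missing weights are dropped
stopifnot(all(w_raw[keep] > 0))       # remaining weights must be positive

n_used <- sum(keep)                   # observations used in the fit
w_norm <- w_raw[keep] * n_used / sum(w_raw[keep])
sum(w_norm)                           # sums to n_used
```

Because only the relative sizes of the weights matter after this rescaling, multiplying all raw weights by a constant leaves the normalized weights unchanged.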

Different software packages that compute robust standard errors make different choices about degrees-of-freedom adjustments intended to improve small-sample coverage properties. The df_adj argument will inflate the estimated variance/covariance matrix of the estimated regression coefficients by N/(N-p); see Wooldridge (2002, p. 57). In addition, if cluster_varname is specified, the estimated variance/covariance matrix will be inflated by M/(M-1) where M is the number of unique clusters present in the estimation sample.
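As a small numerical illustration of these inflation factors, the sketch below applies them to a placeholder covariance matrix. The values of N, p and M and the matrix V are hypothetical, and applying the two factors multiplicatively when both are in effect is an assumption based on the wording above.

```r
N <- 1000                             # observations in the estimation sample
p <- 4                                # regression parameters, incl. intercept
M <- 50                               # unique clusters

V <- diag(p) * 0.01                   # placeholder unadjusted covariance matrix

f_df      <- N / (N - p)              # factor applied when df_adj = TRUE
f_cluster <- M / (M - 1)              # factor applied when clustering is used

V_adj <- V * f_df * f_cluster         # assumed: both factors multiply
c(df_adj = f_df, cluster = f_cluster)
```

With these values both factors are close to 1, so the adjustments matter mainly when N is small relative to p or when there are few clusters.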

Value

A list object of class eivlm with the following components:

`coefficients`	Estimated regression coefficients from EIV model.
`residuals`	Residuals from fitted EIV model.
`rank`	Column rank of regression design matrix.
`fitted.values`	Fitted values from EIV model.
`N`	Number of observations used in fitted model.
`Sigma_error`	The measurement error covariance matrix, if supplied.
`reliability`	The vector of reliabilities, if supplied.
`relnames`	The names of the error-prone covariates.
`XpX_adj`	The cross-product matrix of the regression, adjusted for measurement error.
`varYXZ`	The maximum likelihood estimate of the covariance matrix of the outcome `Y`, the latent covariates `X` and the observed, error-free covariates `Z`.
`latent_resvar`	A degrees-of-freedom adjusted estimate of the residual variance of the latent regression. NOTE: this is not an estimate of the residual variance of the regression on the observed covariates `(W,Z)`, but rather an estimate of the residual variance of the regression on `(X,Z)`.
`vcov`	The estimated variance/covariance matrix of the regression coefficients.
`cluster_varname`, `cluster_values`, `cluster_num`	If `cluster_varname` is specified, it is returned in the object, along with `cluster_values` providing the actual values of the clustering variable for the cases used in the fitted model, and `cluster_num`, the number of unique such clusters.
`OTHER`	The object also includes components `assign`, `df.residual`, `xlevels`, `call`, `terms`, `model` and other optional components such as `weights`, depending on the call; see `lm`. In addition, the object includes components `unadj_coefficients`, `unadj_fitted.values`, `unadj_residuals`, `unadj_effects`, and `unadj_qr` that are computed from the unadjusted regression model that ignores measurement error; see `lm`.

Author(s)

J.R. Lockwood jrlockwood@ets.org modified the lm function to adapt it for EIV regression.

References

Carroll R.J., Ruppert D., Stefanski L.A. and Crainiceanu C.M. (2006). Measurement Error in Nonlinear Models: A Modern Perspective (2nd edition). London: Chapman & Hall.

Fuller W. (2006). Measurement Error Models (2nd edition). New York: John Wiley & Sons.

Stefanski L.A. and Boos D.D. (2002). “The calculus of M-estimation,” The American Statistician 56(1):29-38.

Wooldridge J. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.

Examples

set.seed(1001)
## simulate data with covariates x1, x2 and z.
.n    <- 1000
.d    <- data.frame(x1 = rnorm(.n))
.d$x2 <- sqrt(0.5)*.d$x1 + rnorm(.n, sd=sqrt(0.5))
.d$z  <- as.numeric(.d$x1 + .d$x2 > 0)

## generate outcome
## true regression parameters are c(2,1,1,-1)
.d$y  <- 2.0 + 1.0*.d$x1 + 1.0*.d$x2 - 1.0*.d$z + rnorm(.n)

## generate error-prone covariates w1 and w2
Sigma_error <- diag(c(0.20, 0.30))
dimnames(Sigma_error) <- list(c("w1","w2"), c("w1","w2"))
.d$w1 <- .d$x1 + rnorm(.n, sd = sqrt(Sigma_error["w1","w1"]))
.d$w2 <- .d$x2 + rnorm(.n, sd = sqrt(Sigma_error["w2","w2"]))

## fit EIV regression specifying known measurement error covariance matrix
.mod1 <- eivreg(y ~ w1 + w2 + z, data = .d, Sigma_error = Sigma_error)
print(class(.mod1))
.tmp <- summary(.mod1)
print(class(.tmp))
print(.tmp)

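## fit EIV regression specifying known reliabilities.  Note that the
## point estimator is slightly different from .mod1 because the
## correction matrix S must be estimated when reliability, rather
## than Sigma_error, is specified.
## x1 and x2 each have variance 1, so reliability = 1 / (1 + error variance);
## diag() carries over the names "w1" and "w2" from Sigma_error.
.lambda <- c(1,1) / (c(1,1) + diag(Sigma_error))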
.mod2 <- eivreg(y ~ w1 + w2 + z, data = .d, reliability = .lambda)
print(summary(.mod2))

[Package eivtools version 0.1-8 Index]