eivreg {eivtools}    R Documentation

Errors-in-variables (EIV) linear regression

Description

Fits errors-in-variables (EIV) linear regression given either specified reliabilities or a specified variance/covariance matrix for the measurement errors. In either case, it computes robust standard error estimates that allow for weighting and/or clustering.

Usage

eivreg(formula, data, subset, weights, na.action, method = "qr",
model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = FALSE,
contrasts = NULL, reliability = NULL, Sigma_error = NULL,
cluster_varname = NULL, df_adj = FALSE, stderr = TRUE, offset,
...)

Arguments

formula, data, subset, weights, na.action, method, model, x, y, qr

See documentation for lm.

singular.ok, contrasts, offset, ...

See documentation for lm.

reliability

Named numeric vector giving the reliability for each error-prone covariate. If left NULL, Sigma_error must be specified.

Sigma_error

Named numeric matrix giving the variance/covariance matrix of the measurement errors for the error-prone covariate(s). If left NULL, reliability must be specified.

cluster_varname

A character string giving the name of a variable in data to be used as the clustering variable for robust standard error computation.

df_adj

Logical (default FALSE); if TRUE, the estimated variance/covariance matrix of the regression parameters is multiplied by N/(N-p), where N is the number of observations used in the model fit and p is the number of regression parameters (including an intercept, if any).

stderr

Logical (default TRUE); if FALSE, the estimated variance/covariance matrix of the regression parameters is not computed.

Details

Theory

The EIV estimator applies when one wishes to estimate the parameters of a linear regression of Y on (X,Z), but covariates (W,Z) are instead observed, where W = X + U for mean zero measurement error U. Additional assumptions are required about U for consistent estimation; see references for details.

The standard EIV estimator of the regression coefficients is (Q'Q - S)^{-1}Q'Y, where Q is the design matrix formed from (W,Z) and S is a matrix that adjusts Q'Q to account for elements that are distorted due to measurement error. The value of S depends on whether reliability or Sigma_error is specified. When Sigma_error is specified, S is known. When reliability is specified, S must be estimated using the marginal variances of the observed error-prone covariates.
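
The estimator above can be sketched by hand in base R. This is only an illustration under stated assumptions, not the package's internal code: it assumes a single error-prone covariate, takes S to be zero everywhere except the diagonal entry for that covariate, sets that entry to N times the error variance when Sigma_error is known, and estimates the error variance as (1 - reliability) * var(W) when only the reliability is given. The helper eiv_coef and all simulated values are hypothetical.

```r
set.seed(42)
N <- 5000
x <- rnorm(N)                        # latent covariate X
z <- rbinom(N, 1, 0.5)               # error-free covariate Z
y <- 2 + 1 * x - 1 * z + rnorm(N)    # true coefficients c(2, 1, -1)
sigma2_u <- 0.25                     # measurement error variance
w <- x + rnorm(N, sd = sqrt(sigma2_u))   # observed W = X + U

Q <- cbind(1, w, z)                  # design matrix formed from (W, Z)
colnames(Q) <- c("(Intercept)", "w", "z")

## Case 1 (assumed form): Sigma_error known, so S is known.
S_known <- matrix(0, 3, 3, dimnames = list(colnames(Q), colnames(Q)))
S_known["w", "w"] <- N * sigma2_u

## Case 2 (assumed form): reliability known, so S is estimated
## from the marginal variance of the observed covariate W.
lambda <- 1 / (1 + sigma2_u)
S_est <- S_known
S_est["w", "w"] <- N * (1 - lambda) * var(w)

## Hypothetical helper: solves (Q'Q - S) b = Q'Y for b.
eiv_coef <- function(Q, y, S) {
  b <- solve(crossprod(Q) - S, crossprod(Q, y))
  setNames(as.vector(b), colnames(Q))
}

b_naive <- eiv_coef(Q, y, 0 * S_known)   # S = 0 gives ordinary least squares
b_known <- eiv_coef(Q, y, S_known)
b_rel   <- eiv_coef(Q, y, S_est)
print(round(rbind(naive = b_naive, Sigma_error = b_known, reliability = b_rel), 3))
```

In this simulation the uncorrected slope on w is attenuated toward zero, while both corrected versions should land near the true value of 1; the two corrected estimates differ slightly because var(W) is estimated in the second case.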

The estimated regression coefficients are solutions to a system of estimating equations, and both the system of equations and the solutions depend on whether reliability or Sigma_error is specified. For each of these two cases, standard errors for the estimated regression coefficients are computed using standard results from M-estimation; see references. For either case, adjustments for clustering are provided if specified.
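
As a rough guide to what "standard results from M-estimation" means here, the generic sandwich form described in Stefanski and Boos (2002) is sketched below. The notation (psi, theta, A, B, the cluster index c) is introduced only for this illustration. The specific estimating function shown for the Sigma_error case is an assumption chosen to be consistent with the estimator (Q'Q - S)^{-1}Q'Y; when reliability is specified, theta would also have to include whatever nuisance parameters are used to estimate S.

```latex
% Estimating equations solved by \hat\theta:
\sum_{i=1}^{N} \psi_i(\hat\theta) = 0 .

% Assumed form when Sigma_error is specified (\theta = \beta, Q_i the i-th row of Q):
\psi_i(\beta) = Q_i \left( Y_i - Q_i^{\top}\beta \right) + N^{-1} S \beta ,
\qquad\text{so that}\qquad
\sum_{i=1}^{N} \psi_i(\beta) = Q^{\top}Y - \left( Q^{\top}Q - S \right)\beta .

% Sandwich variance estimate:
\widehat{\mathrm{Var}}(\hat\theta) = \hat A^{-1}\, \hat B\, \hat A^{-\top},
\qquad
\hat A = -\sum_{i=1}^{N} \frac{\partial \psi_i(\theta)}{\partial \theta^{\top}} \bigg|_{\theta = \hat\theta},
\qquad
\hat B = \sum_{i=1}^{N} \psi_i(\hat\theta)\, \psi_i(\hat\theta)^{\top} .

% With clustering, the contributions are summed within cluster c before forming \hat B:
\hat B_{\mathrm{cluster}} = \sum_{c=1}^{M}
\Big( \sum_{i \in c} \psi_i(\hat\theta) \Big)
\Big( \sum_{i \in c} \psi_i(\hat\theta) \Big)^{\top} .
```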

Syntax Details

Exactly one of reliability or Sigma_error must be specified in the call. Sigma_error need not be diagonal in the case of correlated measurement error across multiple error-prone covariates.
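
As an illustration of the named inputs described above, the sketch below builds a reliability vector and a non-diagonal Sigma_error matrix in base R. The covariate names w1 and w2 and all numeric values are made up, and the sanity checks are suggestions rather than a description of what eivreg itself validates. Only one of the two objects would be passed in a given call.

```r
## Hypothetical error-prone covariates named w1 and w2.
reliability <- c(w1 = 0.85, w2 = 0.75)

## Measurement errors in w1 and w2 are correlated here, so the
## off-diagonal entries are nonzero.
Sigma_error <- matrix(c(0.20, 0.05,
                        0.05, 0.30),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("w1", "w2"), c("w1", "w2")))

## Sanity checks one might run before fitting (not part of eivtools).
stopifnot(all(reliability > 0 & reliability <= 1))
stopifnot(isSymmetric(Sigma_error))
stopifnot(all(eigen(Sigma_error, symmetric = TRUE)$values >= 0))
```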

Error-prone variables must be included as linear main effects only; the current version of the code does not allow interactions among error-prone covariates, interactions of error-prone covariates with error-free covariates, or nonlinear functions of error-prone covariates. The error-prone covariates cannot be specified with any construction involving I().
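
The restriction above can be checked before fitting with a small helper like the one below. This is a hypothetical utility written for this illustration, not a function from eivtools, and it may not mirror the checks the package performs. It lists every term in a formula that involves an error-prone covariate in any form other than a plain linear main effect.

```r
## Hypothetical helper (not part of eivtools).
## formula:  the regression formula
## relnames: character vector naming the error-prone covariates
flag_disallowed_terms <- function(formula, relnames) {
  labs <- attr(terms(formula), "term.labels")
  ## Does the term mention any error-prone covariate?
  involves <- vapply(labs, function(lab) {
    any(relnames %in% all.vars(str2lang(lab)))
  }, logical(1))
  ## Keep terms that mention one but are not the bare variable name.
  unname(labs[involves & !(labs %in% relnames)])
}

relnames <- c("w1", "w2")
flag_disallowed_terms(y ~ w1 + w2 + z, relnames)             # character(0)
flag_disallowed_terms(y ~ w1 * z + w2, relnames)             # "w1:z"
flag_disallowed_terms(y ~ w1 + I(w1^2) + w2 + z, relnames)   # "I(w1^2)"
```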

The current version does not allow singular.ok=TRUE.

It is strongly recommended to pass a data frame containing all variables used in the regression via the data argument, rather than using a matrix on the right-hand side of the regression formula. In addition, if cluster_varname is specified, all variables, including the clustering variable, must be passed in data.

If weights is specified, a weighted version of the EIV estimator is computed using operations analogous to weighted least squares in linear regression, and a standard error for this weighted estimator is computed. Weights must be positive and are normalized inside the function to sum to the number of observations used to fit the model. Cases with missing weights are dropped, just like cases with missing covariates.
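
The weight normalization described above amounts to the following arithmetic. The raw weights are invented for illustration, and the sketch considers missingness in the weights only; in an actual fit, cases with missing covariates would be dropped as well before N is determined.

```r
w_raw <- c(2, 0.5, NA, 1.5, 4)     # hypothetical raw weights, one missing
keep  <- !is.na(w_raw)             # cases with missing weights are dropped
stopifnot(all(w_raw[keep] > 0))    # weights must be positive

N      <- sum(keep)                          # observations used in the fit
w_norm <- w_raw[keep] * N / sum(w_raw[keep]) # rescale so the weights sum to N
print(w_norm)
print(sum(w_norm))
```

Rescaling leaves the relative weights unchanged, so it does not affect the weighted point estimates.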

Different software packages that compute robust standard errors make different choices about degrees-of-freedom adjustments intended to improve small-sample coverage properties. The df_adj argument will inflate the estimated variance/covariance matrix of the estimated regression coefficients by N/(N-p); see Wooldridge (2002, p. 57). In addition, if cluster_varname is specified, the estimated variance/covariance matrix will be inflated by M/(M-1) where M is the number of unique clusters present in the estimation sample.
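
The two inflation factors can be computed as below. The values of N, p and M and the placeholder matrix V are invented, and applying both factors as a simple product is an assumption made for this sketch; the text above does not spell out how the package combines them internally.

```r
N <- 1000   # observations used in the model fit
p <- 4      # regression parameters, including the intercept
M <- 50     # unique clusters in the estimation sample

V <- diag(0.01, p)   # placeholder for an unadjusted variance/covariance matrix

f_df      <- N / (N - p)   # applied when df_adj = TRUE
f_cluster <- M / (M - 1)   # applied when cluster_varname is specified

V_adj <- V * f_df * f_cluster   # assumed: both factors multiply V
print(c(df = f_df, cluster = f_cluster))
```

With many observations and few parameters the N/(N-p) factor is close to 1, whereas the M/(M-1) factor matters more when the number of clusters is small.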

Value

A list object of class eivlm with the following components:

coefficients

Estimated regression coefficients from EIV model.

residuals

Residuals from fitted EIV model.

rank

Column rank of regression design matrix.

fitted.values

Fitted values from EIV model.

N

Number of observations used in fitted model.

Sigma_error

The measurement error covariance matrix, if supplied.

reliability

The vector of reliabilities, if supplied.

relnames

The names of the error-prone covariates.

XpX_adj

The cross-product matrix of the regression, adjusted for measurement error.

varYXZ

The maximum likelihood estimate of the covariance matrix of the outcome Y, the latent covariates X and the observed, error-free covariates Z.

latent_resvar

A degrees-of-freedom adjusted estimate of the residual variance of the latent regression. NOTE: this is not an estimate of the residual variance of the regression on the observed covariates (W,Z), but rather an estimate of the residual variance of the regression on (X,Z).

vcov

The estimated variance/covariance matrix of the regression coefficients.

cluster_varname, cluster_values, cluster_num

If cluster_varname is specified, it is returned in the object, along with cluster_values providing the actual values of the clustering variable for the cases used in the fitted model, and cluster_num, the number of unique such clusters.

OTHER

The object also includes components assign, df.residual, xlevels, call, terms, model and other optional components such as weights, depending on the call; see lm. In addition, the object includes components unadj_coefficients, unadj_fitted.values, unadj_residuals, unadj_effects, and unadj_qr that are computed from the unadjusted regression model that ignores measurement error; see lm.

Author(s)

J.R. Lockwood <jrlockwood@ets.org> modified the lm function to adapt it for EIV regression.

References

Carroll R.J., Ruppert D., Stefanski L.A. and Crainiceanu C.M. (2006). Measurement Error in Nonlinear Models: A Modern Perspective (2nd edition). London: Chapman & Hall.

Fuller W. (2006). Measurement Error Models (2nd edition). New York: John Wiley & Sons.

Stefanski L.A. and Boos D.D. (2002). “The calculus of M-estimation,” The American Statistician 56(1):29-38.

Wooldridge J. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.

See Also

lm, summary.eivlm, deconv_npmle

Examples

set.seed(1001)
## simulate data with covariates x1, x2 and z.
.n    <- 1000
.d    <- data.frame(x1 = rnorm(.n))
.d$x2 <- sqrt(0.5)*.d$x1 + rnorm(.n, sd=sqrt(0.5))
.d$z  <- as.numeric(.d$x1 + .d$x2 > 0)

## generate outcome
## true regression parameters are c(2,1,1,-1)
.d$y  <- 2.0 + 1.0*.d$x1 + 1.0*.d$x2 - 1.0*.d$z + rnorm(.n)

## generate error-prone covariates w1 and w2
Sigma_error <- diag(c(0.20, 0.30))
dimnames(Sigma_error) <- list(c("w1","w2"), c("w1","w2"))
.d$w1 <- .d$x1 + rnorm(.n, sd = sqrt(Sigma_error["w1","w1"]))
.d$w2 <- .d$x2 + rnorm(.n, sd = sqrt(Sigma_error["w2","w2"]))

## fit EIV regression specifying known measurement error covariance matrix
.mod1 <- eivreg(y ~ w1 + w2 + z, data = .d, Sigma_error = Sigma_error)
print(class(.mod1))
.tmp <- summary(.mod1)
print(class(.tmp))
print(.tmp)

## fit EIV regression specifying known reliabilities.  Note that the
## point estimator is slightly different from .mod1 because the
## correction matrix S must be estimated when reliabilities, rather
## than Sigma_error, are specified.
.lambda <- c(1,1) / (c(1,1) + diag(Sigma_error))
.mod2 <- eivreg(y ~ w1 + w2 + z, data = .d, reliability = .lambda)
print(summary(.mod2))

[Package eivtools version 0.1-8 Index]