eivreg {eivtools} | R Documentation |
Errors-in-variables (EIV) linear regression
Description
Fits errors-in-variables (EIV) linear regression given specified reliabilities, or a specified variance/covariance matrix for the measurement errors. For either case, it computes robust standard error estimates that allow for weighting and/or clustering.
Usage
eivreg(formula, data, subset, weights, na.action, method = "qr",
model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = FALSE,
contrasts = NULL, reliability = NULL, Sigma_error = NULL,
cluster_varname = NULL, df_adj = FALSE, stderr = TRUE, offset,
...)
Arguments
formula, data, subset, weights, na.action, method, model, x, y, qr |
See documentation for |
singular.ok, contrasts, offset, ... |
See documentation for |
reliability |
Named numeric vector giving the reliability for each error-prone
covariate. If left |
Sigma_error |
Named numeric matrix giving the variance/covariance matrix of the
measurement errors for the error-prone covariate(s). If left
|
cluster_varname |
A character variable providing the name of a variable in |
df_adj |
Logical (default FALSE); if TRUE, the estimated variance/covariance
matrix of the regression parameters is multiplied by |
stderr |
Logical (default TRUE); if FALSE, does not compute estimated variance/covariance matrix of the regression parameters. |
Details
Theory
The EIV estimator applies when one wishes to estimate the parameters
of a linear regression of Y on (X,Z), but covariates (W,Z) are instead
observed, where W = X + U for mean zero measurement error U.
Additional assumptions are required about U for consistent estimation;
see references for details.
The standard EIV estimator of the regression coefficients is
(Q'Q - S)^{-1}Q'Y, where Q is the design matrix formed from (W,Z) and
S is a matrix that adjusts Q'Q to account for elements that are
distorted due to measurement error. The value of S depends on whether
reliability or Sigma_error is specified. When Sigma_error is
specified, S is known. When reliability is specified, S must be
estimated using the marginal variances of the observed error-prone
covariates.
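As a rough illustration of the formula above, the following base R sketch computes the estimator by hand for the known-Sigma_error case. It is not the code used inside eivreg. The simulated data, the variable names, and in particular the choice of S as N times the measurement error variance in the position of the error-prone covariate (zero elsewhere) are assumptions made for this sketch.

```r
## Minimal sketch (base R only) of the estimator (Q'Q - S)^{-1} Q'Y.
## ASSUMPTION: with a known measurement error variance, S is taken to be
## N times that variance in the row/column of the error-prone covariate
## and zero elsewhere. The internals of eivreg may differ in detail.
set.seed(42)
n  <- 5000
x  <- rnorm(n)                      # latent covariate X
z  <- rbinom(n, 1, 0.5)             # error-free covariate Z
y  <- 1 + 2 * x - z + rnorm(n)      # true coefficients c(1, 2, -1)
su <- 0.5                           # known measurement error variance
w  <- x + rnorm(n, sd = sqrt(su))   # observed error-prone covariate W

## design matrix formed from (W, Z)
Q <- cbind("(Intercept)" = 1, w = w, z = z)

## correction matrix S: nonzero only for the error-prone covariate
S <- matrix(0, ncol(Q), ncol(Q), dimnames = list(colnames(Q), colnames(Q)))
S["w", "w"] <- n * su

## ordinary least squares on (W, Z) versus the EIV-corrected estimator
beta_naive <- drop(solve(crossprod(Q), crossprod(Q, y)))
beta_eiv   <- drop(solve(crossprod(Q) - S, crossprod(Q, y)))
print(rbind(naive = beta_naive, eiv = beta_eiv))
```

In this simulation the uncorrected coefficient on w is attenuated toward zero, while the corrected coefficient should land close to the true value of 2.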
The estimated regression coefficients are solutions to a system of
estimating equations, and both the system of equations and the
solutions depend on whether reliability or Sigma_error is specified.
For each of these two cases, standard errors for the estimated
regression coefficients are computed using standard results from
M-estimation; see references. For either case, adjustments for
clustering are provided if specified.
Syntax Details
Exactly one of reliability or Sigma_error must be specified in the
call. Sigma_error need not be diagonal in the case of correlated
measurement error across multiple error-prone covariates.
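For example, a non-diagonal Sigma_error for two error-prone covariates might be built as in the sketch below. The covariate names w1 and w2, the error variances, the error correlation of 0.4, and the data frame name dat in the commented call are all hypothetical values chosen for illustration.

```r
## Hypothetical error-prone covariates w1 and w2 with correlated errors.
err_sd  <- c(w1 = sqrt(0.20), w2 = sqrt(0.30))   # assumed error SDs
err_cor <- 0.4                                   # assumed error correlation

Sigma_error <- diag(err_sd^2)
Sigma_error[1, 2] <- Sigma_error[2, 1] <- err_cor * prod(err_sd)

## Row and column names identify the error-prone covariates.
dimnames(Sigma_error) <- list(names(err_sd), names(err_sd))

## A valid covariance matrix is symmetric and positive semi-definite.
stopifnot(isSymmetric(Sigma_error),
          all(eigen(Sigma_error, symmetric = TRUE)$values >= 0))
print(Sigma_error)

## The matrix would then be passed on its own, without `reliability`
## (`dat` is a hypothetical data frame containing y, w1, w2 and z):
## eivreg(y ~ w1 + w2 + z, data = dat, Sigma_error = Sigma_error)
```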
Error-prone variables must be included as linear main effects only; the
current version of the code does not allow interactions among
error-prone covariates, interactions of error-prone covariates with
error-free covariates, or nonlinear functions of error-prone
covariates. The error-prone covariates cannot be specified with any
construction involving I().
The current version does not allow singular.ok=TRUE.
Users are strongly encouraged to use the data argument to pass a data
frame containing all variables to be used in the regression, rather
than using a matrix on the right-hand side of the regression formula.
In addition, if cluster_varname is specified, all variables, including
the clustering variable, must be passed via data.
If weights is specified, a weighted version of the EIV estimator is
computed using operations analogous to weighted least squares in
linear regression, and a standard error for this weighted estimator is
computed. Weights must be positive and will be normalized inside the
function to sum to the number of observations used to fit the model.
Cases with missing weights will get dropped just like cases with
missing covariates.
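The normalization described above can be reproduced outside the function. The sketch below uses made-up weights and is only meant to show the arithmetic; it is not taken from the eivreg source.

```r
## Hypothetical raw weights for six cases; one is missing.
w_raw <- c(2, 1, NA, 4, 1, 2)

keep <- !is.na(w_raw)              # cases with missing weights are dropped
stopifnot(all(w_raw[keep] > 0))    # weights must be positive

n_used <- sum(keep)                # observations used to fit the model
w_norm <- w_raw[keep] * n_used / sum(w_raw[keep])

print(w_norm)
print(sum(w_norm))                 # equals n_used
```

Because only the relative sizes of the weights matter after this rescaling, multiplying all raw weights by a positive constant leaves the normalized weights unchanged.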
Different software packages that compute robust standard errors make
different choices about degrees-of-freedom adjustments intended to
improve small-sample coverage properties. When df_adj is TRUE, the
estimated variance/covariance matrix of the estimated regression
coefficients is inflated by N/(N-p); see Wooldridge (2002, p. 57). In
addition, if cluster_varname is specified, the estimated
variance/covariance matrix will be inflated by M/(M-1), where M is the
number of unique clusters present in the estimation sample.
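To give a sense of the size of these adjustments, the sketch below applies both factors to a made-up variance/covariance matrix. The values of N, p, and M and the matrix V are hypothetical, and the sketch assumes the two factors are applied multiplicatively when df_adj is TRUE and clusters are specified.

```r
## Hypothetical sizes: N observations, p regression parameters, M clusters.
N <- 200
p <- 4
M <- 25

## Made-up unadjusted variance/covariance matrix of the coefficients.
V <- diag(c(0.010, 0.020, 0.020, 0.015))

df_factor      <- N / (N - p)      # degrees-of-freedom inflation
cluster_factor <- M / (M - 1)      # cluster-count inflation

## ASSUMPTION: both factors are applied multiplicatively.
V_adj <- V * df_factor * cluster_factor

## Standard errors grow by the square root of the combined factor.
print(sqrt(diag(V_adj)) / sqrt(diag(V)))
```

With these values the combined factor is about 1.063, so the standard errors increase by roughly 3 percent.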
Value
A list object of class eivlm with the following components:
coefficients |
Estimated regression coefficients from EIV model. |
residuals |
Residuals from fitted EIV model. |
rank |
Column rank of regression design matrix. |
fitted.values |
Fitted values from EIV model. |
N |
Number of observations used in fitted model. |
Sigma_error |
The measurement error covariance matrix, if supplied. |
reliability |
The vector of reliabilities, if supplied. |
relnames |
The names of the error-prone covariates. |
XpX_adj |
The cross-product matrix of the regression, adjusted for measurement error. |
varYXZ |
The maximum likelihood estimate of the covariance matrix
of the outcome |
latent_resvar |
A degrees-of-freedom adjusted estimate of the
residual variance of the latent regression. NOTE: this is not an
estimate of the residual variance of the regression on the observed
covariates |
vcov |
The estimated variance/covariance matrix of the regression coefficients. |
cluster_varname, cluster_values, cluster_num |
If
|
OTHER |
The object also includes components |
Author(s)
J.R. Lockwood jrlockwood@ets.org modified the lm function to adapt it for EIV regression.
References
Carroll R.J., Ruppert D., Stefanski L.A. and Crainiceanu C.M. (2006). Measurement Error in Nonlinear Models: A Modern Perspective (2nd edition). London: Chapman & Hall.
Fuller W. (2006). Measurement Error Models (2nd edition). New York: John Wiley & Sons.
Stefanski L.A. and Boos D.D. (2002). “The calculus of M-estimation,” The American Statistician 56(1):29-38.
Wooldridge J. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.
See Also
lm, summary.eivlm, deconv_npmle
Examples
set.seed(1001)
## simulate data with covariates x1, x2 and z.
.n <- 1000
.d <- data.frame(x1 = rnorm(.n))
.d$x2 <- sqrt(0.5)*.d$x1 + rnorm(.n, sd=sqrt(0.5))
.d$z <- as.numeric(.d$x1 + .d$x2 > 0)
## generate outcome
## true regression parameters are c(2,1,1,-1)
.d$y <- 2.0 + 1.0*.d$x1 + 1.0*.d$x2 - 1.0*.d$z + rnorm(.n)
## generate error-prone covariates w1 and w2
Sigma_error <- diag(c(0.20, 0.30))
dimnames(Sigma_error) <- list(c("w1","w2"), c("w1","w2"))
.d$w1 <- .d$x1 + rnorm(.n, sd = sqrt(Sigma_error["w1","w1"]))
.d$w2 <- .d$x2 + rnorm(.n, sd = sqrt(Sigma_error["w2","w2"]))
## fit EIV regression specifying known measurement error covariance matrix
.mod1 <- eivreg(y ~ w1 + w2 + z, data = .d, Sigma_error = Sigma_error)
print(class(.mod1))
.tmp <- summary(.mod1)
print(class(.tmp))
print(.tmp)
## fit EIV regression specifying known reliabilities. Note that the
## point estimator is slightly different from .mod1 because the
## correction matrix S must be estimated when reliability, rather
## than Sigma_error, is specified.
.lambda <- c(1,1) / (c(1,1) + diag(Sigma_error))
.mod2 <- eivreg(y ~ w1 + w2 + z, data = .d, reliability = .lambda)
print(summary(.mod2))