deconv_npmle {eivtools}R Documentation

Nonparametric MLE deconvolution under heteroskedastic normal error

Description

Implements a version of the Rabe-Hesketh et al. (2003) algorithm for computing the nonparametric MLE of a univariate latent variable distribution from observed error-prone measures. Allows for normal heteroskedastic measurement error with variance that depends on the latent variable, such as with estimates of latent ability from item response theory models.

Usage

deconv_npmle(W, csem,
gridspec = list(fixed=FALSE, xmin=-5, xmax=5, numpoints=2000),
lambda = 0.0005, lltol = 1e-7, psmall = 0.00005,
discrete = FALSE, quietly = FALSE)

Arguments

W

Vector of observed measures, where W[i] is assumed to be an error-prone measure of a latent variable X[i]

csem

A function of a single variable x that returns the conditional standard deviation of W given X=x. It needs to be able to accept a vector input and return a vector of the same length as the input.

gridspec

A named list specifying the grid over which the NPMLE will be computed. It must have a logical component fixed indicating if the NPMLE grid is fixed (TRUE), or if the final grid is chosen by optimization over a candidate grid (FALSE). The default is FALSE. The remaining components of gridspec specify the grid in one of two mutually exclusive ways. In the first way, gridspec must contain elements xmin providing the minimum grid value, xmax providing the maximum grid value, and numpoints providing the desired number of points. In this case, the grid will be numpoints equally-spaced values ranging from xmin to xmax. In the second way, gridspec must contain an element grid, a numeric vector providing the actual desired grid. It must be an arbitrary sequence of increasing numeric values with no missing values.

lambda

Step size, on probability scale, in Rabe-Hesketh et al. (2003) algorithm. See reference for details.

lltol

Algorithm stops when the improvement to the log likelihood does not exceed lltol.

psmall

If a mass point in the estimated latent distribution evolves to have probability less than psmall, it gets dropped.

discrete

Not currently implemented.

quietly

If FALSE (the default), prints iteration-by-iteration progress.

Details

The assumed model is W = X + U where the conditional distribution of U given X = x is assumed to be normal with mean zero and standard deviation csem(x). The function uses W to estimate a discrete latent distribution for X that maximizes the likelihood of the observed data. The function optimizes the mass points (among a grid of candidate values) and the associated probabilities.

In the special case of homoskedastic error, the function csem must be defined such that when passed a vector of length n, it returns a vector of length n where each element is a common constant.

The default values of xmin and xmin in gridspec are generally appropriate only for a latent variable on a standardized scale with mean zero and variance one, and should be set to appropriate values given the scale of W.

Value

A list with elements

The fields of px are:

The fields of simex_varfuncs are:

Author(s)

J.R. Lockwood jrlockwood@ets.org

References

Lockwood J.R. and McCaffrey D.F. (2014). “Correcting for test score measurement error in ANCOVA models for estimating treatment effects,” Journal of Educational and Behavioral Statistics 39(1):22-52.

Lockwood J.R. and McCaffrey D.F. (2017). “Simulation-extrapolation with latent heteroskedastic variance,” Psychometrika 82(3):717-736.

Rabe-Hesketh S., Pickles A. and Skrondal A. (2003). “Correcting for covariate measurement error in logistic regression using nonparametric maximum likelihood estimation,” Statistical Modelling 3:215-232.

See Also

testscores,eivreg

Examples

data(testscores)

## get the unique values of the lag 1 math score and CSEM
## values and approximate the CSEM function using approxfun()
tmp <- unique(testscores[,c("math_lag1","math_lag1_csem")])
print(tmp <- tmp[order(tmp$math_lag1),])

.csem <- approxfun(tmp$math_lag1, tmp$math_lag1_csem, rule=2:2)
plot(tmp$math_lag1, tmp$math_lag1_csem)
lines(tmp$math_lag1, .csem(tmp$math_lag1), col="blue")

## get NPMLE distribution of latent lag 1 math achievement
m <- deconv_npmle(W        = testscores$math_lag1,
                  csem     = .csem,
                  gridspec = list(fixed = FALSE,
                                  xmin = min(testscores$math_lag1),
                                  xmax = max(testscores$math_lag1),
                                  numpoints = 10000),
                  quietly  = TRUE)

print(m$px)

## estimated mean is approximately the mean of W, but
## the estimated variance is less than the variance of W,
## as it should be
print(c(empirical = mean(testscores$math_lag1),
        estimated = m$px$ex[1]))

print(c(empirical = var(testscores$math_lag1),
        estimated = m$px$varx[1]))

## estimated reliability of W:
print(m$reliability)

## if implementing SIMEX, simex_varfuncs provides plug-in
## options to use for the heteroskedastic error variance
## of each observed W
print(m$simex_varfuncs)

## simple "value-added" estimates of school effects on math,
## adjusting for measurement error in the lag 1 math score.
testscores$schoolid <- factor(testscores$schoolid)

meiv <- eivreg(math ~ math_lag1 + sped + frl + schoolid,
         data = testscores,
         reliability = c(math_lag1 = m$reliability),
         contrasts = list(schoolid = "contr.sum"))

print(summary(meiv))

## alternative deconvolution with fixed grid
m <- deconv_npmle(W        = testscores$math_lag1,
                  csem     = .csem,
                  gridspec = list(fixed = TRUE,
                                  xmin = min(testscores$math_lag1),
                                  xmax = max(testscores$math_lag1),
                                  numpoints = 40),
                  quietly  = TRUE)

print(m$px)


[Package eivtools version 0.1-8 Index]