deconv_npmle {eivtools} | R Documentation |
Nonparametric MLE deconvolution under heteroskedastic normal error
Description
Implements a version of the Rabe-Hesketh et al. (2003) algorithm for computing the nonparametric MLE of a univariate latent variable distribution from observed error-prone measures. Allows for normal heteroskedastic measurement error with variance that depends on the latent variable, such as with estimates of latent ability from item response theory models.
Usage
deconv_npmle(W, csem,
gridspec = list(fixed=FALSE, xmin=-5, xmax=5, numpoints=2000),
lambda = 0.0005, lltol = 1e-7, psmall = 0.00005,
discrete = FALSE, quietly = FALSE)
Arguments
W |
Vector of observed measures, where |
csem |
A function of a single variable |
gridspec |
A named list specifying the grid over which the NPMLE will be
computed. It must have a logical component |
lambda |
Step size, on probability scale, in Rabe-Hesketh et al. (2003) algorithm. See reference for details. |
lltol |
Algorithm stops when the improvement to the log likelihood does not
exceed |
psmall |
If a mass point in the estimated latent distribution evolves to have
probability less than |
discrete |
Not currently implemented. |
quietly |
If |
Details
The assumed model is W = X + U
where the conditional
distribution of U
given X = x
is assumed to be normal
with mean zero and standard deviation csem(x)
. The function
uses W
to estimate a discrete latent distribution for X
that maximizes the likelihood of the observed data. The function
optimizes the mass points (among a grid of candidate values) and the
associated probabilities.
In the special case of homoskedastic error, the function csem
must be defined such that when passed a vector of length n
, it
returns a vector of length n
where each element is a common
constant.
The default values of xmin
and xmin
in gridspec
are generally appropriate only for a latent variable on a standardized
scale with mean zero and variance one, and should be set to
appropriate values given the scale of W
.
Value
A list with elements
gridspec: Information about the initial grid
.history: Iteration-by-iteration evolution of the estimated distribution, if
gridspec$fixed
is FALSE. Otherwise it is an empty listpx: A dataframe providing the final NPMLE distribution. There are as many rows as there are mass points in the estimated distribution; fields described below
reliability: An estimate of the reliability of
W
, equal to the estimated variance ofX
divided by the sample variance ofW
simex_varfuncs: A dataframe with as many rows as there are unique values of
W
, providing estimated plug-in variance functions to use for SIMEX data generation with latent heteroskedastic error as described in Lockwood and McCaffrey (forthcoming); see references. Fields described below
The fields of px
are:
x: Location of mass point
csem: Value of function
csem
at mass pointp: probability at mass point
ll: log likelihood at solution
ex: Estimate of mean of latent distribution
varx: Estimate of variance of latent distribution
The fields of simex_varfuncs
are:
W: Unique observed values
w
ofW
gW: The square of
csem
evaluated atW = w
gEXW: The square of
csem
evaluated atE[X | W=w]
, the conditional mean ofX
givenW=w
EgXW: The conditional mean of the square of
csem
ofX
givenW=w
, equal toE[g(X) | W=w]
Author(s)
J.R. Lockwood jrlockwood@ets.org
References
Lockwood J.R. and McCaffrey D.F. (2014). “Correcting for test score measurement error in ANCOVA models for estimating treatment effects,” Journal of Educational and Behavioral Statistics 39(1):22-52.
Lockwood J.R. and McCaffrey D.F. (2017). “Simulation-extrapolation with latent heteroskedastic variance,” Psychometrika 82(3):717-736.
Rabe-Hesketh S., Pickles A. and Skrondal A. (2003). “Correcting for covariate measurement error in logistic regression using nonparametric maximum likelihood estimation,” Statistical Modelling 3:215-232.
See Also
Examples
data(testscores)
## get the unique values of the lag 1 math score and CSEM
## values and approximate the CSEM function using approxfun()
tmp <- unique(testscores[,c("math_lag1","math_lag1_csem")])
print(tmp <- tmp[order(tmp$math_lag1),])
.csem <- approxfun(tmp$math_lag1, tmp$math_lag1_csem, rule=2:2)
plot(tmp$math_lag1, tmp$math_lag1_csem)
lines(tmp$math_lag1, .csem(tmp$math_lag1), col="blue")
## get NPMLE distribution of latent lag 1 math achievement
m <- deconv_npmle(W = testscores$math_lag1,
csem = .csem,
gridspec = list(fixed = FALSE,
xmin = min(testscores$math_lag1),
xmax = max(testscores$math_lag1),
numpoints = 10000),
quietly = TRUE)
print(m$px)
## estimated mean is approximately the mean of W, but
## the estimated variance is less than the variance of W,
## as it should be
print(c(empirical = mean(testscores$math_lag1),
estimated = m$px$ex[1]))
print(c(empirical = var(testscores$math_lag1),
estimated = m$px$varx[1]))
## estimated reliability of W:
print(m$reliability)
## if implementing SIMEX, simex_varfuncs provides plug-in
## options to use for the heteroskedastic error variance
## of each observed W
print(m$simex_varfuncs)
## simple "value-added" estimates of school effects on math,
## adjusting for measurement error in the lag 1 math score.
testscores$schoolid <- factor(testscores$schoolid)
meiv <- eivreg(math ~ math_lag1 + sped + frl + schoolid,
data = testscores,
reliability = c(math_lag1 = m$reliability),
contrasts = list(schoolid = "contr.sum"))
print(summary(meiv))
## alternative deconvolution with fixed grid
m <- deconv_npmle(W = testscores$math_lag1,
csem = .csem,
gridspec = list(fixed = TRUE,
xmin = min(testscores$math_lag1),
xmax = max(testscores$math_lag1),
numpoints = 40),
quietly = TRUE)
print(m$px)