sel.edit {SeleMix} | R Documentation |
Influential Error Detection
Description
Computes the score function and identifies influential errors
Usage
sel.edit (y, ypred, wgt=rep(1,nrow(as.matrix(y ))),
tot=colSums(ypred * wgt), t.sel=0.01)
Arguments
y |
matrix or data frame containing the response variables |
ypred |
matrix of predicted values for y variables |
wgt |
optional vector of sampling weights (default=1) |
tot |
optional vector containing reference estimates of totals for the y variables. If omitted, it is computed as the (possibly weighted) sum of predicted values |
t.sel |
optional vector of threshold values, one for each variable, for selective editing (default=0.01) |
Details
This function ranks observations (rank) according to the importance of their potential errors.
The ranking is based on the values of the global score function (global.score).
The function also selects the units to be edited (sel) so that the expected residual error of
all variables is below a predefined level of accuracy (t.sel).
The global score (global.score) is the maximum of the local scores computed for each variable
(y1.score, y2.score,...).
The local scores are defined as a weighted (weights) absolute difference between the observed
(y1, y2,...) and the predicted values (y1.p, y2.p,...) standardised with respect to
the reference total estimates (tot).
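The score definitions above can be illustrated with a minimal sketch in base R. This is not the package's source code: the toy values are hypothetical, and the exact form of the standardisation (column-wise division by tot) is an assumption drawn from the description.

```r
# Toy data: 5 units, 2 variables (all values are hypothetical)
y     <- matrix(c(10, 20, 30, 40, 500,
                   5,  6,  7, 80,   9), ncol = 2)
ypred <- matrix(c(11, 19, 30, 42,  50,
                   5,  6,  8,  8,   9), ncol = 2)
wgt   <- rep(1, nrow(y))          # default sampling weights
tot   <- colSums(ypred * wgt)     # default reference totals, as in Usage

# ASSUMPTION: local score = weighted absolute difference divided by tot
local.score  <- sweep(wgt * abs(y - ypred), 2, tot, "/")

# Global score = row-wise maximum of the local scores
global.score <- apply(local.score, 1, max)

# Unit with the most important potential error
which.max(global.score)
```

In this toy example the fifth unit has the largest global score, driven by the first variable, while the fourth unit is flagged mainly by the second variable.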
The units to be edited because they are affected by an influential error (sel=1) are
selected according to a two-step algorithm:
1) order the observations by decreasing global.score;
2) select the first k units such that, from the (k+1)th to the last observation, all the
residual errors (y1.reserr, y2.reserr,...) for each variable are below t.sel.
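The two-step selection can be sketched as follows. This page does not give the formula for the residual errors, so the definition used here (absolute value of the sum of the signed weighted errors of the units from a given rank position to the last one, divided by tot) is an assumption made for illustration only; the input values are hypothetical and the code is not the package's implementation.

```r
# Signed weighted errors for 5 units and 1 variable (hypothetical values)
err   <- matrix(c(0.5, -40, 0.2, 30, -0.1), ncol = 1)
tot   <- 1000
t.sel <- 0.01
n     <- nrow(err)

local.score  <- sweep(abs(err), 2, tot, "/")
global.score <- apply(local.score, 1, max)

# Step 1: order the units by decreasing global score
ord     <- order(global.score, decreasing = TRUE)
err.ord <- err[ord, , drop = FALSE]

# ASSUMPTION: residual error at position r = |sum of errors from r to n| / tot
tail.sum <- matrix(apply(err.ord, 2, function(e) rev(cumsum(rev(e)))), nrow = n)
reserr   <- sweep(abs(tail.sum), 2, tot, "/")

# Step 2: k is the last position at which some variable's residual error
# is not below t.sel; all positions after k are then below the threshold
above <- apply(reserr >= t.sel, 1, any)
k     <- if (any(above)) max(which(above)) else 0

# Flag the first k ranked units in the original unit order
sel <- integer(n)
sel[ord[seq_len(k)]] <- 1L
sel
```

With these values the two units with the largest errors (units 2 and 4) are selected. The residual error at the first position is already below t.sel because the two large errors partly cancel, but the condition must hold for every position after k, so k is 2.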
The function also provides indicator variables (y1.sel, y2.sel,...) reporting
which variables contain an influential error in a unit selected for revision.
Value
sel.edit returns a data matrix containing the following columns:
y1 , y2 , ... |
observed variables |
y1.p , y2.p , ... |
predictions of y variables |
weights |
sampling weights |
y1.score , y2.score , ... |
local scores |
global.score |
global score |
y1.reserr , y2.reserr , ... |
residual errors |
y1.sel , y2.sel , ... |
influential error flags |
rank |
rank according to global score |
sel |
1 if the observation contains an influential error, 0 otherwise |
Author(s)
M. Teresa Buglielli <bugliell@istat.it>, Ugo Guarnera <guarnera@istat.it>
References
Di Zio, M., Guarnera, U. (2013) "A Contamination Model for Selective Editing",
Journal of Official Statistics. Volume 29, Issue 4, Pages 539-555 (http://dx.doi.org/10.2478/jos-2013-0039).
Buglielli, M.T., Di Zio, M., Guarnera, U. (2010) "Use of Contamination Models for Selective Editing", European Conference on Quality in Survey Statistics Q2010, Helsinki, 4-6 May 2010.
Examples
# Example 1
# Parameter estimation with one contaminated variable and one covariate
data(ex1.data)
ml.par <- ml.est(y=ex1.data[,"Y1"], x=ex1.data[,"X1"])
# Detection of influential errors
sel <- sel.edit(y=ex1.data[,"Y1"], ypred=ml.par$ypred)
head(sel)
sum(sel[,"sel"])
# order results by decreasing importance of score
sel.ord <- sel[order(sel[,"rank"]), ]
# add columns to data
ex1.data <- cbind(ex1.data, tau=ml.par$tau, outlier=ml.par$outlier,
                  sel[,c("rank", "sel")])
# plot of data with outliers and influential errors
sel.pairs(ex1.data[,c("X1","Y1")], outl=ml.par$outlier, sel=sel[,"sel"])
# Example 2
data(ex2.data)
par.joint <- ml.est(y=ex2.data)
sel <- sel.edit(y=ex2.data, ypred=par.joint$ypred)
sel.pairs(ex2.data,outl=par.joint$outlier, sel=sel[,"sel"])