pred.y {SeleMix}    R Documentation
Prediction of y variables
Description
Provides predictions of y variables according to a Gaussian contamination model
Usage
pred.y (y, x=NULL, B, sigma, lambda, w, model="LN", t.outl=0.5)
Arguments
y        matrix or data frame containing the response variables
x        optional matrix or data frame containing the error-free covariates
B        matrix of regression coefficients
sigma    covariance matrix
lambda   variance inflation factor
w        proportion of erroneous data
model    data distribution: "LN" = lognormal (default), "N" = normal
t.outl   threshold value for posterior probabilities of identifying outliers (default = 0.5)
Details
This function provides expected values of a set of variables (y1.p, y2.p, ...) according to a mixture of two regression models with Gaussian residuals (see ml.est). If no covariates (x variables) are available, a two-component Gaussian mixture is used. Expected values (predictions) are computed on the basis of a set of parameters of appropriate dimensions (B, sigma, lambda, w) and, possibly, a matrix (or data frame) containing the error-free x variables.
Missing values in the x variables are not allowed. However, robust predictions of y variables are also provided when these variables are not observed. A missing-pattern vector (pattern) indicates which items are observed and which are missing.
For each unit in the data set, the posterior probability of being erroneous (tau) is computed, and a flag (outlier) is provided that takes value 1 if tau is greater than the user-specified threshold (t.outl) and 0 otherwise.
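The computation of tau and of the outlier flag can be illustrated with a minimal univariate sketch for the normal model. It is not the package's code: the helper post_tau is hypothetical, and it assumes that the contaminated component shares the mean of the clean one and has its variance inflated by lambda. The input values are illustrative only.

```r
# Hypothetical helper, not part of SeleMix: posterior probability of
# contamination for a univariate normal model.
# Assumption: clean data ~ N(mu, sigma2), contaminated ~ N(mu, lambda * sigma2),
# mixed with weights (1 - w) and w.
post_tau <- function(y, mu, sigma2, lambda, w) {
  f_clean <- (1 - w) * dnorm(y, mean = mu, sd = sqrt(sigma2))
  f_cont  <- w * dnorm(y, mean = mu, sd = sqrt(lambda * sigma2))
  f_cont / (f_clean + f_cont)
}

y      <- c(0.1, 0.5, 6.0)   # illustrative observations
t.outl <- 0.5                # same role as the t.outl argument
tau    <- post_tau(y, mu = 0, sigma2 = 1, lambda = 15.5, w = 0.05)

# Flag is 1 when tau is greater than the threshold, 0 otherwise
outlier <- as.integer(tau > t.outl)
```

Values close to the mean receive a small tau, while the value far in the tail is attributed almost entirely to the contaminated component and is therefore flagged.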
Value
pred.y returns a data frame containing the following columns:
y1.p, y2.p, ...   predicted values for y variables
tau               posterior probabilities of being contaminated
outlier           1 if the observation is classified as an outlier, 0 otherwise
pattern           non-response patterns for y variables: 0 = missing, 1 = present value
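As a rough sketch of how the returned columns might be used, the code below works on a hand-built data frame that mimics the documented layout for a single y variable. The object res and its values are made up for illustration, and the numeric coding of pattern is an assumption; the actual storage type in the package may differ.

```r
# Hypothetical stand-in for the data frame returned by pred.y
# (one y variable; all values are invented for illustration).
res <- data.frame(
  y1.p    = c(10.2, 11.0, 9.7, 10.5),
  tau     = c(0.02, 0.91, 0.10, 0.65),
  outlier = c(0, 1, 0, 1),
  pattern = c(1, 1, 0, 1)   # assumed numeric coding: 0 = missing, 1 = present
)

# Units flagged as outliers, ordered from most to least suspicious
flagged <- res[res$outlier == 1, ]
flagged <- flagged[order(flagged$tau, decreasing = TRUE), ]

# Units whose y value was not observed, so y1.p is a model-based prediction
unobserved <- res[res$pattern == 0, ]
```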
Author(s)
M. Teresa Buglielli <bugliell@istat.it>, Ugo Guarnera <guarnera@istat.it>
References
Buglielli, M.T., Di Zio, M., Guarnera, U. (2010) "Use of Contamination Models for Selective Editing", European Conference on Quality in Survey Statistics Q2010, Helsinki, 4-6 May 2010
Examples
# Prediction with one contaminated variable and one covariate
library(SeleMix)
data(ex1.data)
# Parameters estimated by applying ml.est to ex1.data
B1 <- as.matrix(c(-0.152, 1.215))
sigma1 <- as.matrix(1.25)
lambda1 <- 15.5
w1 <- 0.0479
# Variable prediction
ypred <- pred.y (y=ex1.data[,"Y1"], x=ex1.data[,"X1"], B=B1,
                 sigma=sigma1, lambda=lambda1, w=w1, model="LN", t.outl=0.5)
# Plot ypred vs Y1
sel.pairs(cbind(ypred[,1,drop=FALSE],ex1.data[,"Y1",drop=FALSE]),
          outl=ypred[,"outlier"])