contaminate {simFrame} | R Documentation |
Contaminate data
Description
Generic function for contaminating data.
Usage
contaminate(x, control, ...)
## S4 method for signature 'data.frame,ContControl'
contaminate(x, control, i)
Arguments
x |
the data to be contaminated. |
control |
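a control object of a class inheriting from the virtual class "VirtualContControl", or a character string specifying such a control class (the default being "ContControl"). |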
i |
an integer giving the element of the slot epsilon of control to be used as the contamination level. |
... |
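if control is a character string or missing, the slots of the control object may be supplied as additional arguments. |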
Details
With the control classes implemented in simFrame, contamination is modeled as a two-step process. The first step is to select observations to be contaminated, the second is to model the distribution of the outliers.
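The two steps can be sketched in base R, independently of the package. This is only a minimal illustration under stated assumptions, not simFrame's implementation: the helper name contaminate_sketch and its arguments are hypothetical, observations are selected completely at random, and the outliers are assumed to come from a normal distribution.

```r
# Hypothetical helper illustrating the two-step idea; not part of simFrame.
contaminate_sketch <- function(x, target, epsilon, mean, sd) {
  n <- nrow(x)
  n_cont <- ceiling(epsilon * n)   # number of observations to contaminate
  # Step 1: select the observations to be contaminated (completely at random)
  idx <- sample.int(n, n_cont)
  # Step 2: model the outliers by drawing new values for the target variable
  x[idx, target] <- rnorm(n_cont, mean = mean, sd = sd)
  # Flag the contaminated rows, mirroring the ".contaminated" column
  x$.contaminated <- seq_len(n) %in% idx
  x
}

set.seed(1)
dat <- data.frame(id = 1:20, eqIncome = rlnorm(20, meanlog = 10))
res <- contaminate_sketch(dat, "eqIncome", epsilon = 0.05,
                          mean = 5e+05, sd = 10000)
sum(res$.contaminated)   # number of contaminated observations
```

Keeping selection and outlier generation as separate steps means either one can be swapped out, for example selecting with unequal probabilities, without touching the other.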
In order to extend the framework by a user-defined control class "MyContControl" (which must extend "VirtualContControl"), a method contaminate(x, control, i) with signature 'data.frame, MyContControl' needs to be implemented. In case the contaminated observations need to be identified at a later stage of the simulation, e.g., if conflicts with inserting missing values should be avoided, a logical indicator variable ".contaminated" should be added to the returned data set.
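A user-defined extension along these lines might look as follows. To stay runnable without the package, the sketch defines stand-ins for the generic and for the virtual class (with assumed slots target and epsilon); with simFrame loaded, those two definitions would be dropped in favour of the package's own. The extra slot multiplier and the contamination rule are hypothetical.

```r
library(methods)

# Stand-ins so the sketch runs without simFrame (assumed slots: target, epsilon).
# With the package loaded, omit these two definitions.
setClass("VirtualContControl",
         representation("VIRTUAL", target = "character", epsilon = "numeric"))
setGeneric("contaminate",
           function(x, control, ...) standardGeneric("contaminate"))

# Hypothetical user-defined control class with one additional slot
setClass("MyContControl", contains = "VirtualContControl",
         representation(multiplier = "numeric"))

setMethod("contaminate",
          signature(x = "data.frame", control = "MyContControl"),
          function(x, control, i = 1) {
            eps <- control@epsilon[i]        # contamination level to use
            n <- nrow(x)
            idx <- sample.int(n, ceiling(eps * n))
            # scale the selected values of the target variable
            x[idx, control@target] <- x[idx, control@target] * control@multiplier
            # logical indicator so later steps can identify contaminated rows
            x$.contaminated <- seq_len(n) %in% idx
            x
          })

set.seed(1)
foo <- data.frame(V1 = rnorm(10, mean = 0, sd = 2))
ctrl <- new("MyContControl", target = "V1", epsilon = 0.2, multiplier = 100)
out <- contaminate(foo, ctrl)
out
```

The method accepts i in addition to the generic's arguments, which is possible because the generic has a ... argument.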
Value
A data.frame containing the contaminated data. In addition, the column ".contaminated", which consists of logicals indicating the contaminated observations, is added to the data.frame.
Methods
x = "data.frame", control = "character"
contaminate data using a control class specified by the character string control. The slots of the control object may be supplied as additional arguments.
x = "data.frame", control = "ContControl"
contaminate data as defined by the control object control.
x = "data.frame", control = "missing"
contaminate data using a control object of class "ContControl". Its slots may be supplied as additional arguments.
Note
Since version 0.3, contaminate no longer checks whether the auxiliary variable with probability weights is numeric and contains only finite positive values (sample still throws an error in these cases). This check was removed to improve computational performance in simulation studies.
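If such a check is still wanted, it can be done once up front rather than in every simulation run. The following is a minimal sketch; the helper name check_weights is hypothetical and not part of simFrame.

```r
# Hypothetical helper: TRUE if w is numeric with only finite positive values
check_weights <- function(w) {
  is.numeric(w) && length(w) > 0 && all(is.finite(w)) && all(w > 0)
}

ok  <- check_weights(c(0.2, 1.5, 3))   # valid weights
bad <- check_weights(c(1, NA, 2))      # contains a missing value
neg <- check_weights(c(1, -1, 2))      # contains a negative value
```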
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
Alfons, A., Templ, M. and Filzmoser, P. (2010) Contamination Models in the R Package simFrame for Statistical Simulation. In Aivazian, S., Filzmoser, P. and Kharin, Y. (editors) Computer Data Analysis and Modeling: Complex Stochastic Data and Systems, volume 2, 178–181. Minsk. ISBN 978-985-476-848-9.
Béguin, C. and Hulliger, B. (2008) The BACON-EEM Algorithm for Multivariate Outlier Detection in Incomplete Survey Data. Survey Methodology, 34(1), 91–103.
Hulliger, B. and Schoch, T. (2009) Robust Multivariate Imputation with Survey Data. 57th Session of the International Statistical Institute, Durban.
See Also
"DCARContControl", "DARContControl", "ContControl", "VirtualContControl"
Examples
## distributed completely at random
data(eusilcP)
sam <- draw(eusilcP[, c("id", "eqIncome")], size = 20)
# using a control object
dcarc <- ContControl(target = "eqIncome", epsilon = 0.05,
    dots = list(mean = 5e+05, sd = 10000), type = "DCAR")
contaminate(sam, dcarc)
# supply slots of control object as arguments
contaminate(sam, target = "eqIncome", epsilon = 0.05,
    dots = list(mean = 5e+05, sd = 10000))
## distributed at random
foo <- generate(size = 10, distribution = rnorm,
    dots = list(mean = 0, sd = 2))
# using a control object
darc <- DARContControl(target = "V1",
    epsilon = 0.2, fun = function(x) x * 100)
contaminate(foo, darc)
# supply slots of control object as arguments
contaminate(foo, "DARContControl", target = "V1",
    epsilon = 0.2, fun = function(x) x * 100)