simulatemissings {compositions} | R Documentation |
Artificial simulation of various kinds of missings/polluted data
Description
These are simulation mechanisms for checking that techniques for handling missing values behave sensibly. They generate additional missings of the various types in a given dataset, according to a specific process.
Usage
simulateMissings(x, dl=NULL, knownlimit=FALSE,
MARprob=0.0, MNARprob=0.0, mnarity=0.5, SZprob=0.0)
observeWithAdditiveError(x, sigma=dl/dlf, dl=sigma*dlf, dlf=3,
keepObs=FALSE, digits=NA, obsScale=1,
class="acomp")
Arguments
x |
a dataset that should get the missings |
dl |
the detection limit described in
|
knownlimit |
a boolean indicating whether the actual detection limit is still known in the dataset. |
MARprob |
the probability of occurrence of 'Missings At Random' values |
MNARprob |
the probability of occurrence of 'Missings Not At Random' values. Small values tend to have a higher probability of being missed. |
mnarity |
a number between 0 and 1 giving how strongly the actual value influences whether it becomes an MNAR. 0 means MAR-like behavior and 1 means that only the smallest values are lost |
SZprob |
the probability of obtaining a structural zero. Structural zeros are placed at random, like MAR values. |
sigma |
the standard deviation of the normally distributed extra additive error |
dlf |
the distance from 0, in multiples of sigma, below which a datum will be considered BDL (below detection limit) |
keepObs |
should the (closed) data without additive error be returned as an attribute? |
digits |
number of digits kept when rounding the data with additive error (see Details) |
obsScale |
observation scale at which the rounding of the data with additive error is applied (see Details). Should be a power of 10. |
class |
class of the output object |
Details
Without any additional parameters no missings are generated. The procedure to generate MNAR affects all variables.
Function "simulateMissings" is a multipurpose simulator, where each class of missing value is treated separately, and where detection limits are specified as thresholds.
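As a rough illustration of the 'Missings At Random' part only, the following base-R sketch removes each entry independently with probability MARprob. It is not the package's implementation: the helper name sketchMAR is hypothetical, entry-wise independence is an assumption, and plain NA is used as a stand-in marker rather than the package's own coding of the different missing types.

```r
# Hypothetical helper, not part of 'compositions': entry-wise MAR simulation.
sketchMAR <- function(x, MARprob = 0.0) {
  x <- as.matrix(x)
  # Assumption: every entry is lost independently with probability MARprob.
  lost <- matrix(runif(length(x)) < MARprob, nrow = nrow(x))
  x[lost] <- NA   # NA is only a stand-in for the package's missing codes
  x
}

set.seed(1)
x <- matrix(rlnorm(300), ncol = 3)
xmar <- sketchMAR(x, MARprob = 0.05)
mean(is.na(xmar))          # observed share of missings, should be near 0.05
identical(sketchMAR(x), x) # with the default MARprob = 0 nothing is lost
```

The last line mirrors the statement above that no missings are generated unless a probability is supplied.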
Function "observeWithAdditiveError" simulates data within a very specific framework: an additive error with standard deviation sigma is added to the input data x, and a BDL (below detection limit) value is generated wherever a datum is less than dlf times sigma. Afterwards, the resulting data are rounded as round(data/obsScale,digits)*obsScale, i.e. an observation scale obsScale is chosen and, at that scale, only the number of digits given by digits is kept.
This framework is typical of chemical analyses, and it generates both BDLs and pollution/rounding of (apparently) "right" data.
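The framework just described can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the code of observeWithAdditiveError: the helper name sketchObserve is hypothetical, the detection limit is assumed to be checked after the error has been added, NA stands in for the package's BDL coding, and closure of the data and the class argument are ignored.

```r
# Hypothetical helper, not the 'compositions' implementation.
# sigma and dl default to each other, as in the Usage section:
# supply one of them (or both); supplying neither is an error.
sketchObserve <- function(x, sigma = dl / dlf, dl = sigma * dlf, dlf = 3,
                          digits = NA, obsScale = 1) {
  # Additive, normally distributed measurement error with sd = sigma
  y <- x + rnorm(length(x), mean = 0, sd = sigma)
  # Assumption: the noisy value is compared with dl = dlf * sigma
  bdl <- y < dl
  # Optional rounding at the chosen observation scale
  if (!is.na(digits)) y <- round(y / obsScale, digits) * obsScale
  y[bdl] <- NA   # NA is only a stand-in for the package's BDL coding
  list(observed = y, bdl = bdl)
}

set.seed(1)
x <- c(0.02, 0.5, 12.3456, 1234.567)
obs <- sketchObserve(x, sigma = 0.01, digits = 1, obsScale = 0.1)
obs$bdl       # which data fell below dl = 3 * 0.01

# The rounding step on its own:
round(1234.567 / 100, 1) * 100   # 1230: one digit kept at a scale of 100
```

Because sigma and dl are defined in terms of each other, R's lazy evaluation of default arguments lets the caller give whichever of the two is known and derives the other through dlf.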
Value
A dataset like x but with some additional missings.
Author(s)
K. Gerald van den Boogaart
References
van den Boogaart, K., R. Tolosana-Delgado, and M. Bren (2011). The Compositional Meaning of a Detection Limit. In Proceedings of the 4th International Workshop on Compositional Data Analysis.
van den Boogaart, K.G., R. Tolosana-Delgado and M. Templ (2014) Regression with compositional response having unobserved components or below detection limit values. Statistical Modelling (in press).
See compositions.missings for more details.
See Also
Examples
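library(compositions)
data(SimulatedAmounts)
x <- acomp(sa.lognormals)
# Detection limit 0.05; MAR, MNAR and structural zeros each with probability 0.05
xnew <- simulateMissings(x, dl=0.05, MARprob=0.05, MNARprob=0.05, SZprob=0.05)
acomp(xnew)
plot(missingSummary(xnew))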