delete_MAR_rank {missMethods} | R Documentation |
Create MAR values using a ranking mechanism
Description
Create missing at random (MAR) values in a data frame or a matrix using a ranking mechanism.
Usage
delete_MAR_rank(
ds,
p,
cols_mis,
cols_ctrl,
n_mis_stochastic = FALSE,
ties.method = "average",
miss_cols,
ctrl_cols
)
Arguments
ds |
A data frame or matrix in which missing values will be created. |
p |
A numeric vector of length one or of the same length as cols_mis; the probability for a value to be missing (see Details). |
cols_mis |
A vector of column names or indices of columns in which missing values will be created. |
cols_ctrl |
A vector of column names or indices of columns that control the creation of missing values in cols_mis. |
n_mis_stochastic |
Logical, should the number of missing values be stochastic? See Details for how the positions of missing values are drawn for FALSE and for TRUE. |
ties.method |
How ties are handled. Passed to rank. |
miss_cols |
Deprecated, use cols_mis instead. |
ctrl_cols |
Deprecated, use cols_ctrl instead. |
Details
This function creates missing at random (MAR) values in the columns specified by the argument cols_mis. The probability for missing values is controlled by p. If p is a single number, then the overall probability for a value to be missing will be p in all columns of cols_mis. (Internally, p will be replicated to a vector of the same length as cols_mis. So, all p[i] in the following sections will be equal to the given single number p.) Otherwise, p must be of the same length as cols_mis. In this case, the overall probability for a value to be missing will be p[i] in the column cols_mis[i].
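The handling of p described above can be sketched in base R. The helper expand_p is hypothetical and exists only for illustration; it is not a function of missMethods, and the package may validate p differently.

```r
# Hypothetical helper (not part of missMethods): bring p to the length of cols_mis.
expand_p <- function(p, cols_mis) {
  if (length(p) == 1L) {
    # A single number is replicated once per column in cols_mis.
    rep(p, length(cols_mis))
  } else if (length(p) == length(cols_mis)) {
    # One probability per column: p[i] belongs to cols_mis[i].
    p
  } else {
    stop("p must have length one or the same length as cols_mis")
  }
}

p_single <- expand_p(0.2, c("X", "Z"))          # 0.2 0.2
p_vector <- expand_p(c(0.1, 0.3), c("X", "Z"))  # 0.1 0.3
```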
The position of the missing values in cols_mis[i] is controlled by cols_ctrl[i]. The following procedure is applied for each pair of cols_ctrl[i] and cols_mis[i] to determine the positions of missing values:
First, the probability for a value to be missing is calculated. This probability for a missing value in a row of cols_mis[i] is proportional to the rank of the value in cols_ctrl[i] in the same row. If n_mis_stochastic = FALSE, these probabilities are given to the prob argument of sample. If n_mis_stochastic = TRUE, they are scaled to sum up to nrow(ds) * p[i]. Then, for each probability, a uniformly distributed random number is generated. If this random number is less than the probability, the value in cols_mis[i] is set to NA.
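The procedure above can be re-created for a single pair of columns in base R. This is a sketch under stated assumptions, not the source code of missMethods: the function name rank_mar_sketch is hypothetical, and the use of round(n * p) as the number of missing values in the non-stochastic case is an assumption that the text above does not spell out.

```r
# Sketch only: re-creates the described steps for one pair of columns.
rank_mar_sketch <- function(x_mis, x_ctrl, p, n_mis_stochastic = FALSE,
                            ties.method = "average") {
  n <- length(x_ctrl)
  # Rank of each control value; higher ranks get higher missingness weights.
  r <- rank(x_ctrl, ties.method = ties.method)
  if (n_mis_stochastic) {
    # Scale the weights to sum to n * p, then compare each one
    # with its own uniformly distributed random number.
    prob <- r / sum(r) * n * p
    na_idx <- which(runif(n) < prob)
  } else {
    # Assumption: a fixed number round(n * p) of rows is drawn,
    # with the ranks handed to the prob argument.
    na_idx <- sample.int(n, size = round(n * p), prob = r)
  }
  x_mis[na_idx] <- NA
  x_mis
}

set.seed(1)
ds <- data.frame(X = 1:20, Y = 101:120)
x_fixed <- rank_mar_sketch(ds$X, ds$Y, p = 0.2)
x_random <- rank_mar_sketch(ds$X, ds$Y, p = 0.2, n_mis_stochastic = TRUE)
sum(is.na(x_fixed))  # 4, because round(20 * 0.2) rows are drawn
```

In the stochastic branch the number of missing values varies from run to run, which is why no count is shown for x_random.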
The ranks are calculated via rank. The argument ties.method is directly passed to this function. Possible choices for ties.method are documented in rank.
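As an illustration of how ties.method changes the ranks, and with them the relative missingness weights of tied control values, the following base R lines compare three of the choices on a small vector with one tie. The vector x is made up for this example.

```r
x <- c(10, 20, 20, 30)  # the two values 20 are tied

r_average <- rank(x, ties.method = "average")  # 1.0 2.5 2.5 4.0
r_min     <- rank(x, ties.method = "min")      # 1 2 2 4
r_first   <- rank(x, ties.method = "first")    # 1 2 3 4
```

With "average" and "min", tied rows receive equal weights; with "first", the row that appears earlier receives the lower rank.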
For high values of p, it is mathematically not possible to get probabilities proportional to the ranks. In this case, a warning is given. This warning can be silenced by setting the option missMethods.warn.too.high.p to FALSE.
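One way to see the limitation is to scale rank weights so that they sum to n * p and check the largest one. The sketch below does this in base R for a control column without ties. The helper scaled_prob is hypothetical, and the exact condition under which the package warns may differ from this calculation.

```r
n <- 20
r <- seq_len(n)  # ranks of a control column without ties

# Hypothetical helper: rank weights scaled to sum to n * p.
scaled_prob <- function(p) r / sum(r) * n * p

max_low  <- max(scaled_prob(0.2))  # about 0.38, still a valid probability
max_high <- max(scaled_prob(0.6))  # about 1.14, larger than 1

# Silence the warning via the option named above.
options(missMethods.warn.too.high.p = FALSE)
```

In this sketch the largest scaled value is 2 * n * p / (n + 1), so it exceeds 1 as soon as p is larger than (n + 1) / (2 * n).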
Value
An object of the same class as ds with missing values.
References
Santos, M. S., Pereira, R. C., Costa, A. F., Soares, J. P., Santos, J., & Abreu, P. H. (2019). Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access, 7, 11651-11667.
See Also
Other functions to create MAR: delete_MAR_1_to_x(), delete_MAR_censoring(), delete_MAR_one_group()
Examples
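library(missMethods)
ds <- data.frame(X = 1:20, Y = 101:120)
# Create 20% missing values in X; rows with a higher rank in Y are more likely to be affected
delete_MAR_rank(ds, 0.2, "X", "Y")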