Impute.coin {COINr} | R Documentation |
Impute a data set in a coin
Description
This imputes any NAs in the data set specified by dset by invoking the function f_i, with any optional arguments f_i_para, on one column at a time (if impute_by = "column"), on one row at a time (if impute_by = "row"), or on the entire data frame at once (if impute_by = "df").
Usage
## S3 method for class 'coin'
Impute(
  x,
  dset,
  f_i = NULL,
  f_i_para = NULL,
  impute_by = "column",
  use_group = NULL,
  group_level = NULL,
  normalise_first = NULL,
  out2 = "coin",
  write_to = NULL,
  disable = FALSE,
  warn_on_NAs = TRUE,
  ...
)
Arguments
x |
A coin class object |
dset |
The name of the data set to apply the function to, which should be accessible in |
f_i |
An imputation function. See details. |
f_i_para |
Further arguments to pass to |
impute_by |
Specifies how to impute: if |
use_group |
Optional grouping variable name to pass to imputation function if this supports group imputation. |
group_level |
A level of the framework to use for grouping indicators. This is only
relevant if |
normalise_first |
Logical: if |
out2 |
Either |
write_to |
Optional character string for naming the data set in the coin. Data will be written to
|
disable |
Logical: if |
warn_on_NAs |
Logical: if |
... |
arguments passed to or from other methods. |
Details
The function f_i must be able to accept the data class passed to it: if impute_by is "row" or "column" this will be a numeric vector, and if it is "df" it will be a data frame. Moreover, the function should return a vector or data frame identical to the one passed to it, except for NA values, which can be replaced. The function f_i is not required to replace all NA values.
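The contract described above can be illustrated with a minimal base R sketch. The function name i_median_sketch is hypothetical and is not part of COINr; it takes a numeric vector, replaces NAs with the median of the observed values, and leaves everything else untouched.

```r
# Hypothetical user-defined imputation function (an illustration, not COINr code):
# replaces NAs in a numeric vector with the median of the observed values.
i_median_sketch <- function(x) {
  stopifnot(is.numeric(x))
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

x <- c(4, NA, 10, 6, NA)
x_imp <- i_median_sketch(x)

# Contract checks: same length as the input, observed values unchanged
same_length <- length(x_imp) == length(x)
observed_unchanged <- identical(x_imp[!is.na(x)], x[!is.na(x)])
```

Going by the Examples section, where f_i is given as the string "i_mean_grp", a function like this would presumably be passed by name, e.g. f_i = "i_median_sketch" with impute_by = "column"; treat that as an assumption rather than confirmed usage.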
COINr has several built-in imputation functions of the form i_*() for vectors, which can be called by Impute(). See the online documentation for more details.
When imputing row-wise, prior normalisation of the data is recommended. This is because imputation will use e.g. the mean of the unit values over all indicators (columns). If the indicators are on very different scales, the result will likely make no sense. If the indicators are normalised first, more sensible results can be obtained. There are two ways to pre-normalise. The first is to set normalise_first = TRUE, which is in any case the default if impute_by = "row". In this case, you also need to supply a vector of directions. The data will then be normalised using a min-max approach before imputation, followed by the inverse operation to return the data to the original scales.
Another approach, which gives more control, is to run Normalise() first and work with the normalised data from that point onwards. In that case it is better to set normalise_first = FALSE, since by default it will be set to TRUE if impute_by = "row".
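The normalise, impute, then invert sequence can be sketched in base R as follows. This is an illustration of the idea only, not COINr's internal code: the toy data frame is invented, all indicator directions are assumed positive, and a simple row mean stands in for the imputation function.

```r
# Toy data (invented): two indicators on very different scales, one NA
df <- data.frame(gdp = c(1000, 2000, 3000, NA),
                 score = c(0.1, 0.2, 0.3, 0.4))

# Step 1: min-max normalise each column onto [0, 1], ignoring NAs
# (assumes every indicator has positive direction)
mins <- sapply(df, min, na.rm = TRUE)
maxs <- sapply(df, max, na.rm = TRUE)
normed <- as.data.frame(
  Map(function(x, lo, hi) (x - lo) / (hi - lo), df, mins, maxs)
)

# Step 2: row-wise imputation on the normalised data (row mean here)
imputed <- as.data.frame(t(apply(normed, 1, function(r) {
  r[is.na(r)] <- mean(r, na.rm = TRUE)
  r
})))

# Step 3: invert the min-max transform to return to the original scales
result <- as.data.frame(
  Map(function(x, lo, hi) x * (hi - lo) + lo, imputed, mins, maxs)
)
```

In this toy case the unit with the missing gdp has the highest score, so it receives the highest gdp on the original scale. A row mean taken on the raw data would instead have filled in 0.4, which is meaningless as a gdp value.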
Checks are made on the format of the data returned by imputation functions, to check the type and to ensure that non-NA values have not been inadvertently altered. This latter check is allowed a degree of tolerance for numerical precision, controlled by the sfigs argument. This is because if the data frame is normalised, and/or depending on the imputation function, there may be very small differences. By default sfigs = 9, meaning that the non-NA values pre- and post-imputation are compared to 9 significant figures.
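A comparison of this kind could look like the base R sketch below. The helper name non_na_unchanged is hypothetical, and this is only one plausible way to compare values to a given number of significant figures; it is not taken from the COINr source.

```r
# Hypothetical helper (not COINr code): TRUE if the values that were
# observed before imputation agree afterwards to `sfigs` significant figures.
non_na_unchanged <- function(before, after, sfigs = 9) {
  if (length(before) != length(after)) return(FALSE)
  keep <- !is.na(before)
  all(signif(before[keep], sfigs) == signif(after[keep], sfigs))
}

before <- c(1.5, NA, 2.5)

# Imputed value filled in, observed values differ only by numerical noise
ok_noise <- non_na_unchanged(before, c(1.5 + 1e-12, 2, 2.5 + 1e-12))

# An observed value was changed: the check fails
ok_altered <- non_na_unchanged(before, c(1.6, 2, 2.5))
```

Rounding both sides before comparing means that round-trip noise from normalising and inverting does not trigger a false alarm, while a genuine change to an observed value is still caught.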
See also the documentation for Impute.data.frame() and Impute.numeric(), which are called by this function.
Value
An updated coin with imputed data set at .$Data[[write_to]]
Examples
# build coin
coin <- build_example_coin(up_to = "new_coin")
# impute raw data set using population groups
# output to data frame directly
Impute(coin, dset = "Raw", f_i = "i_mean_grp",
       use_group = "Pop_group", out2 = "df")