02missing_data.frame {mi} | R Documentation
Class "missing_data.frame"
Description
This class is similar to a data.frame but is customized for the situation in which variables with missing data are being modeled for multiple imputation. This class primarily consists of a list of missing_variables plus slots containing metadata indicating how the missing_variables relate to each other. Most operations that work for a data.frame also work for a missing_data.frame.
Usage
missing_data.frame(y, ...)
## Hidden arguments not included in the signature
## favor_ordered = TRUE, favor_positive = FALSE,
## subclass = NA_character_,
## include_missingness = TRUE, skip_correlation_check = FALSE
Arguments
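y | Usually a data.frame whose variables have some NA values (see Details).
... | Hidden arguments; their names and defaults are listed in the comments under Usage. Any further arguments are passed on.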
Details
In most cases, the first step of an analysis is for a useR to call the missing_data.frame function on a data.frame whose variables have some NA values. This calls the missing_variable function on each column of the data.frame and returns the list that fills the variables slot. The classes of the list elements depend on the nature of each column of the data.frame and on various fallible heuristics. The success rate can be improved by making sure that columns of the original data.frame that are intended to be categorical variables are factors (ordered if appropriate) with labels. Even in the best-case scenario, it will often be necessary to use the change function to modify various discretionary aspects of the missing_variables in the variables slot of the missing_data.frame. The show method for a missing_data.frame should be used to get a quick overview of the missing_variables in a missing_data.frame and to recognize what needs to be changed.
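As a hedged illustration of the advice about factors, the sketch below prepares a small simulated data.frame so that its categorical columns are labeled factors before conversion. The data and the column names (income, employed, education) are hypothetical and exist only for this example; missing_data.frame and show are the only mi functions used, and the conversion step is skipped if the mi package is not installed.

```r
# Hypothetical simulated data; none of these columns come from the mi package
set.seed(123)
n <- 100
raw <- data.frame(
  income    = rnorm(n, mean = 50, sd = 10),     # continuous
  employed  = rbinom(n, size = 1, prob = 0.7),  # binary, coded 0/1
  education = sample(1:3, n, replace = TRUE)    # ordered categories, coded 1..3
)
raw$income[sample(n, 10)] <- NA
raw$education[sample(n, 10)] <- NA

# Recode the categorical columns as labeled factors so that the
# heuristics have an easier job; education is ordered, employed is not
raw$employed <- factor(raw$employed, levels = c(0, 1), labels = c("no", "yes"))
raw$education <- factor(raw$education, levels = 1:3,
                        labels = c("primary", "secondary", "tertiary"),
                        ordered = TRUE)

# Convert and inspect only when mi is available
if (requireNamespace("mi", quietly = TRUE)) {
  library(mi)
  mdf <- missing_data.frame(raw)
  show(mdf)  # check the guessed classes before deciding what to change
}
```

If show reports a class that does not match the intended measurement level, the change function is the place to correct it.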
Value
The missing_data.frame constructor function returns an object of class missing_data.frame or that inherits from the missing_data.frame class.
Objects from the Class
Objects can be created by calls of the form new("missing_data.frame", ...). However, useRs almost always will pass a data.frame to the missing_data.frame constructor function to produce an object of missing_data.frame class.
Slots
This section is primarily aimed at developeRs. A missing_data.frame inherits from data.frame but has the following additional slots:
variables: Object of class "list"; each list element is an object that inherits from the missing_variable-class.
no_missing: Object of class "logical", a vector whose length is the same as the length of the variables slot, indicating whether the corresponding missing_variable is fully observed.
patterns: Object of class factor whose length is equal to the number of observations and whose elements indicate the missingness pattern for that observation.
DIM: Object of class "integer" of length two indicating first the number of observations and second the length of the variables slot.
DIMNAMES: Object of class "list" of length two providing the row names and column names.
postprocess: Object of class "function" used to create additional variables from existing variables, such as interactions between two missing_variables, once their missing values have been imputed. Does not work at the moment.
index: Object of class "list" whose length is equal to the number of missing_variables with some missing values. Each list element is an integer vector indicating which columns of the X slot must be dropped when modeling the corresponding missing_variable.
X: Object of MatrixTypeThing-class with as many rows as there are observations; it is loosely related to a model.matrix. Rather than repeatedly parsing a formula during the multiple imputation process, this X matrix is created once, and some of its columns are dropped, using the index slot, when modeling a missing_variable. The columns of the X matrix consist of numeric representations of the missing_variables plus (by default) the unique missingness patterns.
weights: Object of class "list" whose length is equal to one or to the number of missing_variables with some missing values. Each list element is passed to the corresponding argument of bayesglm and similar functions. In particular, some observations can be given a weight of zero, which should drop them when modeling some missing_variables.
priors: Object of class "list" whose length is equal to the number of missing_variables and whose elements give appropriate values for the priors used by the model fitting function wrapped by the fit_model-methods; see, e.g., bayesglm.
correlations: Object of class "matrix" with rows and columns equal to the length of the variables slot. Its strict upper triangle contains Spearman correlations between pairs of variables (ignoring missing values), and its strict lower triangle contains Squared Multiple Correlations (SMCs) between a variable and all other variables (ignoring missing values). If either a Spearman correlation or an SMC is very close to unity, there may be difficulty or error messages during the multiple imputation process.
done: Object of class "logical" of length one indicating whether the missing values have been imputed.
workpath: Object of class character of length one indicating the path to a working directory that is used to store some objects.
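To make a few of these slots more concrete, here is a minimal base R sketch that computes analogous quantities for a simulated data.frame: a fully-observed flag per column (compare no_missing), a per-row missingness pattern (compare patterns), the dimensions (compare DIM), and Spearman correlations and SMCs (compare correlations). It is an illustration under stated assumptions, not the package's implementation: the data, the pattern labels, the use of complete cases for the SMCs, and the 0.99 threshold are all hypothetical choices.

```r
# Hypothetical simulated data with one strongly related pair of columns
set.seed(42)
n <- 100
dat <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
dat$b <- dat$a + rnorm(n, sd = 0.5)
dat$a[sample(n, 8)] <- NA
dat$c[sample(n, 5)] <- NA

# Analogue of no_missing: TRUE for columns without any NA
no_missing <- vapply(dat, function(col) !anyNA(col), logical(1))

# Analogue of patterns: label each row by the columns that are missing
patterns <- factor(apply(is.na(dat), 1, function(r) {
  if (any(r)) paste(names(dat)[r], collapse = ", ") else "nothing"
}))

# Analogue of DIM: number of observations, then number of variables
DIM <- dim(dat)

# Spearman correlations between pairs of variables, ignoring missing values
rho <- cor(dat, method = "spearman", use = "pairwise.complete.obs")

# Squared multiple correlation of each variable with all the others,
# via the standard identity SMC_j = 1 - 1 / [R^{-1}]_jj (complete cases here)
R <- cor(dat, use = "complete.obs")
smc <- 1 - 1 / diag(solve(R))

# Values near unity hint at trouble during imputation; 0.99 is an arbitrary cutoff
near_unity <- any(abs(rho[upper.tri(rho)]) > 0.99) || any(smc > 0.99)

table(patterns)
round(smc, 3)
```

Checking these quantities on the raw data before conversion can help anticipate problems with nearly collinear variables.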
Methods
There are many methods that are defined for a missing_data.frame, although some are primarily intended for developers. The most relevant ones for users are:
- change: signature(data = "missing_data.frame", y = "ANY", what = "character", to = "ANY"), which is used to change discretionary aspects of the missing_variables in the variables slot of a missing_data.frame
- hist: signature(x = "missing_data.frame"), which shows histograms of the observed variables that have missingness
- image: signature(x = "missing_data.frame"), which plots an image of the missingness slot to visualize the pattern of missingness when grayscale = FALSE or the pattern of missingness in light of the observed values (grayscale = TRUE, the default)
- mi: signature(y = "missing_data.frame", model = "missing"), which multiply imputes the missing values
- show: signature(object = "missing_data.frame"), which gives an overview of the salient characteristics of the missing_variables in the variables slot of a missing_data.frame
- summary: signature(object = "missing_data.frame"), which produces the same result as the summary method for a data.frame
There are also S3 methods for the dim, dimnames, and names generics, which allow functions like nrow, ncol, rownames, colnames, etc. to work as expected on missing_data.frames. Also, accessing and changing elements for a missing_data.frame mostly works the same way as for a data.frame.
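The way S3 methods for dim, dimnames, and names make nrow, ncol, rownames, and colnames work can be seen with a toy class. The sketch below is a generic base R illustration: the class toy_frame and its DIM and DIMNAMES list elements are hypothetical stand-ins that merely echo the slot names, and the actual methods in mi may be written differently.

```r
# A hypothetical toy class that stores its dimensions explicitly
toy <- structure(
  list(DIM = c(3L, 2L),
       DIMNAMES = list(c("r1", "r2", "r3"), c("x", "y"))),
  class = "toy_frame"
)

# S3 methods for the internal generics dim, dimnames, and names;
# unclass() avoids any dispatch while reading the stored elements
dim.toy_frame      <- function(x) unclass(x)$DIM
dimnames.toy_frame <- function(x) unclass(x)$DIMNAMES
names.toy_frame    <- function(x) unclass(x)$DIMNAMES[[2L]]

# nrow() and ncol() call dim(); rownames() and colnames() call dimnames(),
# so they pick up the methods above without further work
nrow(toy)
ncol(toy)
rownames(toy)
colnames(toy)
names(toy)
```

Because the helper functions are defined in terms of the generics, supplying three methods is enough to cover the whole family.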
Author(s)
Ben Goodrich and Jonathan Kropko, for this version, based on earlier versions written by Yu-Sung Su, Masanao Yajima, Maria Grazia Pittau, Jennifer Hill, and Andrew Gelman.
See Also
change, missing_variable, mi, experiment_missing_data.frame, multilevel_missing_data.frame
Examples
# STEP 0: Get data
data(CHAIN, package = "mi")
# STEP 1: Convert to a missing_data.frame
mdf <- missing_data.frame(CHAIN) # warnings about missingness patterns
show(mdf)
# STEP 2: change things
mdf <- change(mdf, y = "log_virus", what = "transformation", to = "identity")
# STEP 3: look deeper
summary(mdf)
hist(mdf)
image(mdf)
# STEP 4: impute
## Not run:
imputations <- mi(mdf)
## End(Not run)
## An example with subsetting on a fully observed variable
data(nlsyV, package = "mi")
mdfs <- missing_data.frame(nlsyV, favor_positive = TRUE, favor_ordered = FALSE, by = "first")
mdfs <- change(mdfs, y = "momed", what = "type", to = "ord")
show(mdfs)