| 02missing_data.frame {mi} | R Documentation |
Class "missing_data.frame"
Description
This class is similar to a data.frame but is customized for the situation in
which variables with missing data are being modeled for multiple imputation. This class primarily
consists of a list of missing_variables plus slots containing metadata indicating how the
missing_variables relate to each other. Most operations that work for a
data.frame also work for a missing_data.frame.
Usage
missing_data.frame(y, ...)
## Hidden arguments not included in the signature
## favor_ordered = TRUE, favor_positive = FALSE,
## subclass = NA_character_,
## include_missingness = TRUE, skip_correlation_check = FALSE
Arguments
y |
Usually a |
... |
Hidden arguments. The Any further arguments are passed to the |
Details
In most cases, the first step of an analysis is for a useR to call the
missing_data.frame function on a data.frame whose variables
have some NA values, which will call the missing_variable
function on each column of the data.frame and return the list
that fills the variable slot. The classes of the list elements will depend on the
nature of the column of the data.frame and various fallible heuristics. The
success rate can be enhanced by making sure that columns of the original
data.frame that are intended to be categorical variables are
(ordered if appropriate) factors with labels. Even in the best case
scenario, it will often be necessary to utilize the change function to
modify various discretionary aspects of the missing_variables in the
variables slot of the missing_data.frame. The show method for
a missing_data.frame should be utilized to get a quick overview of the
missing_variables in a missing_data.frame and recognize what needs
to be changed.
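The advice about factors can be sketched as follows. The data.frame raw and its column names are invented for illustration; only missing_data.frame and show come from the package, and that part is guarded in case mi is not installed. This is a minimal sketch rather than a prescribed workflow.

```r
# Hypothetical survey data; every name here is invented for illustration
set.seed(1)
n <- 100
raw <- data.frame(
  income = rnorm(n, mean = 50000, sd = 10000),
  educ   = sample(1:3, n, replace = TRUE),
  region = sample(c("north", "south", "east"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
raw$income[sample(n, 15)] <- NA
raw$educ[sample(n, 10)]   <- NA
raw$region[sample(n, 10)] <- NA

# Declare categorical columns explicitly before conversion, so that the
# heuristics do not have to guess from integer codes or character strings
raw$educ <- factor(raw$educ, levels = 1:3,
                   labels = c("primary", "secondary", "tertiary"),
                   ordered = TRUE)
raw$region <- factor(raw$region)

if (requireNamespace("mi", quietly = TRUE)) {
  library(mi)
  mdf_raw <- missing_data.frame(raw)
  show(mdf_raw)  # check which class was chosen for each variable
}
```

If show reports a class that does not match the intended treatment of a variable, the change function is the place to correct it.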
Value
The missing_data.frame constructor function returns an object of class missing_data.frame
or that inherits from the missing_data.frame class.
Objects from the Class
Objects can be created by calls of the form new("missing_data.frame", ...).
However, useRs almost always will pass a data.frame to the
missing_data.frame constructor function to produce an object of missing_data.frame class.
Slots
This section is primarily aimed at developeRs. A missing_data.frame inherits from
data.frame but has the following additional slots:
variables: Object of class "list", each element of which is an object that inherits from the missing_variable-class
no_missing: Object of class "logical", which is a vector whose length is the same as the length of the variables slot, indicating whether the corresponding missing_variable is fully observed
patterns: Object of class factor whose length is equal to the number of observations and whose elements indicate the missingness pattern for that observation
DIM: Object of class "integer" of length two indicating first the number of observations and second the length of the variables slot
DIMNAMES: Object of class "list" of length two providing the appropriate number of row names and column names
postprocess: Object of class "function" used to create additional variables from existing variables, such as interactions between two missing_variables once their missing values have been imputed. Does not work at the moment
index: Object of class "list" whose length is equal to the number of missing_variables with some missing values. Each list element is an integer vector indicating which columns of the X slot must be dropped when modeling the corresponding missing_variable
X: Object of MatrixTypeThing-class with rows equal to the number of observations, loosely related to a model.matrix. Rather than repeatedly parsing a formula during the multiple imputation process, this X matrix is created once and some of its columns are dropped when modeling a missing_variable utilizing the index slot. The columns of the X matrix consist of numeric representations of the missing_variables plus (by default) the unique missingness patterns
weights: Object of class "list" whose length is equal to one or the number of missing_variables with some missing values. Each list element is passed to the corresponding argument of bayesglm and similar functions. In particular, some observations can be given a weight of zero, which should drop them when modeling some missing_variables
priors: Object of class "list" whose length is equal to the number of missing_variables and whose elements give appropriate values for the priors used by the model fitting function wrapped by the fit_model-methods; see, e.g., bayesglm
correlations: Object of class "matrix" with rows and columns equal to the length of the variables slot. Its strict upper triangle contains Spearman correlations between pairs of variables (ignoring missing values), and its strict lower triangle contains Squared Multiple Correlations (SMCs) between a variable and all other variables (ignoring missing values). If either a Spearman correlation or a SMC is very close to unity, there may be difficulty or error messages during the multiple imputation process.
done: Object of class "logical" of length one indicating whether the missing values have been imputed
workpath: Object of class character of length one indicating the path to a working directory that is used to store some objects
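To make the no_missing, patterns, and correlations slots more concrete, the following base R sketch computes analogous quantities by hand on toy data. It is an illustration only and not the package's internal code: the object names and pattern labels are invented, and the SMC is obtained from the standard identity 1 - 1/diag(solve(R)), applied here to the pairwise-complete Spearman matrix as an assumption.

```r
# Toy data with missing values; all object names are invented for illustration
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)
x3 <- -0.4 * x1 + 0.3 * x2 + rnorm(n)
toy <- data.frame(x1 = x1, x2 = x2, x3 = x3)
toy$x2[sample(n, 30)] <- NA
toy$x3[sample(n, 20)] <- NA

# Which variables are fully observed (compare the no_missing slot)
no_missing <- !vapply(toy, anyNA, logical(1))

# One missingness pattern label per observation (compare the patterns slot);
# the label text is an arbitrary choice made for this sketch
miss <- is.na(toy)
pattern_labels <- apply(miss, 1, function(r) {
  if (any(r)) paste(colnames(miss)[r], collapse = ", ") else "none"
})
patterns <- factor(pattern_labels)
print(table(patterns))

# Spearman correlations between pairs of variables, ignoring missing values,
# and the squared multiple correlation of each variable with all the others
# (compare the correlations slot)
R   <- cor(toy, method = "spearman", use = "pairwise.complete.obs")
smc <- 1 - 1 / diag(solve(R))
print(round(R, 2))
print(round(smc, 2))
```

Values of R or smc close to one in such a check would point to the kind of near-collinearity that the correlations slot is meant to flag.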
Methods
There are many methods that are defined for a missing_data.frame, although some are primarily intended for developers. The most relevant ones for users are:
- change
signature(data = "missing_data.frame", y = "ANY", what = "character", to = "ANY") which is used to change discretionary aspects of the missing_variables in the variables slot of a missing_data.frame
- hist
signature(x = "missing_data.frame") which shows histograms of the observed variables that have missingness
- image
signature(x = "missing_data.frame") which plots an image of the missingness slot to visualize the pattern of missingness when grayscale = FALSE or the pattern of missingness in light of the observed values (grayscale = TRUE, the default)
- mi
signature(y = "missing_data.frame", model = "missing") which multiply imputes the missing values
- show
signature(object = "missing_data.frame") which gives an overview of the salient characteristics of the missing_variables in the variables slot of a missing_data.frame
- summary
signature(object = "missing_data.frame") which produces the same result as the summary method for a data.frame
There are also S3 methods for the dim, dimnames, and names
generics, which allow functions like nrow, ncol, rownames,
colnames, etc. to work as expected on missing_data.frames. Also, accessing
and changing elements for a missing_data.frame mostly works the same way as for a
data.frame.
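As a brief sketch of the data.frame-like behavior described above, the dimension and name accessors can be called directly on a missing_data.frame. The object names has_mi and mdf_nlsy are invented for illustration, and the block only runs its body if the mi package is installed.

```r
# has_mi and mdf_nlsy are names chosen for this sketch
has_mi <- requireNamespace("mi", quietly = TRUE)

if (has_mi) {
  library(mi)
  data(nlsyV, package = "mi")
  mdf_nlsy <- missing_data.frame(nlsyV)

  # The same accessors one would use on a data.frame
  print(dim(mdf_nlsy))
  print(nrow(mdf_nlsy))
  print(ncol(mdf_nlsy))
  print(colnames(mdf_nlsy))
}
```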
Author(s)
Ben Goodrich and Jonathan Kropko, for this version, based on earlier versions written by Yu-Sung Su, Masanao Yajima, Maria Grazia Pittau, Jennifer Hill, and Andrew Gelman.
See Also
change, missing_variable, mi,
experiment_missing_data.frame, multilevel_missing_data.frame
Examples
# STEP 0: Get data
data(CHAIN, package = "mi")
# STEP 1: Convert to a missing_data.frame
mdf <- missing_data.frame(CHAIN) # warnings about missingness patterns
show(mdf)
# STEP 2: change things
mdf <- change(mdf, y = "log_virus", what = "transformation", to = "identity")
# STEP 3: look deeper
summary(mdf)
hist(mdf)
image(mdf)
# STEP 4: impute
## Not run:
imputations <- mi(mdf)
## End(Not run)
## An example with subsetting on a fully observed variable
data(nlsyV, package = "mi")
mdfs <- missing_data.frame(nlsyV, favor_positive = TRUE, favor_ordered = FALSE, by = "first")
mdfs <- change(mdfs, y = "momed", what = "type", to = "ord")
show(mdfs)