mice.impute.pls {miceadds} | R Documentation |
Imputation using Partial Least Squares for Dimension Reduction
Description
This function imputes a variable with missing values using PLS regression (Mevik & Wehrens, 2007) for a dimension reduction of the predictor space.
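As a rough illustration of the idea, the following base R sketch extracts PLS scores for a single response with the classical NIPALS algorithm and then regresses the response on those scores. The helper name pls1_scores and the simulated data are hypothetical, and the sketch is not the routine that miceadds uses internally.

```r
# Minimal PLS1 (NIPALS) sketch; hypothetical helper, not miceadds internals.
pls1_scores <- function(X, y, ncomp) {
  X <- scale(X, center = TRUE, scale = FALSE)  # center predictors
  y <- y - mean(y)                             # center response
  scores <- matrix(0, nrow(X), ncomp)
  for (a in seq_len(ncomp)) {
    w <- crossprod(X, y)                 # direction of maximal covariance with y
    w <- w / sqrt(sum(w^2))              # normalize the weight vector
    s <- X %*% w                         # score vector for component a
    p <- crossprod(X, s) / sum(s^2)      # X loadings
    q <- sum(y * s) / sum(s^2)           # y loading
    X <- X - tcrossprod(s, p)            # deflate X
    y <- y - drop(s) * q                 # deflate y
    scores[, a] <- s
  }
  scores
}

set.seed(1)
n <- 100
X <- matrix(rnorm(n * 20), n, 20)        # 20 simulated predictors
y <- drop(X[, 1:3] %*% c(1, -1, 0.5)) + rnorm(n)
S <- pls1_scores(X, y, ncomp = 3)        # 20 columns reduced to 3 scores
fit <- lm(y ~ S)                         # regression on the reduced predictor space
```

The regression on S has only three slopes regardless of the number of original predictors, which is what makes the approach usable with many predictors, interactions and quadratic terms.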
Usage
mice.impute.pls(y, ry, x, type, pls.facs=NULL,
pls.impMethod="pmm", donors=5, pls.impMethodArgs=NULL, pls.print.progress=TRUE,
imputationWeights=rep(1, length(y)), pcamaxcols=1E+09,
min.int.cor=0, min.all.cor=0, N.largest=0, pls.title=NULL, print.dims=TRUE,
pls.maxcols=5000, use_boot=FALSE, envir_pos=NULL, extract_data=TRUE,
remove_lindep=TRUE, derived_vars=NULL, ...)
mice.impute.2l.pls2(y, ry, x, type, pls.facs=NULL, pls.impMethod="pmm",
pls.print.progress=TRUE, imputationWeights=rep(1, length(y)), pcamaxcols=1E+09,
tricube.pmm.scale=NULL, min.int.cor=0, min.all.cor=0, N.largest=0,
pls.title=NULL, print.dims=TRUE, pls.maxcols=5000, envir_pos=parent.frame(), ...)
Arguments
y |
Incomplete data vector of length |
ry |
Vector of missing data pattern ( |
x |
Matrix ( |
type |
|
pls.facs |
Number of factors used in PLS regression. This argument can also be specified as a list defining different numbers of factors for all variables to be imputed. |
pls.impMethod |
Imputation method used in PLS estimation.
Any imputation method can be used except if |
donors |
Number of donors if predictive mean matching is used
( |
pls.impMethodArgs |
Arguments for imputation method
|
pls.print.progress |
Print progress during PLS regression. |
imputationWeights |
Vector of sample weights to be used in imputation models. |
pcamaxcols |
Amount of variance explained by principal components (must be a number between 0 and 1) or number of factors used in PCA (an integer larger than 1). |
min.int.cor |
Minimum absolute correlation for an interaction of two predictors to be included in the PLS regression model. |
min.all.cor |
Minimum absolute correlation for inclusion in the PLS regression model. |
N.largest |
Number of variables with the largest absolute correlations to be included. |
pls.title |
Title for progress print in console output. |
print.dims |
An optional logical indicating whether dimensions of inputs should be printed. |
pls.maxcols |
Maximum number of interactions to be created. |
use_boot |
Logical indicating whether the Bayesian bootstrap should be used for drawing regression parameters.
envir_pos |
Position of the environment from which the data should be extracted. |
extract_data |
Logical indicating whether input data should be extracted
from parent environment within |
remove_lindep |
Logical indicating whether linear dependencies should be detected automatically and the affected predictors removed.
derived_vars |
Optional list containing formulas with derived variables for inclusion in the PLS dimension reduction.
... |
Further arguments to be passed. |
tricube.pmm.scale |
Scale factor for tricube PMM imputation. |
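The screening arguments min.all.cor and N.largest can be pictured with a small base R sketch. The helper screen_predictors and its argument names are hypothetical, and applying both filters jointly is an assumption made for illustration; the package may combine them differently.

```r
# Hypothetical helper: keep predictors by absolute correlation with y.
screen_predictors <- function(x, y, min_cor = 0, n_largest = 0) {
  r <- abs(stats::cor(x, y))[, 1]        # absolute correlation of each column with y
  keep <- r >= min_cor                   # threshold filter (cf. min.all.cor)
  if (n_largest > 0) {                   # top-N filter (cf. N.largest)
    top <- order(r, decreasing = TRUE)[seq_len(min(n_largest, length(r)))]
    keep <- keep & seq_along(r) %in% top
  }
  colnames(x)[keep]
}

x <- cbind(a = 1:10, b = rep(c(1, -1), 5), c = (1:10)^2)
y <- 1:10
kept_threshold <- screen_predictors(x, y, min_cor = 0.5)
kept_top <- screen_predictors(x, y, n_largest = 1)
```

In this toy data the alternating column b is only weakly correlated with y, so a threshold of 0.5 drops it, while n_largest = 1 keeps only the perfectly correlated column a.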
Value
A vector of length nmis=sum(!ry) with imputations if pls.impMethod !="xplsfacs". In case of pls.impMethod=="xplsfacs", a matrix with PLS factors is computed.
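The shape of the returned vector, and the role of donors under predictive mean matching, can be sketched in base R as follows. The function pmm_sketch is hypothetical and deliberately simplified (it uses a plain least squares fit and no parameter draws), so it should not be read as the algorithm used by mice or miceadds.

```r
# Simplified predictive mean matching; hypothetical helper for illustration.
pmm_sketch <- function(y, ry, x, donors = 5) {
  fit <- stats::lm.fit(cbind(1, x[ry, , drop = FALSE]), y[ry])  # fit on observed cases
  yhat <- drop(cbind(1, x) %*% fit$coefficients)                # predictions for all cases
  obs <- which(ry)
  vapply(which(!ry), function(i) {
    d <- abs(yhat[obs] - yhat[i])                               # distance in predicted means
    pool <- obs[order(d)[seq_len(min(donors, length(obs)))]]    # closest observed cases
    y[pool[sample.int(length(pool), 1)]]                        # borrow one observed value
  }, numeric(1))
}

set.seed(42)
n <- 50
x <- matrix(rnorm(n * 2), n, 2)
y <- drop(x %*% c(1, 2)) + rnorm(n)
y[c(3, 10, 27)] <- NA                    # introduce missing values
ry <- !is.na(y)                          # TRUE for observed entries
imp <- pmm_sketch(y, ry, x, donors = 5)  # one imputed value per missing entry
```

The result has length sum(!ry), and every imputed value is an observed value of y taken from one of the donors.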
Note
The mice.impute.2l.pls2 function is included only for backward compatibility with former miceadds versions.
References
Mevik, B. H., & Wehrens, R. (2007). The pls package: Principal component and partial least squares regression in R. Journal of Statistical Software, 18, 1-24. doi:10.18637/jss.v018.i02
Examples
## Not run:
#############################################################################
# EXAMPLE 1: PLS imputation method for internet data
#############################################################################
data(data.internet)
dat <- data.internet
# specify predictor matrix
predictorMatrix <- matrix( 1, ncol(dat), ncol(dat) )
rownames(predictorMatrix) <- colnames(predictorMatrix) <- colnames(dat)
diag( predictorMatrix) <- 0
# use PLS imputation method for all variables
impMethod <- rep( "pls", ncol(dat) )
names(impMethod) <- colnames(dat)
# define predictors for interactions (entries with type 4 in predictorMatrix)
predictorMatrix[c("IN1","IN15","IN16"),c("IN1","IN3","IN10","IN13")] <- 4
# define predictors which should appear as linear and quadratic terms (type 5)
predictorMatrix[c("IN1","IN8","IN9","IN10","IN11"),c("IN1","IN2","IN7","IN5")] <- 5
# use 9 PLS factors for all variables
pls.facs <- as.list( rep( 9, length(impMethod) ) )
names(pls.facs) <- names(impMethod)
pls.facs$IN1 <- 15 # use 15 PLS factors for variable IN1
# choose norm or pmm imputation method
pls.impMethod <- as.list( rep("norm", length(impMethod) ) )
names(pls.impMethod) <- names(impMethod)
pls.impMethod[ c("IN1","IN6")] <- "pmm"
# some arguments for imputation method
pls.impMethodArgs <- list( "IN1"=list( "donors"=10 ),
"IN2"=list( "ridge2"=1E-4 ) )
# Model 1: Three parallel chains
imp1 <- mice::mice(data=dat, method=impMethod,
m=3, maxit=5, predictorMatrix=predictorMatrix,
pls.facs=pls.facs, # number of PLS factors
pls.impMethod=pls.impMethod, # Imputation Method in PLS imputation
pls.impMethodArgs=pls.impMethodArgs, # arguments for imputation method
pls.print.progress=TRUE, ls.meth="ridge" )
summary(imp1)
# Model 2: One long chain
imp2 <- miceadds::mice.1chain(data=dat, method=impMethod,
burnin=10, iter=21, Nimp=3, predictorMatrix=predictorMatrix,
pls.facs=pls.facs, pls.impMethod=pls.impMethod,
pls.impMethodArgs=pls.impMethodArgs, ls.meth="ridge" )
summary(imp2)
# Model 3: inclusion of additional derived variables
# define derived variables for IN1
derived_vars <- list( "IN1"=~I( ifelse( IN2>IN3, IN2, IN3 ) ) + I( sin(IN2) ) )
imp3 <- miceadds::mice.1chain(data=dat, method=impMethod, derived_vars=derived_vars,
burnin=10, iter=21, Nimp=3, predictorMatrix=predictorMatrix,
pls.facs=pls.facs, pls.impMethod=pls.impMethod,
pls.impMethodArgs=pls.impMethodArgs, ls.meth="ridge" )
summary(imp3)
#*** example for using imputation function at the level of a variable
# extract first imputed dataset
data_imp1 <- mice::complete(imp1, action=1)
data_imp1[ is.na(dat$IN1), "IN1" ] <- NA
# define variables
y <- data_imp1$IN1
x <- data_imp1[, -1 ]
ry <- ! is.na(y)
cn <- colnames(dat)
p <- ncol(dat)
type <- rep(1,p)
names(type) <- cn
type["IN1"] <- 0
# imputation of variable 'IN1'
imp0 <- miceadds::mice.impute.pls(y=y, x=x, ry=ry, type=type, pls.facs=10, pls.impMethod="norm",
ls.meth="ridge", extract_data=FALSE )
## End(Not run)
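To see what a derived_vars formula such as the one in Model 3 generates, the formula can be expanded on a small data frame with stats::model.matrix. This assumes a model.matrix style expansion, which is an illustration and not a statement about how the package evaluates the formulas. The data frame toy is hypothetical, and the term 0 is added only to suppress the intercept column.

```r
# Hypothetical toy data with two of the variable names used in the example.
toy <- data.frame(IN2 = c(1, 4, 2), IN3 = c(3, 2, 2))

# Same derived terms as in Model 3, without an intercept column.
f <- ~ 0 + I(ifelse(IN2 > IN3, IN2, IN3)) + I(sin(IN2))

# One column per derived variable: the larger of IN2 and IN3, and sin(IN2).
Z <- stats::model.matrix(f, data = toy)
```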