imputeada {denoiseR} | R Documentation |
Adaptive Shrinkage with missing values - Imputation
Description
This function estimates a low-rank signal from a noisy Gaussian incomplete data using the iterative Adaptive Trace Norm (ATN) algorithm. It can be used to impute a data set. dl = dl * max(1-(lambda/dl)^gamma,0). If, the parameters lambda and gamma are not specified, they are estimated by minimizing a Missing Stein unbiased risk estimate (SURE) when the variance sigma^2 of the noise is known or a generalized SURE (GSURE) otherwise. These SURE and GSURE for missing values are implemented using finite differences.
Usage
imputeada(X, lambda = NA, gamma = NA, sigma = NA, method = c("GSURE",
"SURE"), gamma.seq = seq(1, 5, by = 0.1), method.optim = "BFGS",
center = "TRUE", scale = "FALSE", threshold = 1e-08, nb.init = 1,
maxiter = 1000, lambda0 = NA)
Arguments
X |
a data frame or a matrix with numeric entries |
lambda |
integer, value to be used in the iterative ATN algorithm |
gamma |
integer, value to be used in the iterative ATN algorithm |
sigma |
integer, standard deviation of the Gaussian noise. |
method |
to select the two tunning parameters lambda and gamma. By default by minimizing GSURE |
gamma.seq |
a vector for the sequence of gamma. The values must be greater than 1 |
method.optim |
the method used in the optim function. By default BFGS |
center |
boolean, to center the data. By default "TRUE" |
scale |
boolean, to scale the data. By default "FALSE" |
threshold |
for assessing convergence (difference between two successive iterations) |
nb.init |
integer, to run the iterative ATN algorithm with nbinit different initialization. By default 1. |
maxiter |
integer, maximum number of iterations of the iterative imputation algorithm |
lambda0 |
integer, the initial value for lambda used to optimize SURE and GSURE. By default the median of the singular values (must be in log scale) |
Details
The iterative ATN algorithm first consists in imputing missing values with initial values. Then, adashrink is performed on the completed dataset with its regularization parameter lambda and gamma. The missing entries are imputed with the estimated signal. These steps of estimation of the signal via adashrink and imputation of the missing values are iterated until convergence. At the end, both an estimation of the signal and a completed data set are provided. If lambda and gamma are not known, they can be estimated by minimizing SURE when sigma^2 is known. To do this, a grid for gamma is defined in gamma.seq (gammas must be greater than 1) and the Missing SURE function is optimized on lambda using the optim function of the package stats (?optim) with the optimization method by default sets to "BFGS". The initial lambda can be modified in the argument lambda0. When sigma is not known, it is possible to estimate the two tuning parameters by minimizing Missing GSURE. Note that Missing SURE is defined using finite differences so it is computationally costly. The estimated low rank matrix is given in the output mu.hat. imputeada automatically estimates the rank of the signal. Its value is given in the output nb.eigen corresponding to the number of non-zero eigenvalues.
Value
mu.hat the estimator of the signal
completeObs the completed data set. Observed values are the same but missing values are replaced by the estimated one in mu.hat
nb.eigen the number of non-zero singular values
gamma the given gamma or the optimal gamma selected by minimizing SURE or GSURE
lambda the given lambda or the optimal lambda selected by minimizing SURE or GSURE
singval the singular values of the estimator
low.rank the results of the SVD of the estimator
See Also
Examples
don.NA <- LRsim(200, 500, 100, 4)$X
don.NA[sample(1:(200*500),20, replace = FALSE)] <- NA
## Not run: adaNA <- imputeada(don.NA, lambda = 0.022, gamma = 2.3)
esti <- adaNA$mu.hat
comp <- adaNA$completeObs
## End(Not run)