MBI {BlockMissingData} | R Documentation |
Variable selection method with multiple block-wise imputation (MBI)
Description
Fit a variable selection method with multiple block-wise imputation (MBI).
Usage
MBI(
X,
y,
cov_index,
sub_index,
miss_source,
complete,
lambda = NULL,
eps1 = 0.001,
eps2 = 1e-07,
eps3 = 1e-08,
max.iter = 1000,
lambda.min = ifelse(n > p, 0.001, 0.05),
nlam = 100,
beta0 = NULL,
a = 3.7,
gamma.ebic = 0.5,
alpha1 = 0.5^(0:12),
h1 = 2^(-(8:30)),
ratio = 1
)
Arguments
X |
Design matrix for block-wise missing covariates. |
y |
Response vector. |
cov_index |
Starting indexes of covariates in data sources. |
sub_index |
Starting indexes of subjects in missing groups. |
miss_source |
Indexes of missing data sources in missing groups, respectively ('NULL' represents no missing). |
complete |
Logical indicator of whether there is a group of complete cases. If there is a group of complete cases, it should be the first group. 'TRUE' represents that there is a group of complete cases. |
lambda |
A user supplied sequence of tuning parameter in penalty. If NULL, a sequence is automatically generated. |
eps1 |
Convergence threshold at certain stage of the algorithm. Default is 1e-3. |
eps2 |
Convergence threshold at certain stage of the algorithm. Default is 1e-7. |
eps3 |
Convergence threshold at certain stage of the algorithm. Default is 1e-8. |
max.iter |
The maximum number of iterations allowed. Default is 1000. |
lambda.min |
Smallest value for |
nlam |
The number of |
beta0 |
Initial value for regression coefficients. If NULL, they are initialized automatically. |
a |
Tuning parameter in the SCAD penalty. Default is 3.7. |
gamma.ebic |
Parameter in the EBIC criterion. Default is 0.5. |
alpha1 |
A sequence of candidate values for the step size in the conjugate gradient algorithm. Default is 0.5^(0:12). |
h1 |
A sequence of candidate values for the parameter in the numerical calculation of the first derivative of the objective function. Default is 2^(-(8:30)). |
ratio |
Parameter in the numerical calculation of the first derivative. Default is 1. |
Details
The function uses the penalized generalized method of moments with multiple block-wise imputation to handle block-wise missing data, commonly found in multi-source datasets.
Value
beta |
Estimated coefficients matrix with |
lambda |
The actual sequence of |
bic1 |
BIC criterion values. '0' should be ignored. |
notcon |
Value indicating whether the algorithm is converged or not. '0' represents convergence; otherwise non-convergence. |
intercept |
Intercept sequence of length |
beta0 |
Estimated coefficients matrix for standardized |
Author(s)
Fei Xue and Annie Qu
References
Xue, F., and Qu, A. (2021) Integrating Multisource Block-Wise Missing Data in Model Selection (2021), Journal of the American Statistical Association, Vol. 116(536), 1914-1927.
Examples
library(MASS)
# Number of subjects
n <- 30
# Number of total covariates
p <- 4
# Number of missing groups of subjects
ngroup <- 2
# Number of data sources
nsource <- 2
# Starting indexes of covariates in data sources
cov_index=c(1, 3)
# Starting indexes of subjects in missing groups
sub_index=c(1, 16)
# Indexes of missing data sources in missing groups, respectively ('NULL' represents no missing)
miss_source=list(NULL, 1)
# Indicator of whether there is a group of complete cases. If there is a group of complete cases,
# it should be the first group.
complete=TRUE
# Create a block-wise missing design matrix X and response vector y
set.seed(1)
sigma=diag(1-0.4,p,p)+matrix(0.4,p,p)
X <- mvrnorm(n,rep(0,p),sigma)
beta_true <- c(2.5, 0, 3, 0)
y <- rnorm(n) + X%*%beta_true
for (i in 1:ngroup) {
if (!is.null(miss_source[[i]])) {
if (i==ngroup) {
if (miss_source[[i]]==nsource) {
X[sub_index[i]:n, cov_index[miss_source[[i]]]:p] = NA
} else {
X[sub_index[i]:n, cov_index[miss_source[[i]]]:(cov_index[miss_source[[i]]+1]-1)] = NA
}
} else {
if (miss_source[[i]]==nsource) {
X[sub_index[i]:(sub_index[i+1]-1), cov_index[miss_source[[i]]]:p] = NA
} else {
X[sub_index[i]:(sub_index[i+1]-1), cov_index[miss_source[[i]]]:
(cov_index[miss_source[[i]]+1]-1)] = NA
}
}
}
}
# Now we can use the function with this simulated data
#start.time = proc.time()
result <- MBI(X=X, y=y, cov_index=cov_index, sub_index=sub_index, miss_source=miss_source,
complete=complete, nlam = 15, eps2 = 1e-3, h1=2^(-(8:20)))
#time = proc.time() - start.time
theta=result$beta
bic1=result$bic1
best=which.min(bic1[bic1!=0])
beta_est=theta[best,]