Miss.boosting {GEInter} | R Documentation |
Robust gene-environment interaction analysis via sparse boosting, where missingness in environmental measurements is accommodated using a multiple imputation approach
Description
This gene-environment analysis approach includes three steps to accommodate both missingness
in environmental (E) measurements and long-tailed or contaminated outcomes. In the first step,
a multiple imputation approach based on the sparse boosting method is developed to accommodate
missingness in E measurements, where NA represents the E measurements that are missing.
Here a semiparametric model is assumed to accommodate nonlinear effects: continuous E factors
are modeled in a nonlinear way and discrete E factors in a linear way. For estimating the
nonlinear functions, the B spline expansion is adopted. In the second step, for each imputed
data set, we develop the RobSBoosting approach for identifying important main E and genetic (G)
effects and G-E interactions, where the Huber loss function and Qn estimator are adopted to
accommodate long-tailed distributions and data contamination (see RobSBoosting).
In the third step, the identification results from Step 2 are combined using the stability
selection technique.
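To illustrate why the Huber loss in Step 2 helps with long-tailed or contaminated outcomes, the following is a minimal base-R sketch, not the package's internal code. The function name huber_loss and the cutoff delta = 1.345 are assumptions made for illustration; the value actually used by RobSBoosting is not stated here.

```r
# Huber loss: quadratic for small residuals, linear beyond the cutoff `delta`.
# `huber_loss` and `delta = 1.345` are illustrative assumptions, not GEInter internals.
huber_loss <- function(r, delta = 1.345) {
  ifelse(abs(r) <= delta,
         0.5 * r^2,
         delta * (abs(r) - 0.5 * delta))
}

residuals <- c(-0.5, 0, 1, 10)   # the last value mimics a contaminated outcome
squared <- 0.5 * residuals^2     # squared-error loss for comparison
huber <- huber_loss(residuals)

# The outlying residual contributes far less under the Huber loss.
print(data.frame(residual = residuals, squared = squared, huber = huber))
```

Because the loss grows only linearly for large residuals, a single extreme outcome has a bounded influence on each boosting update.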
Usage
Miss.boosting(
  G,
  E,
  Y,
  im_time = 10,
  loop_time = 500,
  num.knots = c(2),
  Boundary.knots,
  degree = c(2),
  v = 0.1,
  tau,
  family = c("continuous", "survival"),
  knots = NULL,
  E_type
)
Arguments
G |
Input matrix of |
E |
Input matrix of |
Y |
Response variable. A quantitative vector for |
im_time |
Number of imputations for accommodating missingness in E variables. |
loop_time |
Number of iterations of the sparse boosting. |
num.knots |
Numbers of knots for the B spline basis. |
Boundary.knots |
The boundary of knots for the B spline basis. |
degree |
Degree for the B spline basis. |
v |
The step size used in the sparse boosting process. Default is 0.1. |
tau |
Threshold used in the stability selection at the third step. |
family |
Response type of |
knots |
List of knots for the B spline basis. Default is NULL and knots can be generated
with the given |
E_type |
A vector indicating the type of each E factor, with "ED" representing a discrete E factor and "EC" representing a continuous E factor. |
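The arguments num.knots, Boundary.knots, and degree describe a B spline basis for the continuous ("EC") E factors. The sketch below shows what such a basis looks like using bs() from the splines package that ships with R. The simulated factor e_cont and the quantile-based knot placement are assumptions for illustration; how Miss.boosting places knots internally is not shown here.

```r
library(splines)  # ships with R

set.seed(1)
e_cont <- runif(50)   # hypothetical continuous E factor taking values in (0, 1)

# Interior knots at equally spaced quantiles (an assumed placement rule).
num_knots <- 2
interior <- quantile(e_cont, probs = seq_len(num_knots) / (num_knots + 1))

# Quadratic B spline basis with boundary knots at 0 and 1.
basis <- bs(e_cont, knots = interior, degree = 2,
            Boundary.knots = c(0, 1), intercept = FALSE)

dim(basis)   # rows = observations, columns = num_knots + degree
```

Each continuous E factor is thus expanded into several basis columns, while a discrete ("ED") factor enters the model as a single linear term.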
Value
An object with S3 class "Miss.boosting" is returned, which is a list with the following components:
call |
The call that produced this object. |
alpha0 |
A vector with each element indicating whether the corresponding E factor is selected. |
beta0 |
A vector with each element indicating whether the corresponding G factor or G-E
interaction is selected. The first element is the first G effect and the second to
( |
intercept |
The intercept estimate. |
unique_variable |
A matrix with two columns that represents the variables that are
selected for the model after removing the duplicates, since the |
unique_coef |
Coefficients corresponding to |
unique_knots |
A list of knots corresponding to |
unique_Boundary.knots |
A list of boundary knots corresponding to
|
unique_vtype |
A vector representing the variable type of |
degree |
Degree for the B spline basis. |
NorM |
The values of B spline basis. |
E_type |
The type of E effects. |
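The indicator vectors alpha0 and beta0 result from combining the selections made on each imputed data set, with tau as the threshold. A minimal sketch of one such combination rule follows. The indicator matrix sel, the effect names, and the rule "selection frequency >= tau" are assumptions for illustration and may differ from the package's exact implementation.

```r
# Hypothetical selection indicators: 5 imputed data sets (rows) by 4 effects (columns).
sel <- rbind(
  c(1, 0, 1, 0),
  c(1, 0, 0, 0),
  c(1, 1, 1, 0),
  c(1, 0, 1, 0),
  c(1, 0, 0, 1)
)
colnames(sel) <- c("G1", "G1xE1", "G2", "G2xE1")

tau <- 0.3
freq <- colMeans(sel)                  # selection frequency across imputations
selected <- as.integer(freq >= tau)    # assumed rule: keep effects chosen often enough
names(selected) <- colnames(sel)

print(rbind(freq = freq, selected = selected))
```

Under this rule, a larger tau keeps only effects that are selected consistently across imputations, while a smaller tau is more permissive.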
References
Mengyun Wu and Shuangge Ma. Robust semiparametric gene-environment interaction analysis using sparse boosting. Statistics in Medicine, 38(23):4625-4641, 2019.
Examples
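library(GEInter)

# Load the example data and split it into G factors, E factors, and responses
data(Rob_data)
G <- Rob_data[, 1:20]
E <- Rob_data[, 21:24]
Y <- Rob_data[, 25]        # response for family = "continuous"
Y_s <- Rob_data[, 26:27]   # response for family = "survival"

# Knots and boundary knots for the B spline basis of the E factors
knots <- list()
Boundary.knots <- matrix(0, (20 + 4), 2)
for (i in 1:4) {
  knots[[i]] <- c(0, 1)
  Boundary.knots[i, ] <- c(0, 1)
}
E2 <- E1 <- E

## continuous
E1[7, 1] <- NA   # mark one E measurement as missing
fit1 <- Miss.boosting(G, E1, Y, im_time = 1, loop_time = 100, num.knots = c(2),
                      Boundary.knots, degree = c(2), v = 0.1, tau = 0.3,
                      family = "continuous", knots = knots,
                      E_type = c("EC", "EC", "ED", "ED"))
y1_hat <- predict(fit1, matrix(E1[1, ], nrow = 1), matrix(G[1, ], nrow = 1))
plot(fit1)

## survival
E2[4, 1] <- NA   # mark one E measurement as missing
fit2 <- Miss.boosting(G, E2, Y_s, im_time = 2, loop_time = 200, num.knots = c(2),
                      Boundary.knots, degree = c(2), v = 0.1, tau = 0.3,
                      family = "survival", knots = knots,
                      E_type = c("EC", "EC", "ED", "ED"))
y2_hat <- predict(fit2, matrix(E2[1, ], nrow = 1), matrix(G[1, ], nrow = 1))
plot(fit2)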
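The examples above mark a single E measurement as NA by hand. To try the function with more missing values, a small helper along the following lines could be used. This is a sketch only: inject_na is a hypothetical helper that is not part of GEInter, and it assumes entries go missing completely at random.

```r
# Hypothetical helper (not part of GEInter): set a random share of entries
# in the chosen columns of E to NA, assuming missingness completely at random.
inject_na <- function(E, cols, prop = 0.05, seed = 1) {
  set.seed(seed)
  E <- as.matrix(E)
  n_miss <- max(1, round(prop * nrow(E)))   # at least one missing entry per column
  for (j in cols) {
    E[sample(nrow(E), n_miss), j] <- NA
  }
  E
}

E_demo <- matrix(rnorm(40), nrow = 10, ncol = 4)   # stand-in for an E matrix
E_miss <- inject_na(E_demo, cols = 1:2, prop = 0.2)
colSums(is.na(E_miss))   # missing counts per column
```

The resulting matrix can then be passed as the E argument in place of E1 or E2.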