RobSBoosting {GEInter} | R Documentation |
Robust semiparametric gene-environment interaction analysis using sparse boosting
Description
Robust semiparametric gene-environment interaction analysis using sparse boosting. A semiparametric model is assumed in order to accommodate nonlinear effects: continuous environmental (E) factors are modeled nonlinearly, while discrete E factors and all genetic (G) factors are modeled linearly. The nonlinear functions are estimated with a B spline expansion. The Huber loss function and the Qn estimator are adopted to accommodate long-tailed distributions and data contamination. Model estimation and selection of relevant variables are carried out with an effective sparse boosting approach that respects the strong hierarchy.
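The robustness mentioned above comes from the Huber loss, which is quadratic for small residuals and linear for large ones. The following is a minimal sketch of that loss in base R, not the package's internal code; the function name `huber_loss` and the threshold `delta = 1.345` (a common textbook choice) are assumptions made for illustration.

```r
# Hypothetical helper: Huber loss for a vector of residuals r.
# delta is an assumed tuning threshold, not a value taken from GEInter.
huber_loss <- function(r, delta = 1.345) {
  a <- abs(r)
  # quadratic inside [-delta, delta], linear outside
  ifelse(a <= delta, 0.5 * r^2, delta * (a - 0.5 * delta))
}

res <- c(-0.5, 0.2, 8)   # the last residual mimics a contaminated observation
huber_loss(res)          # robust loss per residual
0.5 * res^2              # squared-error loss, for comparison
```

The outlying residual contributes far less under the Huber loss than under squared error, which is why a single contaminated observation has limited influence on the fit.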
Usage
RobSBoosting(
G,
E,
Y,
loop_time,
num.knots = NULL,
Boundary.knots = NULL,
degree = 1,
v = 0.1,
family = c("continuous", "survival"),
knots = NULL,
E_type
)
Arguments
G |
Input matrix of |
E |
Input matrix of |
Y |
Response variable. A quantitative vector for |
loop_time |
Number of iterations of the sparse boosting. |
num.knots |
Number of knots for the B spline basis. |
Boundary.knots |
The boundary of knots for the B spline basis. |
degree |
Degree for the B spline basis. |
v |
The step size used in the sparse boosting process. Default is 0.1. |
family |
Response type of |
knots |
List of knots for the B spline basis. Default is NULL and knots can be generated
with the given |
E_type |
A vector indicating the type of each E factor, with "ED" representing discrete E factor, and "EC" representing continuous E factor. |
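To give a feel for what the spline-related arguments control, here is a small sketch that builds a B spline basis for one continuous E factor with `bs` from the `splines` package. It assumes that `degree`, `knots` and `Boundary.knots` carry their usual `bs` meaning; this page does not spell out how RobSBoosting passes them on, and the simulated `x` and the interior knot positions are made up for illustration.

```r
library(splines)

# Simulated continuous E factor on [0, 1] (illustrative data only)
x <- seq(0, 1, length.out = 50)

# Quadratic B spline basis with two interior knots and boundary knots at 0 and 1
B <- bs(x, knots = c(1/3, 2/3), degree = 2, Boundary.knots = c(0, 1))

dim(B)   # one row per observation, one column per basis function
```

Without an intercept column, `bs` returns `length(knots) + degree` basis functions, so more knots or a higher degree gives a more flexible nonlinear effect at the cost of more coefficients to estimate.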
Value
An object with S3 class "RobSBoosting" is returned, which is a list with the following components.
call |
The call that produced this object. |
max_t |
The stopping iteration time of the sparse boosting. |
spline_result |
A list of length |
BIC |
A vector of length max_t containing the Bayesian Information Criterion (BIC) based on Huber's prediction error at each iteration. |
variable |
A vector of length max_t containing the index of the variable selected in each iteration. |
id |
The iteration time with the smallest BIC. |
variable_pair |
A two-column matrix listing the variables that can potentially enter the regression model at the stopping iteration time. The first and second columns hold the indexes of the E factors and G factors, respectively. For example, (1, 0) denotes the first E factor, and (1, 2) denotes the interaction between the first E factor and the second G factor. |
v_type |
A vector whose length is the number of rows of |
family |
The same as input |
degree |
Degree for the B spline basis. |
v |
The step size used in the sparse boosting process. |
NorM |
The values of B spline basis. |
estimation_results |
A list of estimation results for each variable. Here, the first
|
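The (E index, G index) encoding used by variable_pair can be turned into readable labels with a few lines of base R. This is a sketch only: `label_pairs` is a hypothetical helper, the matrix `vp` is made up, and reading a row (0, j) as the main effect of the j-th G factor is an assumption, since the description above only gives the (1, 0) and (1, 2) cases.

```r
# Hypothetical helper: map rows of a two-column (E index, G index) matrix to labels.
# Assumption: (0, j) denotes the main effect of the j-th G factor.
label_pairs <- function(variable_pair) {
  e <- variable_pair[, 1]
  g <- variable_pair[, 2]
  ifelse(g == 0, paste0("E", e),
         ifelse(e == 0, paste0("G", g),
                paste0("E", e, " x G", g)))
}

vp <- rbind(c(1, 0), c(1, 2), c(0, 2))   # illustrative input, not package output
label_pairs(vp)
```

With a fitted object, the same helper could be applied to its variable_pair component to see which main effects and interactions were candidates at the stopping iteration.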
References
Mengyun Wu and Shuangge Ma. Robust semiparametric gene-environment interaction analysis using sparse boosting. Statistics in Medicine, 38(23):4625-4641, 2019.
See Also
bs method for B spline expansion, coef, predict, and plot methods, and Miss.boosting method.
Examples
data(Rob_data)
G = Rob_data[, 1:20]; E = Rob_data[, 21:24]
Y = Rob_data[, 25]; Y_s = Rob_data[, 26:27]
knots = list(); Boundary.knots = matrix(0, 24, 2)
for (i in 1:4) {
  knots[[i]] = c(0, 1)
  Boundary.knots[i, ] = c(0, 1)
}
# continuous response
fit1 = RobSBoosting(G, E, Y, loop_time = 80, num.knots = 2, Boundary.knots = Boundary.knots,
                    degree = 2, family = "continuous", knots = knots,
                    E_type = c("EC", "EC", "ED", "ED"))
coef1 = coef(fit1)
predict1 = predict(fit1, newE = E[1:2, ], newG = G[1:2, ])
plot(fit1)
# survival response
fit2 = RobSBoosting(G, E, Y_s, loop_time = 200, num.knots = 2, Boundary.knots = Boundary.knots,
                    family = "survival", knots = knots,
                    E_type = c("EC", "EC", "ED", "ED"))
coef2 = coef(fit2)
predict2 = predict(fit2, newE = E[1:2, ], newG = G[1:2, ])
plot(fit2)