RobSBoosting {GEInter}    R Documentation

Robust semiparametric gene-environment interaction analysis using sparse boosting

Description

This function fits a semiparametric model that accommodates nonlinear effects: continuous environmental (E) factors are modeled nonlinearly, while discrete E factors and all genetic (G) factors are modeled linearly. The nonlinear functions are estimated with a B-spline expansion. The Huber loss function and the Qn estimator are adopted to accommodate long-tailed distributions and data contamination. Model estimation and variable selection are carried out with a sparse boosting approach that respects the strong hierarchy, that is, an interaction is selected only together with its corresponding main E and G effects.
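To illustrate why the Huber loss helps with long-tailed errors, the following minimal sketch compares it with the squared loss. The function name huber_loss and the threshold argument delta (set here to the conventional value 1.345) are assumptions made for illustration; the tuning constant actually used by RobSBoosting is not stated in this page.

```r
# Huber loss: quadratic for small residuals, linear for large ones.
# `huber_loss` and `delta` are hypothetical names; 1.345 is a conventional
# choice, not necessarily the value used inside RobSBoosting.
huber_loss <- function(r, delta = 1.345) {
  ifelse(abs(r) <= delta,
         0.5 * r^2,                        # quadratic region
         delta * (abs(r) - 0.5 * delta))   # linear region for outliers
}

residuals <- c(-0.5, 0, 1, 10)
loss      <- huber_loss(residuals)
squared   <- 0.5 * residuals^2

# The outlying residual (10) is penalized far less than under squared loss.
print(cbind(residuals, huber = loss, squared = squared))
```

Because the penalty grows only linearly beyond delta, a few contaminated observations have a bounded influence on the fit.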

Usage

RobSBoosting(
  G,
  E,
  Y,
  loop_time,
  num.knots = NULL,
  Boundary.knots = NULL,
  degree = 1,
  v = 0.1,
  family = c("continuous", "survival"),
  knots = NULL,
  E_type
)

Arguments

G

Input matrix of p genetic measurements consisting of n rows. Each row is an observation vector.

E

Input matrix of q environmental (E) risk factors consisting of n rows. Each row is an observation vector.

Y

Response variable. For family="continuous", Y should be a quantitative vector. For family="survival", Y should be a two-column matrix with the first column being the log(survival time) and the second column being the censoring indicator. The indicator is a binary variable, with "1" indicating that the event (death) was observed and "0" indicating right censoring.

loop_time

Number of iterations of the sparse boosting.

num.knots

Number of knots for the B-spline basis.

Boundary.knots

Boundary knots for the B-spline basis.

degree

Degree of the B-spline basis. Default is 1.

v

The step size used in the sparse boosting process. Default is 0.1.

family

Response type of Y, either "continuous" or "survival" (see the description of Y above).

knots

List of knots for the B-spline basis. Default is NULL, in which case the knots can be generated from the given num.knots, degree and Boundary.knots.

E_type

A vector indicating the type of each E factor, with "ED" denoting a discrete E factor and "EC" denoting a continuous E factor.
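The argument descriptions above can be made concrete with a small sketch that assembles inputs of the described shapes. All data here are simulated and hypothetical. The call to bs() from the splines package only illustrates what knots, Boundary.knots and degree mean for a B-spline basis; how RobSBoosting builds its basis internally is not shown in this page.

```r
library(splines)  # ships with R; provides bs()

set.seed(1)
n <- 50

# Hypothetical survival data: observed times and an event indicator.
time   <- rexp(n, rate = 0.1)
status <- rbinom(n, size = 1, prob = 0.7)  # 1 = event observed, 0 = right censored
Y_surv <- cbind(log(time), status)         # layout described for family = "survival"

# Hypothetical E matrix: two continuous factors followed by two discrete ones.
E <- cbind(runif(n), runif(n), rbinom(n, 1, 0.5), rbinom(n, 1, 0.5))
E_type <- c("EC", "EC", "ED", "ED")        # one label per column of E

# B-spline basis for the first continuous E factor: one interior knot at 0.5,
# boundary knots at 0 and 1, quadratic pieces (degree = 2).
basis <- bs(E[, 1], knots = 0.5, Boundary.knots = c(0, 1), degree = 2)
dim(basis)  # n rows; number of columns = number of interior knots + degree
```

With one interior knot and degree 2 the basis has three columns, which is the number of spline coefficients estimated for that E factor in this illustration.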

Value

An object with S3 class "RobSBoosting" is returned, which is a list with the following components.

call

The call that produced this object.

max_t

The iteration at which the sparse boosting stopped.

spline_result

A list of length max_t containing the estimation results of each iteration.

BIC

A vector of length max_t containing the Bayesian Information Criterion (BIC) values based on Huber's prediction error.

variable

A vector of length max_t containing the index of the variable selected in each iteration.

id

The iteration with the smallest BIC.

variable_pair

A two-column matrix listing the variables that can potentially enter the regression model at the stopping iteration. The first and second columns give the indexes of the E factor and the G factor, respectively. For example, (1, 0) denotes the first E factor, and (1, 2) denotes the interaction between the first E factor and the second G factor.

v_type

A vector with one element per row of variable_pair, giving the variable type of the corresponding row: "EC" for a continuous E effect, "ED" for a discrete E effect, "G" for a G effect, "EC-G" for an interaction between "EC" and "G", and "ED-G" for an interaction between "ED" and "G".

family

The same as input family.

degree

Degree of the B-spline basis.

v

The step size used in the sparse boosting process.

NorM

The values of the B-spline basis.

estimation_results

A list of estimation results for each variable. The first q elements are for the E effects, the (q+1)th element is for the first G effect, the (q+2)th to (2q+1)th elements are for the interactions involving the first G factor, and so on.
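The ordering of estimation_results described above can be expressed as a small index helper. This is a sketch that assumes the documented pattern (E effects first, then each G effect followed by its q interactions) continues in the same way for every G factor; effect_position is a hypothetical helper and is not part of GEInter.

```r
# Position of an effect in estimation_results, assuming the documented layout:
# q E effects, then for each G factor its main effect followed by q interactions.
# `effect_position` is a hypothetical helper, not a GEInter function.
#   q: number of E factors
#   g: index of the G factor (0 for a pure E effect)
#   e: index of the E factor (0 for a pure G effect)
effect_position <- function(q, g = 0, e = 0) {
  stopifnot(g >= 0, e >= 0, e <= q, g > 0 || e > 0)
  if (g == 0) return(e)            # main effect of the e-th E factor
  q + (g - 1) * (q + 1) + 1 + e    # e = 0: G main effect; e > 0: interaction
}

q <- 4
effect_position(q, e = 1)          # first E factor
effect_position(q, g = 1)          # first G factor, position q + 1
effect_position(q, g = 1, e = 4)   # last interaction of the first G factor, 2q + 1
effect_position(q, g = 2)          # second G factor
```

Under the same assumption, an expression such as fit$estimation_results[[effect_position(q, g = 2, e = 1)]] would pick out the interaction between the second G factor and the first E factor from a fitted object fit.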

References

Mengyun Wu and Shuangge Ma. Robust semiparametric gene-environment interaction analysis using sparse boosting. Statistics in Medicine, 38(23):4625-4641, 2019.

See Also

bs for the B-spline expansion; the coef, predict, and plot methods; and the Miss.boosting method.

Examples

data(Rob_data)
G = Rob_data[, 1:20]
E = Rob_data[, 21:24]
Y = Rob_data[, 25]
Y_s = Rob_data[, 26:27]
knots = list()
Boundary.knots = matrix(0, 24, 2)
for (i in 1:4) {
  knots[[i]] = c(0, 1)
  Boundary.knots[i, ] = c(0, 1)
}

# continuous response
fit1 = RobSBoosting(G, E, Y, loop_time = 80, num.knots = 2,
                    Boundary.knots = Boundary.knots, degree = 2,
                    family = "continuous", knots = knots,
                    E_type = c("EC", "EC", "ED", "ED"))
coef1 = coef(fit1)
predict1 = predict(fit1, newE = E[1:2, ], newG = G[1:2, ])
plot(fit1)


# survival response
fit2 = RobSBoosting(G, E, Y_s, loop_time = 200, num.knots = 2,
                    Boundary.knots = Boundary.knots,
                    family = "survival", knots = knots,
                    E_type = c("EC", "EC", "ED", "ED"))
coef2 = coef(fit2)
predict2 = predict(fit2, newE = E[1:2, ], newG = G[1:2, ])
plot(fit2)


[Package GEInter version 0.3.2 Index]