
BLMCP {GEInter}

R Documentation

Accommodating missingness in environmental measurements in gene-environment interaction analysis: penalized estimation and selection

Description

Implements the joint gene-environment (G-E) interaction analysis approach developed in Liu et al. (2013). To respect the "main effects, interactions" hierarchy, two penalties are adopted: the group minimax concave penalty (MCP) and the MCP. Specifically, for each G factor, its main effect and its corresponding G-E interactions are treated as a group, and the group MCP is imposed on this group to determine whether the G factor has any effect at all. In addition, the MCP is imposed on the individual interaction terms to further identify important interactions.
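
As a rough illustration of the two penalties described above, the sketch below evaluates the MCP and a combined group-plus-individual penalty in base R. The function names `mcp` and `sparse_group_mcp` are hypothetical and are not part of GEInter. The sketch also assumes no group-size scaling of `lambda1`, which the package's implementation may apply.

```r
# MCP: rho(t; lambda, gamma) = lambda*|t| - t^2/(2*gamma)  if |t| <= gamma*lambda
#                              gamma*lambda^2/2             otherwise
mcp <- function(t, lambda, gamma) {
  t <- abs(t)
  ifelse(t <= gamma * lambda,
         lambda * t - t^2 / (2 * gamma),
         gamma * lambda^2 / 2)
}

# Hypothetical helper: total penalty for a (q+1) x p coefficient matrix laid out
# like the `beta` component (row 1 = main G effects, rows 2.. = G-E interactions).
sparse_group_mcp <- function(beta, lambda1, lambda2, gamma1 = 6, gamma2 = 6) {
  group_norm <- sqrt(colSums(beta^2))   # one L2 norm per G factor (column)
  group_part <- sum(mcp(group_norm, lambda1, gamma1))
  inter_part <- sum(mcp(beta[-1, , drop = FALSE], lambda2, gamma2))
  group_part + inter_part
}
```

The group term can shrink an entire column (a G factor with all of its interactions) to zero, while the second term acts only on the interaction rows, which is how the hierarchy is respected.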

Usage

BLMCP(
  G,
  E,
  Y,
  weight = NULL,
  lambda1,
  lambda2,
  gamma1 = 6,
  gamma2 = 6,
  max_iter = 200
)

Arguments

`G`	Input matrix of G measurements, with `n` rows (observations) and `p` columns (G factors). Each row is an observation vector.
`E`	Input matrix of environmental risk factors, with `n` rows (observations) and `q` columns (E factors). Each row is an observation vector.
`Y`	Response variable. A numeric vector for a continuous response. For a survival response, `Y` should be a two-column matrix with the first column being the log(survival time) and the second column being the censoring indicator. The indicator is a binary variable, with "1" indicating death and "0" indicating right censoring.
`weight`	Observation weights. Default is `NULL`.
`lambda1`	A user supplied lambda for group MCP, where each main G effect and its corresponding interactions are regarded as a group.
`lambda2`	A user supplied lambda for MCP accommodating interaction selection.
`gamma1`	The regularization parameter of the group MCP. Default is 6.
`gamma2`	The regularization parameter of the MCP. Default is 6.
`max_iter`	Maximum number of iterations.
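
For a survival response, the two-column layout of `Y` described above can be assembled as in the following base R sketch. The event and censoring times here are simulated purely for illustration, and the variable names are hypothetical.

```r
set.seed(1)
n <- 10
event_time  <- rexp(n, rate = 0.10)   # made-up latent survival times
censor_time <- rexp(n, rate = 0.05)   # made-up censoring times

obs_time <- pmin(event_time, censor_time)          # what is actually observed
status   <- as.numeric(event_time <= censor_time)  # 1 = dead, 0 = right censored

# Column 1: log(survival time); column 2: censoring indicator
Y <- cbind(log_time = log(obs_time), status = status)
```

Note that the first column is on the log scale, so raw follow-up times need to be transformed before they are passed in.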

Value

An object with S3 class "BLMCP" is returned, which is a list with the following components.

`call`	The call that produced this object.
`alpha`	The matrix of the coefficients for main E effects.
`beta`	The matrix of the regression coefficients for all main G effects (the first row) and interactions.
`df`	The number of nonzero estimated coefficients.
`BIC`	Bayesian Information Criterion.
`aa`	Indicator of whether the algorithm reached convergence.
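
Because the first row of `beta` holds the main G effects and the remaining rows hold the interactions, selected effects can be read off by row. The sketch below uses a small made-up matrix in place of a fitted `beta`; it assumes that rows 2 onward follow the column order of `E`, which should be checked against an actual fit.

```r
# Made-up stand-in for a fitted `beta` with q = 2 E factors and p = 4 G factors
beta_hat <- matrix(0, nrow = 3, ncol = 4)
beta_hat[1, c(1, 3)] <- c(0.8, -0.5)  # main effects of G1 and G3
beta_hat[2, 1] <- 0.3                 # interaction of G1 with the first E factor

# G factors with a nonzero main effect (row 1)
selected_G <- which(beta_hat[1, ] != 0)

# Nonzero interactions: "row" indexes the E factor, "col" the G factor
inter_idx <- which(beta_hat[-1, , drop = FALSE] != 0, arr.ind = TRUE)

# Hierarchy check: every selected interaction belongs to a selected G factor
hierarchy_ok <- all(inter_idx[, "col"] %in% selected_G)
```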

References

Mengyun Wu, Yangguang Zang, Sanguo Zhang, Jian Huang, and Shuangge Ma. Accommodating missingness in environmental measurements in gene-environment interaction analysis. Genetic Epidemiology, 41(6):523-554, 2017.
Jin Liu, Jian Huang, Yawei Zhang, Qing Lan, Nathaniel Rothman, Tongzhang Zheng, and Shuangge Ma. Identification of gene-environment interactions in cancer studies using penalization. Genomics, 102(4):189-194, 2013.

Examples

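library(GEInter)  # provides AR(), simulated_data() and BLMCP()
set.seed(100)
# 250 observations of 100 G factors with an AR-structured covariance (parameter 0.3)
sigmaG=AR(0.3,100)
G=MASS::mvrnorm(250,rep(0,100),sigmaG)
# 5 E factors; the 2nd and 3rd are dichotomized
E=matrix(rnorm(250*5),250,5)
E[,2]=E[,2]>0;E[,3]=E[,3]>0
# true coefficients: alpha for main E effects;
# beta row 1 for main G effects, rows 2-6 for G-E interactions
alpha=runif(5,2,3)
beta=matrix(0,5+1,100);beta[1,1:8]=runif(8,2,3)
beta[2:4,1]=runif(3,2,3);beta[2:3,2]=runif(2,2,3);beta[5,3]=runif(1,2,3)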

# continuous with Normal error
y1=simulated_data(G,E,alpha,beta,error=rnorm(250),family="continuous")
fit1<-BLMCP(G,E,y1,weight=NULL,lambda1=0.05,lambda2=0.06,gamma1=3,gamma2=3,max_iter=200)
coef1=coef(fit1)
y1_hat=predict(fit1,E,G)
plot(fit1)

# survival with Normal error
y2=simulated_data(G,E,alpha,beta,rnorm(250,0,1),family="survival",0.7,0.9)
fit2<-BLMCP(G,E,y2,weight=NULL,lambda1=0.05,lambda2=0.06,gamma1=3,gamma2=3,max_iter=200)
coef2=coef(fit2)
y2_hat=predict(fit2,E,G)
plot(fit2)
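
The examples above fix `lambda1` and `lambda2` by hand. One common way to choose them is a grid search over the returned `BIC` component; the sketch below shows that idea with a hypothetical helper, `tune_by_bic`, which is not part of GEInter. It assumes that `BIC` is a single number and that smaller values are preferred. A toy stand-in fitter is used so that the block runs without the package.

```r
# Hypothetical helper: evaluate a fitting function over a grid of
# (lambda1, lambda2) pairs and return the row with the smallest BIC.
# `fit_fun(l1, l2)` must return a list with a numeric `BIC` element.
tune_by_bic <- function(fit_fun, lambda1_grid, lambda2_grid) {
  grid <- expand.grid(lambda1 = lambda1_grid, lambda2 = lambda2_grid)
  grid$BIC <- vapply(seq_len(nrow(grid)), function(i) {
    fit_fun(grid$lambda1[i], grid$lambda2[i])$BIC
  }, numeric(1))
  grid[which.min(grid$BIC), ]
}

# Toy stand-in for a model fit, used only to make the sketch self-contained
toy_fit <- function(l1, l2) list(BIC = (l1 - 0.05)^2 + (l2 - 0.06)^2)

best <- tune_by_bic(toy_fit, c(0.01, 0.05, 0.1), c(0.03, 0.06, 0.12))
```

With the simulated data above, the stand-in could be replaced by something like `function(l1, l2) BLMCP(G, E, y1, lambda1 = l1, lambda2 = l2)`; the grid values shown are arbitrary.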

[Package GEInter version 0.3.2 Index]