Boost_VSE {SIMEXBoost}		R Documentation

Boosting Method for Variable Selection and Estimation

Description

The function Boost_VSE, named for the boosting procedure for Variable Selection and Estimation, handles the regression models and data structures considered in ME_Data.

Usage

Boost_VSE(Y,Xstar,type="normal",Iter=200,Lambda=0)

Arguments

Y

Responses in the dataset. If type is specified as "normal", "binary", or "poisson", then Y should be an n-dimensional vector; if type is given by "AFT-normal" or "AFT-loggamma", then Y should be an (n,2) matrix of interval-censored responses, whose first column contains the lower bounds and whose second column contains the upper bounds of the censoring intervals (the layout is sketched after this list of arguments).

Xstar

An (n,p) matrix of covariates, which can be either error-prone or precisely measured.

type

The specification of the regression model. "normal" means the linear regression model with the error term generated from the standard normal distribution; "binary" means the logistic regression model; "poisson" means the Poisson regression model. In addition, the accelerated failure time (AFT) model is available for fitting length-biased and interval-censored survival data: "AFT-normal" represents the AFT model with a normally distributed error term, and "AFT-loggamma" represents the AFT model with a log-gamma distributed error term.

Iter

The number of iterations of the boosting procedure. The default value is Iter=200.

Lambda

A tuning parameter for an L2-norm (ridge-type) penalty that addresses collinearity among the covariates. The default value Lambda=0 means that no L2-norm penalty is imposed (a call with a nonzero Lambda is sketched after this list of arguments).
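The following is an illustrative sketch, not taken from the package: it shows the (n,2) interval-censored response format expected for the AFT types and a call with a nonzero Lambda. The interval bounds and the value Lambda=0.1 are artificial and serve only to show the layout of the arguments.

## Two-column interval-censored response: lower bounds in the first
## column, upper bounds in the second (artificial values, for illustration only).
n <- 5
lower <- runif(n, 0, 2)
upper <- lower + runif(n, 0, 1)
Y_aft <- cbind(lower, upper)

## A call with a nonzero L2-norm tuning parameter; Lambda = 0.1 is arbitrary
## and Xstar is not constructed in this snippet, so the call is left commented.
## Boost_VSE(Y_aft, Xstar, type = "AFT-normal", Iter = 200, Lambda = 0.1)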

Details

This function addresses variable selection and estimation for (ultra)high-dimensional data. It can handle generalized linear models (in particular, linear regression models, logistic regression models, and Poisson regression models) as well as accelerated failure time models in survival analysis. When the input Xstar consists of precisely measured covariates, the resulting BetaHat is the vector of estimators; when Xstar consists of error-prone covariates, the resulting BetaHat is the so-called "naive" estimator.
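As a minimal sketch, assuming the fitted object stores the coefficient vector in the BetaHat component listed under Value, the selected covariates can be read off from its nonzero entries (the data-generating step mirrors Example 1 below):

## Sketch: identify the covariates retained by the boosting procedure.
## Assumes Boost_VSE() returns a list with component BetaHat.
X1  <- matrix(rnorm(400 * 20), nrow = 400, ncol = 20)
dat <- ME_Data(X = X1, beta = c(1, 1, 1, rep(0, 17)),
               type = "normal", sigmae = diag(0, 20))
fit <- Boost_VSE(dat$response, dat$ME_covariate, type = "normal", Iter = 3)
which(fit$BetaHat != 0)   # indices of the selected (nonzero) coefficients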

Value

BetaHat

The vector of estimators obtained by the boosting method.

Author(s)

Bangxu Qiu and Li-Pang Chen

References

Chen, L.-P. (2023). De-noising boosting methods for variable selection and estimation subject to error-prone variables. Statistics and Computing, 33:38.

Chen, L.-P. and Qiu, B. (2023). Analysis of length-biased and partly interval-censored survival data with mismeasured covariates. Biometrics. To appear. <doi: 10.1111/biom.13898>

Hastie, T., Tibshirani, R. and Friedman, J. (2008). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York.

See Also

ME_Data

Examples


##### Example 1: A linear model with precisely measured covariates #####

## Generate 400 observations with 20 covariates; sigmae = 0 means the
## covariates are observed without measurement error.
X1 <- matrix(rnorm(400 * 20), nrow = 400, ncol = 20)

data <- ME_Data(X = X1, beta = c(1, 1, 1, rep(0, dim(X1)[2] - 3)),
                type = "normal", sigmae = diag(0, dim(X1)[2]))

Y <- data$response
Xstar <- data$ME_covariate

Boost_VSE(Y, Xstar, type = "normal", Iter = 3)


##### Example 2: A linear model with error-prone covariates #####

## Generate 400 observations with 20 error-prone covariates; the
## measurement-error covariance matrix has 0.3 on its diagonal, so the
## resulting BetaHat is the "naive" estimator.
X1 <- matrix(rnorm(400 * 20), nrow = 400, ncol = 20)

data <- ME_Data(X = X1, beta = c(1, 1, 1, rep(0, dim(X1)[2] - 3)),
                type = "normal", sigmae = diag(0.3, dim(X1)[2]))

Y <- data$response
Xstar <- data$ME_covariate

Boost_VSE(Y, Xstar, type = "normal", Iter = 3)
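

##### Additional sketch (not a shipped example): a logistic model #####
##### with error-prone covariates                                 #####

## This sketch assumes ME_Data also accepts type = "binary" (the data
## structures described above are shared with ME_Data) and uses an
## arbitrary Lambda = 0.1 only to illustrate the L2-norm tuning parameter.

X1 <- matrix(rnorm(400 * 20), nrow = 400, ncol = 20)

data <- ME_Data(X = X1, beta = c(1, 1, 1, rep(0, dim(X1)[2] - 3)),
                type = "binary", sigmae = diag(0.3, dim(X1)[2]))

Y <- data$response
Xstar <- data$ME_covariate

Boost_VSE(Y, Xstar, type = "binary", Iter = 3, Lambda = 0.1)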



[Package SIMEXBoost version 0.2.0]