R: Boosting Method for High-Dimensional Error-Prone Data

ME_Data {SIMEXBoost}

R Documentation

Boosting Method for High-Dimensional Error-Prone Data

Description

Generates artificial data with error-prone covariates.

Usage

ME_Data(X,beta,type="normal",sigmae,pr0=0.5)

Arguments

`X`	An (n,p) matrix of the "unobserved" covariates provided by users.
`beta`	A p-dimensional vector of parameters provided by users.
`type`	The regression model used to generate the response. "normal" means the linear regression model with the error term generated from the standard normal distribution; "binary" means the logistic regression model; "poisson" means the Poisson regression model. In addition, the accelerated failure time (AFT) model is available for generating length-biased and interval-censored survival data. Specifically, "AFT-normal" generates length-biased and interval-censored survival data under the AFT model with a normally distributed error term; "AFT-loggamma" generates length-biased and interval-censored survival data under the AFT model with a log-gamma distributed error term.
`sigmae`	A (p,p) covariance matrix of the noise term in the classical measurement error model. If `sigmae` has non-zero entries, error-prone covariates are generated. If `sigmae` is the zero matrix, the resulting covariates are the original input `X` given by users.
`pr0`	A numerical value in the interval (0,1). It is used to determine the censoring rate for the length-biased and interval-censored data. The default value is 0.5.
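
To illustrate how `sigmae` acts, the following is a minimal sketch of the classical measurement error model, in which the observed covariate is the true covariate plus mean-zero normal noise with covariance `sigmae`. The helper name `add_classical_error` is hypothetical and is not part of SIMEXBoost; the package's internal implementation may differ.

```r
# Hypothetical helper (not part of SIMEXBoost): adds classical measurement
# error to X, i.e. Xstar = X + E, where each row of E is N(0, sigmae).
add_classical_error <- function(X, sigmae) {
  n <- nrow(X)
  p <- ncol(X)
  # Symmetric square root of sigmae via its eigendecomposition; unlike a
  # Cholesky factor, this also works when sigmae is the zero matrix.
  eig <- eigen(sigmae, symmetric = TRUE)
  root <- eig$vectors %*% diag(sqrt(pmax(eig$values, 0)), p) %*% t(eig$vectors)
  E <- matrix(rnorm(n * p), nrow = n, ncol = p) %*% root
  X + E
}

set.seed(1)
X <- matrix(rnorm(400 * 20), nrow = 400, ncol = 20)
Xstar_noisy <- add_classical_error(X, diag(0.3, ncol(X)))  # error-prone
Xstar_exact <- add_classical_error(X, diag(0, ncol(X)))    # no error added
```

With the zero matrix, the noise term vanishes and the sketch returns `X` unchanged, which mirrors the behaviour described for `sigmae` above.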

Details

This function generates artificial data with error-prone covariates. An n-dimensional vector of responses is generated from a generalized linear model (GLM); linear, logistic, and Poisson regression models are supported. For survival analysis, the accelerated failure time (AFT) model, a commonly used formulation, is used to generate length-biased and interval-censored responses. In addition to generating the responses, the function uses the classical measurement error model to generate the mismeasured covariates.
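
As a rough illustration of the GLM part, the sketch below generates responses for the three non-survival model types. It is not the package's code: the helper name `simulate_response` is hypothetical, and the log link for the Poisson model and the use of the true covariates `X` (rather than the error-prone ones) in the linear predictor are assumptions based on the standard formulation of these models.

```r
# Hypothetical helper (not part of SIMEXBoost): simulates an n-vector of
# responses from the linear predictor eta = X %*% beta.
simulate_response <- function(X, beta, type = c("normal", "binary", "poisson")) {
  type <- match.arg(type)
  n <- nrow(X)
  eta <- drop(X %*% beta)
  switch(type,
    normal  = eta + rnorm(n),                            # standard normal error
    binary  = rbinom(n, size = 1, prob = plogis(eta)),   # logistic model
    poisson = rpois(n, lambda = exp(eta))                # assumed log link
  )
}

set.seed(1)
X <- matrix(rnorm(400 * 20), nrow = 400, ncol = 20)
beta <- c(1, 1, 1, rep(0, ncol(X) - 3))
Y_normal  <- simulate_response(X, beta, "normal")
Y_binary  <- simulate_response(X, beta, "binary")
Y_poisson <- simulate_response(X, beta, "poisson")
```

The AFT types are omitted here because the length-biased sampling and interval-censoring mechanisms are not specified in enough detail on this page to sketch them faithfully.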

Value

response

Responses generated by a specific regression model. type="normal" gives an n-dimensional continuous vector; type="binary" gives an n-dimensional vector with binary entries; type="poisson" gives an n-dimensional vector of non-negative integer counts. In addition, type="AFT-normal" and type="AFT-loggamma" generate an (n,2) matrix of length-biased and interval-censored responses, where the first column is the lower bound of an interval-censored response and the second column is the upper bound.

ME_covariate

An (n,p) matrix of error-prone covariates.

Author(s)

Bangxu Qiu and Li-Pang Chen

Examples


##### Example 1: A linear model with precisely measured covariates ##########
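library(SIMEXBoost)
X <- matrix(rnorm(400*20), nrow=400, ncol=20, byrow=TRUE)   # n = 400, p = 20
# A zero covariance matrix adds no measurement error, so ME_covariate equals X
data <- ME_Data(X=X, beta=c(1,1,1,rep(0,ncol(X)-3)), type="normal", sigmae=diag(0,ncol(X)))
Y <- data$response
Xstar <- data$ME_covariate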



##### Example 2: A linear model with error-prone covariates ##########
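X <- matrix(rnorm(400*20), nrow=400, ncol=20, byrow=TRUE)   # n = 400, p = 20
# Noise covariance 0.3 * I: each covariate is observed with independent error
data <- ME_Data(X=X, beta=c(1,1,1,rep(0,ncol(X)-3)), type="normal", sigmae=diag(0.3,ncol(X)))
Y <- data$response
Xstar <- data$ME_covariate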

[Package SIMEXBoost version 0.2.0 Index]