R: Generation of Artificial Data

data_gen {AFFECT}

R Documentation

Generation of Artificial Data

Description

The function generates an artificial data set consisting of covariates drawn from a uniform distribution on the interval [-0.5, 0.5], together with survival times subject to measurement error and censoring statuses subject to misclassification. Users can specify the degree of measurement error linking the observed survival time to the true survival time, and the degree of misclassification linking the observed censoring status to the true censoring status. The accelerated functional failure time model used in this function is T=f(X1)+f(X2)+f(X3)+f(X4)+error, where T is the log failure time and f(X1)=4*X1^2+X1, f(X2)=sin(6*X2), f(X3)=cos(6*X3)-1, and f(X4)=4*X4^3+X4^2.
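
The model in the Description can be illustrated with a short sketch. This is not the AFFECT source code: the sample size, the number of covariates, the Uniform(-0.5, 0.5) covariate range, and the standard normal error term are assumptions made here for illustration, and the names `f1` to `f4` and `log_T` are hypothetical.

```r
# Illustrative sketch only; not the implementation used by data_gen().
set.seed(1)
n <- 200   # assumed sample size
p <- 10    # assumed number of covariates

# Covariates: assumed independent Uniform(-0.5, 0.5) draws
X <- matrix(runif(n * p, min = -0.5, max = 0.5), nrow = n, ncol = p)

# Component functions from the Description
f1 <- function(x) 4 * x^2 + x
f2 <- function(x) sin(6 * x)
f3 <- function(x) cos(6 * x) - 1
f4 <- function(x) 4 * x^3 + x^2

# Error term: standard normal is an assumption; the documentation does not state it
eps <- rnorm(n)

# Log failure time depends on the first four covariates only
log_T <- f1(X[, 1]) + f2(X[, 2]) + f3(X[, 3]) + f4(X[, 4]) + eps
```

In this sketch the remaining `p - 4` covariates carry no signal, which matches the additive form in which only X1 to X4 appear.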

Usage

data_gen(n, p, pi_01, pi_10, gamma0, gamma1, e_var)

Arguments

`n`	Sample size.
`p`	The number of covariates.
`pi_01`	Misclassification probability P(Observed Censoring Status = 0 | Actual Censoring Status = 1).
`pi_10`	Misclassification probability P(Observed Censoring Status = 1 | Actual Censoring Status = 0).
`gamma0`	A scalar that links the observed survival time and the true survival time in the classical additive measurement error model `y*=y+gamma0+gamma1X+v`, where `y*` is the observed survival time, `y` is the true survival time, `X` is the vector of covariates, and `v` is the noise term.
`gamma1`	A `p`-dimensional vector of parameters in the additive measurement error model `y*=y+gamma0+gamma1X+v`, where `y*` is the observed survival time, `y` is the true survival time, `X` is the vector of covariates, and `v` is the noise term.
`e_var`	The variance of the noise term `v` in the additive measurement error model `y*=y+gamma0+gamma1X+v`, where `v` is assumed to follow a normal distribution.
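
The roles of `pi_01`, `pi_10`, `gamma0`, `gamma1`, and `e_var` can be sketched as follows. This is a minimal illustration of the additive measurement error model and the misclassification probabilities described above, not the code used inside `data_gen`. The true values `y` and `delta`, the covariates `X`, the zero mean of `v`, and all parameter values are hypothetical placeholders.

```r
# Illustrative sketch only; not the implementation used by data_gen().
set.seed(2)
n <- 200
p <- 5
X     <- matrix(runif(n * p, -0.5, 0.5), nrow = n)  # hypothetical covariates
y     <- rnorm(n)                                   # hypothetical true survival times
delta <- rbinom(n, 1, 0.7)                          # hypothetical true censoring status

gamma0 <- 1
gamma1 <- c(1, rep(0, p - 1))  # only X1 enters the error model
e_var  <- 0.75
pi_01  <- 0.1                  # P(observed = 0 | actual = 1)
pi_10  <- 0.1                  # P(observed = 1 | actual = 0)

# Additive measurement error: y* = y + gamma0 + gamma1 X + v,
# with v ~ N(0, e_var); the zero mean is an assumption
v      <- rnorm(n, mean = 0, sd = sqrt(e_var))
y_star <- y + gamma0 + as.vector(X %*% gamma1) + v

# Misclassification: flip each status with the probability matching its true value
flip       <- rbinom(n, 1, ifelse(delta == 1, pi_01, pi_10))
delta_star <- ifelse(flip == 1, 1 - delta, delta)
```

Setting `gamma0 = 0`, `gamma1` to a zero vector, and `e_var = 0` would leave `y_star` equal to `y`, and setting both misclassification probabilities to 0 would leave `delta_star` equal to `delta`.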

Value

`generated_data`	A data frame with `n` rows and `p+2` columns. The first column is the observed survival time, the second column is the observed censoring status, and the remaining columns are the covariates.

Examples

## Set the relationship between the observed survival time and the
## true survival time to y* = y + 1 + X1 + v, where v has variance 0.75,
## with n = 500, p = 50, and misclassification probabilities pi_01 = pi_10 = 0.9.

## gamma1: 1 x p coefficient vector in which only X1 is nonzero
a <- matrix(0, ncol = 50, nrow = 1)
a[1, 1] <- 1
data <- data_gen(n = 500, p = 50, pi_01 = 0.9, pi_10 = 0.9, gamma0 = 1,
                 gamma1 = a, e_var = 0.75)
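
A possible way to inspect the result is sketched below. It relies only on the column layout stated in the Value section (observed survival time first, observed censoring status second, covariates afterwards). The object names `sim`, `y_obs`, `delta_obs`, and `X_obs` are hypothetical, and the code runs only if the AFFECT package is installed.

```r
# Guarded so that the sketch is skipped when AFFECT is not available
has_affect <- requireNamespace("AFFECT", quietly = TRUE)

if (has_affect) {
  a <- matrix(0, ncol = 50, nrow = 1)
  a[1, 1] <- 1
  sim <- AFFECT::data_gen(n = 500, p = 50, pi_01 = 0.9, pi_10 = 0.9,
                          gamma0 = 1, gamma1 = a, e_var = 0.75)

  print(dim(sim))               # Value section: n rows and p + 2 columns
  y_obs     <- sim[, 1]         # observed survival time
  delta_obs <- sim[, 2]         # observed censoring status
  X_obs     <- sim[, -(1:2)]    # covariates
  print(table(delta_obs))       # counts of each observed censoring status
}
```

Indexing by position rather than by name avoids assuming column names that the documentation does not specify.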

[Package AFFECT version 0.1.2 Index]