gen.data {BeSS} | R Documentation |

Generate data for simulations under the generalized linear model and Cox model.

gen.data(n, p, family, K, rho = 0, sigma = 1, beta = NULL, censoring = TRUE, c = 1, scal)

`n` |
The number of observations. |

`p` |
The number of predictors of interest. |

`family` |
The distribution of the simulated data: "`gaussian`", "`binomial`", or "`cox`". |

`K` |
The number of nonzero coefficients in the underlying regression model. |

`rho` |
A parameter used to characterize the pairwise correlation in predictors. Default is 0. |

`sigma` |
A parameter used to control the signal-to-noise ratio. For linear regression, it determines the error variance. Default is 1. |

`beta` |
The coefficient values in the underlying regression model. Default is `NULL`. |

`censoring` |
Whether the data are censored or not. Default is `TRUE`. |

`c` |
The censoring rate. Default is 1. |

`scal` |
A parameter used in generating survival times based on the Weibull distribution. Only used for the "`cox`" family. |

For the design matrix *X*, we first generate an *n × p* random Gaussian matrix *\bar{X}* whose entries are i.i.d. *N(0, 1)*, and then normalize its columns to length *√n*. The design matrix *X* is then generated with *X_j = \bar{X}_j + ρ(\bar{X}_{j+1} + \bar{X}_{j-1})* for *j = 2, …, p-1*.
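The construction above can be sketched in base R as follows. This is only an illustration of the description, not the package's own code: `make_design` is a hypothetical helper, and leaving the first and last columns equal to the corresponding columns of *\bar{X}* is an assumption, since the formula is only stated for the interior columns.

```r
# Illustrative sketch (base R only); make_design is a hypothetical helper,
# not a function exported by BeSS.
make_design <- function(n, p, rho = 0) {
  stopifnot(p >= 3)
  # n x p matrix with i.i.d. N(0, 1) entries
  Xbar <- matrix(rnorm(n * p), nrow = n, ncol = p)
  # Rescale each column to Euclidean length sqrt(n)
  col_norms <- sqrt(colSums(Xbar^2))
  Xbar <- sweep(Xbar, 2, col_norms, "/") * sqrt(n)
  # Interior columns: X_j = Xbar_j + rho * (Xbar_{j+1} + Xbar_{j-1})
  X <- Xbar
  j <- 2:(p - 1)
  X[, j] <- Xbar[, j] + rho * (Xbar[, j + 1] + Xbar[, j - 1])
  # Assumption: columns 1 and p are left equal to Xbar_1 and Xbar_p
  list(X = X, Xbar = Xbar)
}

set.seed(1)
d <- make_design(n = 100, p = 6, rho = 0.2)
dim(d$X)
```

With `rho = 0` the sketch returns the normalized Gaussian matrix unchanged; a larger `rho` increases the correlation between neighbouring columns.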

For the "`gaussian`" family, the data model is

*Y = Xβ + ε*, where *ε ∼ N(0, σ^2)*.

The nonzero entries of the underlying regression coefficient *β* are drawn from the uniform distribution on *[m, 100m]*, where *m = 5√{2 log(p)/n}*.
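A minimal sketch of the Gaussian model in base R is given below. It makes several assumptions that are not stated here: a plain i.i.d. *N(0, 1)* matrix stands in for the design matrix, the *K* nonzero positions are chosen at random, and `sigma` is read as the standard deviation of *ε*, following the formula as written.

```r
# Illustrative sketch of the "gaussian" data model (base R only).
set.seed(1)
n <- 200; p <- 20; K <- 5; sigma <- 1

# Assumption: stand-in design matrix with i.i.d. N(0, 1) entries
X <- matrix(rnorm(n * p), n, p)

# Lower bound m = 5 * sqrt(2 log(p) / n) of the coefficient range [m, 100m]
m <- 5 * sqrt(2 * log(p) / n)

# Assumption: the K nonzero positions are picked uniformly at random
beta <- numeric(p)
nonzero <- sample(p, K)
beta[nonzero] <- runif(K, min = m, max = 100 * m)

# Y = X beta + eps, with eps ~ N(0, sigma^2) (sigma used as a standard deviation)
y <- drop(X %*% beta) + rnorm(n, mean = 0, sd = sigma)
```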

For the "`binomial`" family, the data model is

*Prob(Y = 1) = exp(X β)/(1 + exp(X β))*

The nonzero entries of the underlying regression coefficient *β* are drawn from the uniform distribution on *[2m, 10m]*, where *m = 5σ√{2 log(p)/n}*.
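The logistic model can be sketched the same way. As before, this is an illustration under assumptions rather than the package's implementation: the design matrix is a plain i.i.d. Gaussian stand-in and the nonzero positions are chosen at random.

```r
# Illustrative sketch of the "binomial" data model (base R only).
set.seed(1)
n <- 200; p <- 20; K <- 5; sigma <- 1

# Assumption: stand-in design matrix with i.i.d. N(0, 1) entries
X <- matrix(rnorm(n * p), n, p)

# m = 5 * sigma * sqrt(2 log(p) / n); nonzero coefficients lie in [2m, 10m]
m <- 5 * sigma * sqrt(2 * log(p) / n)
beta <- numeric(p)
nonzero <- sample(p, K)            # assumption: random nonzero positions
beta[nonzero] <- runif(K, min = 2 * m, max = 10 * m)

# Prob(Y = 1) = exp(X beta) / (1 + exp(X beta)); plogis() computes this stably
eta <- drop(X %*% beta)
prob <- plogis(eta)
y <- rbinom(n, size = 1, prob = prob)
```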

For the "`cox`" family, the data model is

*T = (-log(S(t))/exp(Xβ))^{1/scal}.*

The censoring time `C` is generated from the uniform distribution on *[0, c]*. We then define the censoring status as *δ = I{T ≤ C}* and the observed time as *R = min{T, C}*.
The nonzero entries of the underlying regression coefficient *β* are drawn from the uniform distribution on *[2m, 10m]*, where *m = 5σ√{2 log(p)/n}*.
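The survival-time construction can be sketched in base R as below. Several pieces are assumptions made only for illustration: *S(t)* is replaced by a Uniform(0, 1) draw (inverse-transform sampling), `scal = 10` is an arbitrary value, `c_max` is a hypothetical name standing in for the argument `c`, and the design matrix is a plain i.i.d. Gaussian stand-in.

```r
# Illustrative sketch of the "cox" data model (base R only).
set.seed(1)
n <- 200; p <- 20; K <- 5; sigma <- 1
scal <- 10    # assumption: arbitrary illustrative value
c_max <- 1    # hypothetical name for the argument c (upper censoring bound)

# Assumption: stand-in design matrix with i.i.d. N(0, 1) entries
X <- matrix(rnorm(n * p), n, p)

m <- 5 * sigma * sqrt(2 * log(p) / n)
beta <- numeric(p)
nonzero <- sample(p, K)            # assumption: random nonzero positions
beta[nonzero] <- runif(K, min = 2 * m, max = 10 * m)
eta <- drop(X %*% beta)

# Assumption: S(t) is drawn as U ~ Uniform(0, 1)
u <- runif(n)
time <- (-log(u) / exp(eta))^(1 / scal)   # T = (-log(S(t)) / exp(X beta))^(1/scal)

cens <- runif(n, min = 0, max = c_max)    # C ~ Uniform[0, c]
status <- as.integer(time <= cens)        # delta = I{T <= C}
obs <- pmin(time, cens)                   # R = min{T, C}
```

Here `status` equals 1 when the event time is observed and 0 when the observation is censored.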

A list with the following components: x, y, Tbeta.

`x` |
Design matrix of predictors. |

`y` |
Response variable. |

`Tbeta` |
The coefficients used in the underlying regression model. |

Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.

Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, *Journal of Statistical Software*, Vol. 94(4). doi:10.18637/jss.v094.i04.

    library(BeSS)

    # Generate simulated data
    n <- 500
    p <- 20
    K <- 10
    sigma <- 1
    rho <- 0.2
    data <- gen.data(n, p, family = "gaussian", K, rho, sigma)

    # Best subset selection
    fit <- bess(data$x, data$y, family = "gaussian")

[Package *BeSS* version 2.0.3 Index]