gen.data {BeSS} | R Documentation |

## Generate simulated data

### Description

Generate data for simulations under the generalized linear model and Cox model.

### Usage

```
gen.data(n, p, family, K, rho = 0, sigma = 1, beta = NULL, censoring = TRUE,
c = 1, scal)
```

### Arguments

`n` |
The number of observations. |

`p` |
The number of predictors of interest. |

`family` |
The distribution of the simulated data: one of "`gaussian`", "`binomial`", or "`cox`" (see Details). |

`K` |
The number of nonzero coefficients in the underlying regression model. |

`rho` |
A parameter used to characterize the pairwise correlation in predictors. Default is 0. |

`sigma` |
A parameter used to control the signal-to-noise ratio. For linear regression, it is the error variance. |

`beta` |
The coefficient values in the underlying regression model. |

`censoring` |
Whether the data are censored or not. Default is TRUE. |

`c` |
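A parameter controlling the censoring rate: censoring times are generated from the uniform distribution on [0, c] (see Details). Default is 1. |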

`scal` |
A parameter used in generating survival times from the Weibull distribution. Only used for the "`cox`" family. |

### Details

For the design matrix `X`, we first generate an n x p random Gaussian matrix `\bar{X}` whose entries are i.i.d. `\sim N(0,1)` and then normalize its columns to length `\sqrt{n}`. The design matrix `X` is then generated with `X_j = \bar{X}_j + \rho(\bar{X}_{j+1}+\bar{X}_{j-1})` for `j=2,\dots,p-1`.
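The construction above can be sketched in base R as follows. This is an illustrative sketch, not the package's source: the variable names are made up here, and leaving the first and last columns equal to `\bar{X}_1` and `\bar{X}_p` is an assumption, since the formula is only stated for `j=2,\dots,p-1`.

```r
set.seed(1)
n <- 100; p <- 6; rho <- 0.2

# Step 1: n x p matrix with i.i.d. N(0, 1) entries
Xbar <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Step 2: rescale each column to Euclidean length sqrt(n)
Xbar <- sweep(Xbar, 2, sqrt(colSums(Xbar^2) / n), "/")

# Step 3: mix each interior column with its two neighbours;
# columns 1 and p are left unchanged (assumption, see above)
X <- Xbar
for (j in 2:(p - 1)) {
  X[, j] <- Xbar[, j] + rho * (Xbar[, j + 1] + Xbar[, j - 1])
}
```

With `rho = 0` the loop leaves `X` equal to `Xbar`, so the predictors are uncorrelated in expectation.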

For the "`gaussian`" family, the data model is `Y = X \beta + \epsilon`, where `\epsilon \sim N(0, \sigma^2)`. The underlying regression coefficients `\beta` are drawn from the uniform distribution on [m, 100m], where `m = 5 \sqrt{2 \log(p)/n}`.
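A minimal base R sketch of the Gaussian model follows. It is not the package's implementation. It assumes that the `K` nonzero coefficients sit at randomly chosen positions, uses a plain Gaussian matrix as a stand-in for the design matrix, and takes `sigma` as the standard deviation, as the `N(0, \sigma^2)` formula suggests.

```r
set.seed(2)
n <- 200; p <- 10; K <- 3; sigma <- 1

# Stand-in design matrix (the real one is built as described above)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Coefficient scale m = 5 * sqrt(2 * log(p) / n)
m <- 5 * sqrt(2 * log(p) / n)

# K nonzero coefficients drawn from Uniform[m, 100m];
# their positions are chosen at random (assumption)
beta <- numeric(p)
support <- sample(p, K)
beta[support] <- runif(K, min = m, max = 100 * m)

# Y = X beta + epsilon, epsilon ~ N(0, sigma^2)
y <- drop(X %*% beta + rnorm(n, mean = 0, sd = sigma))
```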

For the "`binomial`" family, the data model is `Prob(Y = 1) = \exp(X \beta)/(1 + \exp(X \beta))`. The underlying regression coefficients `\beta` are drawn from the uniform distribution on [2m, 10m], where `m = 5\sigma \sqrt{2 \log(p)/n}`.
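The logistic model can be sketched the same way. Again this is an illustration under assumptions rather than the package's code: the positions of the nonzero coefficients are chosen at random, and a plain Gaussian matrix stands in for the design matrix.

```r
set.seed(3)
n <- 200; p <- 10; K <- 3; sigma <- 1

# Stand-in design matrix
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Coefficient scale m = 5 * sigma * sqrt(2 * log(p) / n)
m <- 5 * sigma * sqrt(2 * log(p) / n)

# K nonzero coefficients from Uniform[2m, 10m] at random positions (assumption)
beta <- numeric(p)
beta[sample(p, K)] <- runif(K, min = 2 * m, max = 10 * m)

# Prob(Y = 1) = exp(X beta) / (1 + exp(X beta)); plogis() is the logistic CDF
eta <- drop(X %*% beta)
prob <- plogis(eta)
y <- rbinom(n, size = 1, prob = prob)
```

Because `m` is multiplied by `sigma`, a larger `sigma` produces larger coefficients and therefore success probabilities closer to 0 or 1.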

For the "`cox`" family, the data model is `T = (-\log(S(t))/\exp(X \beta))^{1/scal}`. The censoring time `C` is generated from the uniform distribution on [0, c], and the censoring status and observed time are defined as `\delta = I\{T \le C\}` and `R = \min\{T, C\}`. The underlying regression coefficients `\beta` are drawn from the uniform distribution on [2m, 10m], where `m = 5\sigma \sqrt{2 \log(p)/n}`.
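The survival model can be sketched in base R as follows. This is a hedged illustration, not the package's code. It reads `S(t)` as a Uniform(0, 1) draw (the usual inverse-transform reading of the formula), picks `scal = 10` purely for illustration, names the censoring bound `cmax` to avoid shadowing R's `c()`, and places the nonzero coefficients at random positions.

```r
set.seed(4)
n <- 200; p <- 10; K <- 3; sigma <- 1
cmax <- 1    # upper bound of the censoring time (the argument `c`)
scal <- 10   # Weibull parameter; illustrative value only

# Stand-in design matrix and coefficients on the [2m, 10m] scale
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
m <- 5 * sigma * sqrt(2 * log(p) / n)
beta <- numeric(p)
beta[sample(p, K)] <- runif(K, min = 2 * m, max = 10 * m)
eta <- drop(X %*% beta)

# T = (-log(S(t)) / exp(X beta))^(1 / scal), with S(t) ~ Uniform(0, 1)
u <- runif(n)
surv_time <- (-log(u) / exp(eta))^(1 / scal)

# Censoring time C ~ Uniform[0, cmax]
cens_time <- runif(n, min = 0, max = cmax)

# delta = I{T <= C} and R = min{T, C}
status <- as.integer(surv_time <= cens_time)
obs_time <- pmin(surv_time, cens_time)
```

A larger censoring bound makes `T <= C` more likely, so fewer observations are censored.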

### Value

A list with the following components: x, y, Tbeta.

`x` |
Design matrix of predictors. |

`y` |
Response variable. |

`Tbeta` |
The coefficients used in the underlying regression model. |

### Author(s)

Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.

### References

Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, *Journal of Statistical Software*, Vol. 94(4). doi:10.18637/jss.v094.i04.

### Examples

```
library(BeSS)

# Generate simulated data: 500 observations, 20 predictors, 10 nonzero coefficients
n <- 500
p <- 20
K <- 10
sigma <- 1
rho <- 0.2
data <- gen.data(n, p, family = "gaussian", K = K, rho = rho, sigma = sigma)

# Best subset selection
fit <- bess(data$x, data$y, family = "gaussian")
```

Package *BeSS* version 2.0.4