gen_toy_data {sgs}

R Documentation

Generate toy data.

Description

Generates different types of datasets, which can then be fitted using sparse-group SLOPE.

Usage

gen_toy_data(
  p,
  n,
  rho = 0,
  seed_id = 2,
  grouped = TRUE,
  groups,
  noise_level = 1,
  group_sparsity = 0.1,
  var_sparsity = 0.5,
  orthogonal = FALSE,
  data_mean = 0,
  data_sd = 1,
  signal_mean = 0,
  signal_sd = sqrt(10)
)

Arguments

`p`	The number of input variables.
`n`	The number of observations.
`rho`	Correlation coefficient used when generating the input matrix `X`. Must be in the range `[0,1]`.
`seed_id`	Seed to be used to generate the data matrix `X`.
`grouped`	A logical flag indicating whether grouped data is required.
`groups`	Required if `grouped=TRUE`: a vector giving the group id of each input variable, so its length must equal `p`.
`noise_level`	Defines the level of noise (`sigma`) to be used in generating the response vector `y`.
`group_sparsity`	Defines the level of group sparsity. Must be in the range `[0,1]`.
`var_sparsity`	Defines the level of variable sparsity. Must be in the range `[0,1]`. If `grouped=TRUE`, this defines the level of sparsity within each group, not globally.
`orthogonal`	Logical flag indicating whether the input matrix should be orthogonal.
`data_mean`	Defines the mean of the input predictors.
`data_sd`	Defines the standard deviation of the input predictors.
`signal_mean`	Defines the mean of the signal (`beta`).
`signal_sd`	Defines the standard deviation of the signal (`beta`).

Details

The data is generated under a Gaussian linear model. The generated data can be grouped, and sparsity can be imposed at the group level, the variable level, or both.
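To make the model concrete, the base-R sketch below simulates data of the form y = X beta + sigma * eps with group and variable sparsity. It is a minimal illustration, not the sgs implementation. The function name `sketch_toy_data` is hypothetical. It also rests on three assumptions: the predictors are equicorrelated with coefficient `rho`, `group_sparsity` is the fraction of groups that are active, and `var_sparsity` is the fraction of variables kept within each active group. The `orthogonal` option is not modelled.

```r
# Hypothetical sketch (not the sgs implementation): simulate grouped data
# from a Gaussian linear model y = X %*% beta + sigma * eps.
sketch_toy_data <- function(n, groups, rho = 0, noise_level = 1,
                            group_sparsity = 0.1, var_sparsity = 0.5,
                            data_mean = 0, data_sd = 1,
                            signal_mean = 0, signal_sd = sqrt(10),
                            seed_id = 2) {
  set.seed(seed_id)
  p <- length(groups)
  # Assumption: equicorrelated predictors. A factor w shared across each row
  # gives every pair of columns correlation rho (before scaling by data_sd).
  z <- matrix(rnorm(n * p), n, p)
  w <- rnorm(n)  # recycled down each column, so row i of X gets w[i]
  X <- data_mean + data_sd * (sqrt(1 - rho) * z + sqrt(rho) * w)

  # Assumption: group_sparsity is the fraction of groups that are active
  # (at least one group is always kept).
  grp_ids <- unique(groups)
  n_active <- max(1, round(group_sparsity * length(grp_ids)))
  active_grps <- sort(grp_ids[sample.int(length(grp_ids), n_active)])

  beta <- numeric(p)
  for (g in active_grps) {
    idx <- which(groups == g)
    # Assumption: var_sparsity is the fraction of variables kept in an
    # active group. Indexing via sample.int avoids sample()'s pitfall when
    # idx has length one.
    k <- max(1, round(var_sparsity * length(idx)))
    keep <- idx[sample.int(length(idx), k)]
    beta[keep] <- rnorm(k, mean = signal_mean, sd = signal_sd)
  }

  # Gaussian noise with standard deviation noise_level
  y <- as.vector(X %*% beta + noise_level * rnorm(n))
  list(y = y, X = X, true_beta = beta, true_grp_id = active_grps)
}

# Usage: 10 groups of 4 variables each (p = 40), 100 observations
groups <- rep(1:10, each = 4)
sim <- sketch_toy_data(n = 100, groups = groups)
```

With these defaults, one of the ten groups is active and two of its four coefficients are non-zero. Building `X` from a shared row factor keeps the sketch free of extra dependencies. A general covariance matrix would instead call for a Cholesky factor or `MASS::mvrnorm`.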

Value

A list containing:

`y`	The response vector.
`X`	The input matrix.
`true_beta`	The true values of `beta` used to generate the response.
`true_grp_id`	Indices of the groups that are non-zero in `true_beta`.
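The relationship between `true_beta`, the grouping vector, and `true_grp_id` can be illustrated in base R. The short sketch below uses a made-up coefficient vector and assumes group ids are the integers `1, ..., m`, so a group's id and its index coincide. It recovers the active groups as those containing at least one non-zero coefficient.

```r
# Hypothetical illustration: find the active groups from a coefficient
# vector and its grouping vector (4 groups of 3 variables).
groups <- rep(1:4, each = 3)
true_beta <- c(0, 0, 0,  1.2, 0, -0.7,  0, 0, 0,  0, 2.1, 0)
# A group is active if any of its coefficients is non-zero
active_groups <- sort(unique(groups[true_beta != 0]))
print(active_groups)
# [1] 2 4
```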

Examples

# load the package
library(sgs)
# specify a grouping structure (500 variables in 100 groups)
groups = c(rep(1:20, each=3),
          rep(21:40, each=4),
          rep(41:60, each=5),
          rep(61:80, each=6),
          rep(81:100, each=7))
# generate data
data = gen_toy_data(p=500, n=400, groups = groups, seed_id=3)

[Package sgs version 0.2.0 Index]