gen_toy_data {sgs}    R Documentation

Generate toy data.

Description

Generates different types of datasets, which can then be fitted using sparse-group SLOPE.

Usage

gen_toy_data(
  p,
  n,
  rho = 0,
  seed_id = 2,
  grouped = TRUE,
  groups,
  noise_level = 1,
  group_sparsity = 0.1,
  var_sparsity = 0.5,
  orthogonal = FALSE,
  data_mean = 0,
  data_sd = 1,
  signal_mean = 0,
  signal_sd = sqrt(10)
)

Arguments

p

The number of input variables.

n

The number of observations.

rho

Correlation coefficient. Must be in the range [0,1].

seed_id

Seed to be used to generate the data matrix X.

grouped

A logical flag indicating whether grouped data is required.

groups

The grouping structure, given as a vector with one group id per input variable. Required if grouped=TRUE.

noise_level

Defines the level of noise (sigma) to be used in generating the response vector y.

group_sparsity

Defines the level of group sparsity. Must be in the range [0,1].

var_sparsity

Defines the level of variable sparsity. Must be in the range [0,1]. If grouped=TRUE, this defines the level of sparsity within each group, not globally.

orthogonal

A logical flag indicating whether the input matrix should be orthogonal.

data_mean

Defines the mean of input predictors.

data_sd

Defines the standard deviation of input predictors.

signal_mean

Defines the mean of the signal (beta).

signal_sd

Defines the standard deviation of the signal (beta).

Details

The data is generated under a Gaussian linear model. The generated data can be grouped, and sparsity can be imposed at the group level, the variable level, or both.
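
As a rough illustration of the model described above, the following base R sketch draws a Gaussian input matrix, builds a coefficient vector that is sparse at both the group and the variable level, and adds Gaussian noise to form the response. It is not the package's implementation. The variable names, the small dimensions, and the reading of each sparsity level as the proportion of non-zero groups (or of non-zero variables within a group) are assumptions made for illustration, and the rho and orthogonal options are ignored.

```r
# Illustrative sketch only: this is NOT how sgs::gen_toy_data is implemented.
set.seed(3)

n <- 50                               # number of observations
p <- 12                               # number of input variables
groups <- rep(1:4, each = 3)          # hypothetical grouping: 4 groups of 3 variables
group_sparsity <- 0.5                 # ASSUMPTION: proportion of groups carrying signal
var_sparsity <- 0.5                   # ASSUMPTION: proportion of active variables per active group

# Gaussian input matrix (data_mean = 0, data_sd = 1)
X <- matrix(rnorm(n * p, mean = 0, sd = 1), nrow = n, ncol = p)

# Pick which groups are active
group_ids <- unique(groups)
n_active_groups <- ceiling(group_sparsity * length(group_ids))
active_groups <- sort(group_ids[sample.int(length(group_ids), n_active_groups)])

# Within each active group, pick the active variables and draw their signal
beta <- numeric(p)
for (g in active_groups) {
  idx <- which(groups == g)
  n_active <- ceiling(var_sparsity * length(idx))
  active_idx <- idx[sample.int(length(idx), n_active)]
  beta[active_idx] <- rnorm(n_active, mean = 0, sd = sqrt(10))  # signal_mean, signal_sd
}

# Gaussian linear model: y = X beta + noise, with noise sd = noise_level = 1
y <- as.vector(X %*% beta + rnorm(n, mean = 0, sd = 1))

# Same component names as the list documented under Value
toy <- list(y = y, X = X, true_beta = beta, true_grp_id = active_groups)
```

Under these assumptions, every coefficient outside the groups in true_grp_id is exactly zero, and only a fraction of the coefficients inside those groups is non-zero.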

Value

A list containing:

y

The response vector.

X

The input matrix.

true_beta

The true values of beta used to generate the response.

true_grp_id

Indices of the groups that are non-zero in true_beta.
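
The components of the returned list can be inspected with ordinary list access. The snippet below is a minimal sketch that assumes the sgs package is installed; the variable name toy and the grouping (100 groups covering 500 variables) are chosen here for illustration, and the final comparison simply recomputes the non-zero groups from true_beta to set beside true_grp_id.

```r
# Minimal sketch: runs only if the sgs package is available.
if (requireNamespace("sgs", quietly = TRUE)) {
  # One group id per input variable: 100 groups covering p = 500 variables
  groups <- c(rep(1:20, each = 3),
              rep(21:40, each = 4),
              rep(41:60, each = 5),
              rep(61:80, each = 6),
              rep(81:100, each = 7))

  toy <- sgs::gen_toy_data(p = 500, n = 400, groups = groups, seed_id = 3)

  dim(toy$X)                  # dimensions of the input matrix
  length(toy$y)               # number of responses
  sum(toy$true_beta != 0)     # number of non-zero coefficients
  toy$true_grp_id             # groups reported as non-zero

  # Groups that contain at least one non-zero coefficient, for comparison
  sort(unique(groups[toy$true_beta != 0]))
}
```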

Examples

# specify a grouping structure
groups = c(rep(1:20, each=3),
          rep(21:40, each=4),
          rep(41:60, each=5),
          rep(61:80, each=6),
          rep(81:100, each=7))
# generate data
data = gen_toy_data(p=500, n=400, groups=groups, seed_id=3)


[Package sgs version 0.2.0 Index]