gen.sim.data {ProSGPV}    R Documentation

gen.sim.data: Generate simulation data

Description

This function generates autoregressive simulation data.

Usage

gen.sim.data(
  n = 100,
  p = 50,
  s = 10,
  family = c("gaussian", "binomial", "poisson", "cox"),
  beta.min = 1,
  beta.max = 5,
  rho = 0,
  nu = 2,
  sig = 1,
  intercept = 0,
  scale = 2,
  shape = 1,
  rateC = 0.2
)

Arguments

n

Number of observations. Default is 100.

p

Number of explanatory variables. Default is 50.

s

Number of true signals. Must be an even number. Default is 10.

family

A description of the error distribution and link function to be used in the model. It can take the value "gaussian", "binomial", "poisson", or "cox". Default is "gaussian".

beta.min

The smallest effect size in absolute value. Default is 1.

beta.max

The largest effect size in absolute value. Default is 5.

rho

Autocorrelation level. A numerical value between -1 and 1. Default is 0.

nu

Signal to noise ratio in linear regression. Default is 2.

sig

Standard deviation in the design matrix. Default is 1.

intercept

Intercept of the linear predictor in the GLM. Default is 0.

scale

Scale parameter in the Weibull distribution. Default is 2.

shape

Shape parameter in the Weibull distribution. Default is 1.

rateC

Rate of censoring in the survival data. Default is 0.2.
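
The roles of rho and sig can be illustrated with a minimal base R sketch. It assumes that "autoregressive" means an AR(1) pattern in which the covariance between columns i and j of the design matrix is sig^2 * rho^|i - j|; this page does not state the exact construction, so the helper sketch.ar1.design below is hypothetical and is not part of ProSGPV. The sketch needs rho strictly between -1 and 1, because the Cholesky factorization fails at the boundary values.

```r
# Hypothetical helper, not part of ProSGPV: draws an n x p matrix whose
# columns follow an assumed AR(1) covariance pattern.
sketch.ar1.design <- function(n = 100, p = 50, rho = 0, sig = 1) {
  # Assumed covariance between columns i and j: sig^2 * rho^|i - j|
  Sigma <- sig^2 * toeplitz(rho^(0:(p - 1)))
  # Multiply i.i.d. standard normal draws by the Cholesky factor of Sigma
  Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  X <- Z %*% chol(Sigma)
  list(X = X, Sigma = Sigma)
}

set.seed(1)
sketch <- sketch.ar1.design(n = 20, p = 5, rho = 0.5, sig = 1)
dim(sketch$X)        # 20 rows, 5 columns
sketch$Sigma[1, 3]   # 0.25, that is rho^2
```

With rho = 0 the assumed covariance reduces to sig^2 times the identity matrix, so the columns are independent.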

Value

A list with the following components:

X

The generated explanatory variable matrix

Y

The outcome vector. If family is "cox", a two-column object is returned in which the first column is the time and the second column is the status (0 for censoring, 1 for an event).

index

The indices of true signals

beta

The true coefficient vector of length p
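
The two-column layout of Y for family = "cox" can be pictured with a short base R sketch. It is not the ProSGPV implementation: it assumes Weibull event times using the documented default shape and scale, assumes exponential censoring times with rate rateC, and ignores any covariate effects. The object name Y.sketch is hypothetical.

```r
# Hypothetical sketch, not the ProSGPV implementation.
set.seed(1)
n <- 20
# Assumed event times: Weibull with shape = 1 and scale = 2 (the documented defaults)
event.time <- rweibull(n, shape = 1, scale = 2)
# Assumed censoring times: exponential with rate 0.2 (the default rateC)
censor.time <- rexp(n, rate = 0.2)
# Observed time is whichever comes first; status is 1 for an event, 0 for censoring
Y.sketch <- cbind(time = pmin(event.time, censor.time),
                  status = as.numeric(event.time <= censor.time))
head(Y.sketch, 3)
```

Under these assumptions, a larger censoring rate makes censoring times shorter on average, so more rows end up with status 0.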

Examples

# generate data for linear regression
data.linear <- gen.sim.data(n = 20, p = 10, s = 4)

# extract x
x <- data.linear[[1]]

# extract y
y <- data.linear[[2]]

# extract the indices of true signals
index <- data.linear[[3]]

# extract the true coefficient vector
true.beta <- data.linear[[4]]

# generate data for logistic regression
data.logistic <- gen.sim.data(n = 20, p = 10, s = 4, family = "binomial")

# extract x
x <- data.logistic[[1]]

# extract y
y <- data.logistic[[2]]

# extract the indices of true signals
index <- data.logistic[[3]]

# extract the true coefficient vector
true.beta <- data.logistic[[4]]

[Package ProSGPV version 1.0.0 Index]