R: Simulate a transformed linear model

simulate_tlm {SeBR}

R Documentation

Simulate a transformed linear model

Description

Generate training data (X, y) and testing data (X_test, y_test) for a transformed linear model. The covariates are correlated Gaussian variables. Half of the true regression coefficients are zero and the other half are one. Several transformations are available, and the choice determines the support of the data (see Details).
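The description above can be read as a three-step recipe: draw correlated Gaussian covariates, form a latent linear model with half-zero, half-one coefficients, then map the latent response through an inverse transformation. The base R sketch below illustrates that recipe only; it is not the SeBR implementation. The AR(1)-style correlation with `rho = 0.5`, the unit noise variance, the ordering of the coefficients, and the use of `exp()` as the inverse transformation are all assumptions made for illustration.

```r
# Illustrative sketch only; NOT the code used by simulate_tlm()
set.seed(1)
n <- 100   # training observations
p <- 6     # covariates

# Correlated Gaussian covariates (assumed AR(1)-style correlation, rho = 0.5)
rho <- 0.5
Sigma <- rho^abs(outer(1:p, 1:p, "-"))
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

# Half of the coefficients are one, half are zero (ordering is an assumption)
beta_true <- c(rep(1, ceiling(p / 2)), rep(0, floor(p / 2)))

# Latent Gaussian linear model (unit error variance is an assumption)
z <- as.numeric(X %*% beta_true + rnorm(n))

# Observed response: inverse transformation applied to the latent data.
# Here g = log, so g^{-1} = exp and y is positive (assumed for illustration).
y <- exp(z)
```

Because `y` is obtained by pushing the latent `z` through the inverse transformation, the support of `y` is set entirely by that map; swapping `exp()` for another monotone function changes the support without touching the linear model.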

Usage

simulate_tlm(
  n,
  p,
  g_type = "beta",
  n_test = 1000,
  heterosked = FALSE,
  lambda = 1
)

Arguments

`n`	number of observations in the training data
`p`	number of covariates
`g_type`	type of transformation; must be one of `beta`, `step`, or `box-cox`
`n_test`	number of observations in the testing data
`heterosked`	logical; if TRUE, simulate the latent data with heteroskedasticity
`lambda`	Box-Cox parameter (used only when `g_type = 'box-cox'`)

Details

The transformations vary in complexity and in the support of the observed data. The options are:

beta: yields marginally Beta(0.1, 0.5) data supported on [0,1]
step: generates a locally-linear inverse transformation and produces positive data
box-cox: the signed Box-Cox family indexed by lambda, which generates real-valued data; examples include the identity, square-root, and log transformations
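To make the box-cox option more concrete, here is a minimal base R sketch of a signed Box-Cox transformation and its inverse. The functional form, the handling of `lambda = 0` as the log transformation, and the helper names `g_bc` and `g_inv_bc` are assumptions for illustration; the page does not state the exact formula used internally.

```r
# Signed Box-Cox transformation (assumed form; names are hypothetical)
g_bc <- function(t, lambda) {
  if (lambda == 0) {
    log(t)  # log case; requires t > 0
  } else {
    (sign(t) * abs(t)^lambda - 1) / lambda
  }
}

# Inverse of the transformation above
g_inv_bc <- function(s, lambda) {
  if (lambda == 0) {
    exp(s)
  } else {
    u <- lambda * s + 1
    sign(u) * abs(u)^(1 / lambda)
  }
}

# Square-root-type case: transform a grid of real values and map it back
t_grid <- c(-2, -0.5, 0, 0.5, 2)
g_half <- g_bc(t_grid, lambda = 0.5)
roundtrip <- g_inv_bc(g_half, lambda = 0.5)
```

Under this form, `lambda = 1` reduces to a shifted identity (`t - 1`), `lambda = 0.5` to a signed square root, and `lambda = 0` to the log, which matches the three examples named above.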

Value

A list with the following elements:

y: the response variable in the training data
X: the covariates in the training data
y_test: the response variable in the testing data
X_test: the covariates in the testing data
beta_true: the true regression coefficients
g_true: the true transformation, evaluated at y

Examples

# Simulate data:
library(SeBR)
dat = simulate_tlm(n = 100, p = 5, g_type = 'beta')
names(dat) # what is returned
hist(dat$y, breaks = 25) # marginal distribution
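A further example, offered as a sketch, compares the range of the simulated response across the three documented `g_type` options and plots the true transformation for a Box-Cox case. It uses only the arguments and return elements listed on this page. The `requireNamespace()` guard and the object names `dat_g` and `dat_bc` are additions for illustration, and no particular output is assumed.

```r
# Guarded so that the snippet is a no-op when SeBR is not installed
if (requireNamespace("SeBR", quietly = TRUE)) {
  set.seed(123)

  # Compare the support of y under each transformation
  for (g in c("beta", "step", "box-cox")) {
    dat_g <- SeBR::simulate_tlm(n = 200, p = 10, g_type = g)
    cat(g, ": range of y = [",
        paste(signif(range(dat_g$y), 3), collapse = ", "), "]\n")
  }

  # Box-Cox with a non-default parameter: plot the true transformation,
  # which is returned evaluated at the training responses
  dat_bc <- SeBR::simulate_tlm(n = 200, p = 10,
                               g_type = "box-cox", lambda = 0.5)
  plot(dat_bc$y, dat_bc$g_true,
       xlab = "y", ylab = "g_true(y)", main = "True transformation")
}
```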

[Package SeBR version 1.0.0 Index]