SimulateRegression {fake} | R Documentation |
Data simulation for multivariate regression
Description
Simulates data with outcome(s) and predictors, where only a subset of the predictors actually contributes to the definition of the outcome(s).
Usage
SimulateRegression(
n = 100,
pk = 10,
xdata = NULL,
family = "gaussian",
q = 1,
theta = NULL,
nu_xy = 0.2,
beta_abs = c(0.1, 1),
beta_sign = c(-1, 1),
continuous = TRUE,
ev_xy = 0.7
)
Arguments
n |
number of observations in the simulated dataset. Not used if
|
pk |
number of predictor variables. A subset of these variables
contributes to the outcome definition (see argument |
xdata |
optional data matrix for the predictors with variables as
columns and observations as rows. A subset of these variables contributes to
the outcome definition (see argument |
family |
type of regression model. Possible values include
|
q |
number of outcome variables. |
theta |
binary matrix with as many rows as predictors and as many
columns as outcomes. A nonzero entry on row |
nu_xy |
vector of length |
beta_abs |
vector defining the range of nonzero regression coefficients
in absolute values. If |
beta_sign |
vector of possible signs for regression coefficients.
Possible inputs are: |
continuous |
logical indicating whether to sample regression
coefficients from a uniform distribution between the minimum and maximum
values in |
ev_xy |
vector of length |
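As a rough illustration of how theta, beta_abs, beta_sign and continuous could fit together, the base R sketch below builds a small binary theta matrix by hand and draws a coefficient for each nonzero entry. It is not the package's implementation: the helper draw_beta is hypothetical, and the behaviour shown for continuous = FALSE (picking among the listed values) is an assumption.

```r
# Hypothetical helper (not part of the fake package): draw regression
# coefficients for the nonzero entries of a binary matrix theta.
draw_beta <- function(theta, beta_abs = c(0.1, 1), beta_sign = c(-1, 1),
                      continuous = TRUE) {
  n_nonzero <- sum(theta != 0)
  if (continuous) {
    # Uniform draw between the smallest and largest value in beta_abs
    magnitude <- runif(n_nonzero, min = min(beta_abs), max = max(beta_abs))
  } else {
    # Assumption: sample among the values listed in beta_abs
    magnitude <- beta_abs[sample.int(length(beta_abs), n_nonzero, replace = TRUE)]
  }
  # Random sign taken from the allowed signs
  sign <- beta_sign[sample.int(length(beta_sign), n_nonzero, replace = TRUE)]
  beta <- matrix(0, nrow = nrow(theta), ncol = ncol(theta))
  beta[theta != 0] <- magnitude * sign
  beta
}

set.seed(1)
# 5 predictors (rows) and 2 outcomes (columns)
theta <- matrix(0, nrow = 5, ncol = 2)
theta[c(1, 3), 1] <- 1 # predictors 1 and 3 contribute to outcome 1
theta[4, 2] <- 1 # predictor 4 contributes to outcome 2
beta <- draw_beta(theta)
beta
```

With beta_sign = 1, all drawn coefficients would be positive in this sketch.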
Value
A list with:
xdata |
input or simulated predictor data. |
ydata |
simulated outcome data. |
beta |
matrix of true beta
coefficients used to generate outcomes in |
theta |
binary matrix indicating the predictors from
|
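The elements of the returned list can be inspected as sketched below. This assumes the fake package is installed; the argument values are arbitrary, and the final check reflects the expectation, not verified here, that nonzero coefficients in beta match the nonzero entries of theta.

```r
# Assumes the fake package is installed
library(fake)

set.seed(1)
simul <- SimulateRegression(n = 100, pk = 15, q = 2)

# Predictor and outcome data (observations as rows)
dim(simul$xdata)
dim(simul$ydata)

# True coefficients and binary indicator of contributing predictors
dim(simul$beta)
dim(simul$theta)

# Predictors contributing to the first outcome
which(simul$theta[, 1] != 0)

# Checks whether nonzero coefficients coincide with nonzero entries of theta
all((simul$beta != 0) == (simul$theta != 0))
```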
References
Bodinier B, Filippi S, Nost TH, Chiquet J, Chadeau-Hyam M (2021). “Automated calibration for stability selection in penalised regression and graphical models: a multi-OMICs network application exploring the molecular response to tobacco smoking.” https://arxiv.org/abs/2106.02521.
See Also
Other simulation functions: SimulateAdjacency(), SimulateClustering(), SimulateComponents(), SimulateCorrelation(), SimulateGraphical(), SimulateStructural()
Examples
## Independent predictors
# Univariate continuous outcome
set.seed(1)
simul <- SimulateRegression(pk = 15)
summary(simul)
# Univariate binary outcome
set.seed(1)
simul <- SimulateRegression(pk = 15, family = "binomial")
table(simul$ydata)
# Multiple continuous outcomes
set.seed(1)
simul <- SimulateRegression(pk = 15, q = 3)
summary(simul)
## Blocks of correlated predictors
# Simulation of predictor data
set.seed(1)
xsimul <- SimulateGraphical(pk = rep(5, 3), nu_within = 0.8, nu_between = 0, v_sign = -1)
Heatmap(cor(xsimul$data),
  legend_range = c(-1, 1),
  col = c("navy", "white", "darkred")
)
# Simulation of outcome data
simul <- SimulateRegression(xdata = xsimul$data)
print(simul)
summary(simul)
## Choosing expected proportion of explained variance
# Data simulation
set.seed(1)
simul <- SimulateRegression(n = 1000, pk = 15, q = 3, ev_xy = c(0.9, 0.5, 0.2))
summary(simul)
# Comparing with estimated proportion of explained variance
summary(lm(simul$ydata[, 1] ~ simul$xdata))
summary(lm(simul$ydata[, 2] ~ simul$xdata))
summary(lm(simul$ydata[, 3] ~ simul$xdata))
## Choosing expected concordance (AUC)
# Data simulation
set.seed(1)
simul <- SimulateRegression(
  n = 500, pk = 10,
  family = "binomial", ev_xy = 0.9
)
# Comparing with estimated concordance
fitted <- glm(simul$ydata ~ simul$xdata,
  family = "binomial"
)$fitted.values
Concordance(observed = simul$ydata, predicted = fitted)
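To make the explained-variance example above more concrete, the base R sketch below shows one way a target proportion of explained variance could be reached for a continuous outcome, by choosing the residual variance from the variance of the linear predictor. This is an illustration under stated assumptions, not necessarily how SimulateRegression proceeds; the target value ev and the coefficients are hypothetical.

```r
set.seed(1)
n <- 1000
ev <- 0.7 # hypothetical target proportion of explained variance

# Independent standard normal predictors; only the first two contribute
x <- matrix(rnorm(n * 5), nrow = n)
beta <- c(0.8, -0.5, 0, 0, 0)
signal <- as.vector(x %*% beta)

# Solve var(signal) / (var(signal) + sigma2) = ev for the noise variance
sigma2 <- var(signal) * (1 - ev) / ev
y <- signal + rnorm(n, sd = sqrt(sigma2))

# The estimated R-squared should be close to the target
r2 <- summary(lm(y ~ x))$r.squared
r2
```

The estimate fluctuates around the target from one simulated dataset to the next, and the fluctuation shrinks as n increases.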