datagenerator {metafuse}    R Documentation

Simulate data

Description

Simulate a dataset with data from K different sources, for demonstration of metafuse.

Usage

datagenerator(n, beta0, family, seed = NA)

Arguments

n

a vector of length K (the total number of datasets being integrated), specifying the sample size of each dataset; can also be a scalar, in which case the function simulates K datasets of equal sample size

beta0

a coefficient matrix of dimension K * p, where K is the number of datasets being integrated and p is the number of covariates, including the intercept

family

the type of the response vector, one of c("gaussian", "binomial", "poisson", "cox"): "gaussian" for a continuous response, "binomial" for a binary response, "poisson" for a count response, "cox" for an observed time-to-event response with a censoring indicator

seed

the random seed for data generation, default is NA

Details

These datasets are artificial and are used to demonstrate the features of metafuse. When family="cox", the response consists of two vectors: a time-to-event variable time and a censoring indicator status.

Value

Returns a data frame with n*K rows (if n is a scalar) or sum(n) rows (if n is a K-element vector). The data frame contains columns "y", "x1", ..., "x_p-1" and "group" if family is "gaussian", "binomial" or "poisson"; it contains columns "time", "status", "x1", ..., "x_p-1" and "group" if family="cox".

Examples

########### generate data ###########
n <- 200    # sample size in each dataset (can also be a K-element vector)
K <- 10     # number of datasets for data integration
p <- 3      # number of covariates in X (including the intercept)

# the coefficient matrix of dimension K * p, used to specify the heterogeneous pattern
beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,   # beta_0 of intercept
                  0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,   # beta_1 of X_1
                  0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0),  # beta_2 of X_2
                K, p)
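
# Note on layout (illustrative toy values, not part of the original example):
# matrix() fills column by column, which is why each run of K values above
# becomes one column of beta0 (one column per coefficient, one row per dataset).
toy <- matrix(1:6, 2, 3)   # 2 "datasets", 3 "coefficients"
toy[, 1]                   # first two values supplied -> first column
dim(toy)                   # rows = datasets, columns = coefficients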

# generate a dataset; family is one of "gaussian", "binomial", "poisson", "cox"
data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123)
names(data)

# if family="cox", the returned dataset contains columns "time" and "status" instead of "y"
data <- datagenerator(n=n, beta0=beta0, family="cox", seed=123)
names(data)
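
# Sketch of the row-count rule stated under Value. expected_rows() is a
# hypothetical helper written for illustration; it is not a metafuse function.
expected_rows <- function(n, K) if (length(n) == 1) n * K else sum(n)
expected_rows(200, 10)               # scalar n: n * K = 2000
expected_rows(c(100, 150, 200), 3)   # vector n: sum(n) = 450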

[Package metafuse version 2.0-1 Index]