generateBatchDataVaryingRepresentation {batchmix} R Documentation

Generate batch data

Description

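Generate data from groups across batches, where the proportion of each group may differ between batches (see group_weights). Columns are generated independently. In each column, the group parameters and the batch parameters are randomly permuted, so a given group or batch need not have the same parameters in every column.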
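The per-column permutation can be pictured with a small base-R sketch. It is not the batchmix implementation. The function names `simulate_columns_sketch` and `permute` are hypothetical, and the sketch assumes a location-scale model in which an item in group k and batch b, with e drawn from N(0, 1), has batch-free value mu_k + sigma_k * e and observed value mu_k + m_b + sigma_k * S_b * e. It also permutes each parameter vector independently; whether batchmix permutes group means and standard deviations jointly or separately is not stated on this page.

```r
# Hypothetical sketch of the per-column generation; NOT the batchmix source.
# Assumed model for an item in group k and batch b, with e ~ N(0, 1):
#   batch-free value = mu_k + sigma_k * e
#   observed value   = mu_k + m_b + sigma_k * S_b * e
permute <- function(x) x[sample.int(length(x))]  # safe for length-1 vectors, unlike sample(x)

simulate_columns_sketch <- function(group_IDs, batch_IDs, P,
                                    group_means, group_std_dev,
                                    batch_shift, batch_scale) {
  N <- length(group_IDs)
  observed <- matrix(0, nrow = N, ncol = P)
  batch_free <- matrix(0, nrow = N, ncol = P)
  for (p in seq_len(P)) {
    # Independent random permutations of the group and batch parameters
    mu <- permute(group_means)
    sigma <- permute(group_std_dev)
    m <- permute(batch_shift)
    S <- permute(batch_scale)
    # One standard normal draw per item, shared by both versions of the column
    e <- rnorm(N)
    batch_free[, p] <- mu[group_IDs] + sigma[group_IDs] * e
    observed[, p] <- mu[group_IDs] + m[batch_IDs] +
      sigma[group_IDs] * S[batch_IDs] * e
  }
  list(observed = observed, batch_free = batch_free)
}

# Usage: four items, two groups, two batches, two columns
set.seed(1)
sim <- simulate_columns_sketch(
  group_IDs = c(1, 1, 2, 2), batch_IDs = c(1, 2, 1, 2), P = 2,
  group_means = c(4, 8), group_std_dev = c(2, 2),
  batch_shift = c(0.1, 0.5), batch_scale = c(1.2, 1.2)
)
```

Sharing the noise draw e between the two matrices means that, under this assumed model, the batch-free data are exactly what the observed data would be with zero shift and unit scale.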

Usage

generateBatchDataVaryingRepresentation(
  N,
  P,
  group_means,
  group_std_dev,
  batch_shift,
  batch_scale,
  group_weights,
  batch_weights,
  frac_known = 0.2
)

Arguments

N

The number of items (rows) to generate.

P

The number of columns in the generated dataset.

group_means

A vector of length K giving the group means for a column.

group_std_dev

A vector of length K giving the group standard deviations for a column.

batch_shift

A vector of length B giving the batch means (shifts) for a column.

batch_scale

A vector of length B giving the batch standard deviations (scales) for a column.

group_weights

A K x B matrix whose entry in row k and column b is the expected proportion of the items in batch b that belong to group k (in the example below, each column sums to 1).

batch_weights

A vector of the expected proportion of N in each batch.

frac_known

The expected fraction of observed labels. Used to generate a "fixed" vector to pass to the batchSemiSupervisedMixtureModel function.
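One reading of the weight arguments is a two-stage draw: a batch for each item from batch_weights, then a group from the column of group_weights belonging to that batch. The sketch below assumes this reading (suggested by the K x B shape and by the example, whose columns each sum to 1) and also assumes that "fixed" is a 0/1 vector marking each label as observed with probability frac_known. The function name `sample_labels_sketch` is hypothetical; this is not the package's code.

```r
# Hypothetical two-stage label sampler; NOT the batchmix source.
sample_labels_sketch <- function(N, group_weights, batch_weights, frac_known = 0.2) {
  K <- nrow(group_weights)
  B <- length(batch_weights)
  # Stage 1: a batch label for every item
  batch_IDs <- sample.int(B, N, replace = TRUE, prob = batch_weights)
  # Stage 2: a group label drawn from the column of group_weights for its batch
  group_IDs <- integer(N)
  for (b in seq_len(B)) {
    idx <- which(batch_IDs == b)
    group_IDs[idx] <- sample.int(K, length(idx), replace = TRUE,
                                 prob = group_weights[, b])
  }
  # Assumed: each label is observed (1) independently with probability frac_known
  fixed <- rbinom(N, size = 1, prob = frac_known)
  list(group_IDs = group_IDs, batch_IDs = batch_IDs, fixed = fixed)
}

# Usage with the weights from the example
set.seed(1)
gw <- matrix(c(0.8, 0.6, 0.4, 0.2, 0.2,
               0.2, 0.4, 0.6, 0.8, 0.8), nrow = 2, byrow = TRUE)
sim_labels <- sample_labels_sketch(500, gw, rep(1 / 5, 5))
table(sim_labels$group_IDs, sim_labels$batch_IDs)
```

Under this reading, group 1 dominates the early batches and group 2 the later ones, which is the "varying representation" the function name refers to.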

Value

A list of 4 objects: the data generated from the groups with batch effects, the same data without batch effects, the labels of the generating groups, and the batch labels.

Examples

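library(batchmix)
set.seed(1)

N <- 500          # number of items
P <- 2            # number of columns
K <- 2            # number of groups
B <- 5            # number of batches
mean_dist <- 4    # spacing between group means
batch_dist <- 0.3 # typical size of a batch shift

group_means <- seq(1, K) * mean_dist
group_std_dev <- rep(2, K)
batch_shift <- rnorm(B, mean = batch_dist, sd = batch_dist)
batch_scale <- rep(1.2, B)

# Column b holds the expected group proportions within batch b
group_weights <- matrix(
  c(
    0.8, 0.6, 0.4, 0.2, 0.2,
    0.2, 0.4, 0.6, 0.8, 0.8
  ),
  nrow = K, ncol = B, byrow = TRUE
)
# All batches are equally likely
batch_weights <- rep(1 / B, B)

my_data <- generateBatchDataVaryingRepresentation(
  N = N,
  P = P,
  group_means = group_means,
  group_std_dev = group_std_dev,
  batch_shift = batch_shift,
  batch_scale = batch_scale,
  group_weights = group_weights,
  batch_weights = batch_weights
)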
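Before calling the function, it can help to check that the inputs agree with one another. The helper `check_batch_inputs` below is hypothetical and not part of batchmix, and the requirement that batch_weights and each column of group_weights sum to 1 is an assumption drawn from the example above rather than a documented rule.

```r
# Hypothetical input check; not part of batchmix.
# Assumes group_weights is K x B with columns summing to 1,
# and batch_weights sums to 1.
check_batch_inputs <- function(group_means, group_std_dev,
                               batch_shift, batch_scale,
                               group_weights, batch_weights, tol = 1e-8) {
  K <- length(group_means)
  B <- length(batch_shift)
  stopifnot(
    length(group_std_dev) == K,                 # one standard deviation per group
    length(batch_scale) == B,                   # one scale per batch
    length(batch_weights) == B,                 # one weight per batch
    is.matrix(group_weights),
    all(dim(group_weights) == c(K, B)),         # K x B layout
    abs(sum(batch_weights) - 1) < tol,
    all(abs(colSums(group_weights) - 1) < tol)  # group proportions within each batch
  )
  invisible(TRUE)
}

# Usage with values shaped like the example
gw <- matrix(c(0.8, 0.6, 0.4, 0.2, 0.2,
               0.2, 0.4, 0.6, 0.8, 0.8), nrow = 2, byrow = TRUE)
check_batch_inputs(c(4, 8), c(2, 2), rep(0.3, 5), rep(1.2, 5), gw, rep(1 / 5, 5))
```

A tolerance is used for the sums because weights such as 1 / B are not exactly representable in floating point.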

[Package batchmix version 2.1.0 Index]