R: Generate Count Data

generateCountData {NBLDA}

R Documentation

Generate Count Data

Description

This function generates count data, e.g., RNA-Sequencing data, for both classification and clustering purposes.

Usage

generateCountData(n, p, K, param, sdsignal = 1, DE = 0.3, allZero.rm = TRUE,
  tag.samples = FALSE)

Arguments

`n`	number of samples.
`p`	number of variables/features.
`K`	number of classes.
`param`	overdispersion parameter. This parameter is matched with the argument `size` in the `rnbinom` function. Hence, the Negative Binomial distribution approaches the Poisson distribution as `param` increases.
`sdsignal`	a nonzero numeric value. As `sdsignal` increases, the observed counts differ more strongly among the K classes.
`DE`	a numeric value within the interval [0, 1]. This is the proportion of variables that are differentially expressed, i.e., significantly different among the K classes. The remaining variables are assumed not to contribute to the discrimination function.
`allZero.rm`	a logical. If TRUE, columns containing only zeros are dropped.
`tag.samples`	a logical. If TRUE, the row names are automatically generated using a tag for each sample such as "S1", "S2", etc.
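
The effect of `param` can be checked with base R alone. The sketch below does not call any NBLDA function; it relies only on the fact stated above that `param` is passed to `rnbinom` as `size`. The mean `mu` and the candidate values in `sizes` are arbitrary choices made for illustration.

```r
# Base R only; no NBLDA functions are called here.
set.seed(1)
mu <- 10                      # common mean for all draws (arbitrary choice)
sizes <- c(0.5, 1, 10, 1000)  # candidate values for `param` (passed as `size`)

# For the Negative Binomial, Var = mu + mu^2 / size, so the variance-to-mean
# ratio is 1 + mu / size. It tends to 1 (the Poisson value) as size grows.
theoretical_ratio <- 1 + mu / sizes

# Empirical check on simulated counts.
empirical_ratio <- sapply(sizes, function(s) {
  draws <- rnbinom(1e5, size = s, mu = mu)
  var(draws) / mean(draws)
})

round(rbind(size = sizes,
            theoretical = theoretical_ratio,
            empirical = empirical_ratio), 2)
```

Small values of `param` therefore give strongly overdispersed counts, while large values give counts that are nearly Poisson.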

Value

`x`, `xte`	count data matrices for the training and test sets.
`y`, `yte`	class labels for the training and test sets.
`truesf`, `truesfte`	true size factors for training and test set. See Witten (2011) for more information on estimating size factors.
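
As a rough illustration of what a size factor is, the sketch below simulates counts with known per-sample scaling and recovers it with total-count normalization, one simple estimator of this kind. Everything here is an assumption made for illustration: the variable names, the simulation settings, and the choice of estimator. It uses base R only and is not necessarily how NBLDA generates or estimates size factors.

```r
# Base R only; a toy simulation, not the NBLDA implementation.
set.seed(2)
n <- 10                                  # samples (rows)
p <- 200                                 # features (columns)
true_sf <- runif(n, min = 0.5, max = 2)  # hypothetical per-sample size factors
base_mean <- rexp(p, rate = 1 / 25)      # hypothetical per-feature mean counts

# Expected count for sample i and feature j is true_sf[i] * base_mean[j].
mu <- outer(true_sf, base_mean)
x <- matrix(rnbinom(n * p, size = 5, mu = mu), nrow = n, ncol = p)

# Total-count estimate: each sample's share of all counts in the matrix.
est_sf <- rowSums(x) / sum(x)

# Size factors are only defined up to a constant, so rescale the true
# values to sum to 1 before comparing.
round(cbind(true = true_sf / sum(true_sf), estimated = est_sf), 3)
```

With the returned `truesf` and `x` from `generateCountData`, the same kind of comparison could be made between the true size factors and any estimate computed from the training counts.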

Author(s)

Dincer Goksuluk

Examples

set.seed(2128)
counts <- generateCountData(n = 20, p = 10, K = 2, param = 1, sdsignal = 0.5, DE = 0.8,
                            allZero.rm = FALSE, tag.samples = TRUE)
head(counts$x)

[Package NBLDA version 1.0.1 Index]