R: Simulate multi-set expression data

simulateMultiExpr {WGCNA}

R Documentation

Simulate multi-set expression data

Description

Simulation of expression data in several sets with related module structure.

Usage

simulateMultiExpr(eigengenes, 
                  nGenes, 
                  modProportions, 
                  minCor = 0.5, maxCor = 1, 
                  corPower = 1, 
                  backgroundNoise = 0.1, 
                  leaveOut = NULL, 
                  signed = FALSE, 
                  propNegativeCor = 0.3, 
                  geneMeans = NULL,
                  nSubmoduleLayers = 0, 
                  nScatteredModuleLayers = 0, 
                  averageNGenesInSubmodule = 10, 
                  averageExprInSubmodule = 0.2, 
                  submoduleSpacing = 2, 
                  verbose = 1, indent = 0)

Arguments

`eigengenes`	the seed eigengenes for the simulated modules in a multi-set format. A list with one component per set. Each component is again a list that must contain a component `data`. This is a data frame of seed eigengenes for the corresponding data set. Columns correspond to modules, rows to samples. Number of samples in the simulated data is determined from the number of samples of the eigengenes.
`nGenes`	integer specifying the number of simulated genes.
`modProportions`	a numeric vector with length equal to the number of eigengenes in `eigengenes` plus one, containing fractions of the total number of genes to be put into each of the modules and into the "grey module", which means genes not related to any of the modules. See details.
`minCor`	minimum correlation of module genes with the corresponding eigengene. See details.
`maxCor`	maximum correlation of module genes with the corresponding eigengene. See details.
`corPower`	controls the dropoff of gene-eigengene correlation. See details.
`backgroundNoise`	amount of background noise to be added to the simulated expression data.
`leaveOut`	optional specification of modules that should be left out of the simulation. A logical matrix in which columns correspond to sets and rows to modules. Wherever an entry is `TRUE`, the corresponding module is not simulated in the corresponding data set; that is, its genes are simulated as unrelated ("grey"), independently of the eigengene.
`signed`	logical: should the genes be simulated as belonging to a signed network? If `TRUE`, all genes will be simulated to have positive correlation with the eigengene. If `FALSE`, a proportion given by `propNegativeCor` will be simulated with negative correlations of the same absolute values.
`propNegativeCor`	proportion of genes to be simulated with negative gene-eigengene correlations. Only effective if `signed` is `FALSE`.
`geneMeans`	optional vector of length `nGenes` giving desired mean expression for each gene. If not given, the returned expression profiles will have mean zero.
`nSubmoduleLayers`	number of layers of ordered submodules to be added. See details.
`nScatteredModuleLayers`	number of layers of scattered submodules to be added. See details.
`averageNGenesInSubmodule`	average number of genes in a submodule. See details.
`averageExprInSubmodule`	average strength of submodule expression vectors.
`submoduleSpacing`	a number giving submodule spacing: consecutive submodules are separated by this multiple of the submodule size.
`verbose`	integer level of verbosity. Zero means silent, higher values make the output progressively more verbose.
`indent`	indentation for diagnostic messages. Zero means no indentation, each unit adds two spaces.
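
The input formats described above can be sketched in base R as follows. This is a minimal illustration only: the number of modules, the sample counts, the proportions, and the use of random normal seed eigengenes are all assumptions chosen for the example, not values prescribed by the function.

```r
set.seed(1)  # reproducible seed eigengenes

nModules <- 3          # hypothetical number of modules
nSamples <- c(20, 30)  # hypothetical sample counts for two sets

# Multi-set format: one list component per set, each holding a `data`
# data frame with samples in rows and modules (seed eigengenes) in columns.
eigengenes <- lapply(nSamples, function(n) {
  list(data = as.data.frame(matrix(rnorm(n * nModules), nrow = n, ncol = nModules)))
})

# One fraction per module plus a final entry for the "grey" genes
# (illustrative values).
modProportions <- c(0.20, 0.15, 0.10, 0.30)

# Rows are modules, columns are sets; TRUE leaves the module out of that set.
leaveOut <- matrix(FALSE, nrow = nModules, ncol = length(eigengenes))
leaveOut[3, 2] <- TRUE  # module 3 is not simulated in set 2

# Sanity check on the documented length requirement.
stopifnot(length(modProportions) == nModules + 1)
```

The number of rows in each `data` component determines the number of samples simulated for that set, so the two sets here would receive 20 and 30 samples respectively.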

Details

For details on how individual data sets are simulated and on the meaning of the individual simulation arguments, see `simulateDatExpr`. This function simulates several data sets at once and returns the result in multi-set format. The number of genes is the same for all data sets. Module memberships are also the same, but modules can optionally be “dissolved”, that is, their genes will be simulated as unassigned. Such “dissolved”, or left out, modules are specified in the matrix `leaveOut`.
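
As a hedged sketch of dissolving a module in one set, the code below simulates two sets and leaves module 2 out of the second. The set sizes, gene count, and proportions are hypothetical, and the call is guarded so that it only runs when the WGCNA package is installed.

```r
set.seed(2)
nModules <- 2  # hypothetical number of modules

# Two sets of 15 samples each, with random seed eigengenes (an assumption).
eigengenes <- lapply(c(15, 15), function(n) {
  list(data = as.data.frame(matrix(rnorm(n * nModules), n, nModules)))
})

# matrix() fills column by column, and columns correspond to sets.
leaveOut <- matrix(c(FALSE, FALSE,  # set 1: both modules simulated
                     FALSE, TRUE),  # set 2: module 2 dissolved
                   nrow = nModules)

sim <- NULL
if (requireNamespace("WGCNA", quietly = TRUE)) {
  sim <- WGCNA::simulateMultiExpr(eigengenes, nGenes = 100,
                                  modProportions = c(0.3, 0.2, 0.3),
                                  leaveOut = leaveOut, verbose = 0)
  # Genes whose label in set 2 differs from the label they would have had
  # if no module had been left out.
  print(sum(sim$setLabels[, 2] != sim$allLabels[, 2]))
}
```

Comparing `setLabels` with `allLabels` is one way to see which genes were affected by `leaveOut`; genes of a module that was not left out should carry the same label in both matrices.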

Value

A list with the following components:

`multiExpr`	simulated expression data in multi-set format analogous to that of the input `eigengenes`. A list with one component per set. Each component is again a list that contains a component `data`. This is a data frame of expression data for the corresponding data set. Columns correspond to genes, rows to samples.
`setLabels`	a matrix of dimensions (number of genes) times (number of sets) that contains module labels for each gene in each simulated data set.
`allLabels`	a matrix of dimensions (number of genes) times (number of sets) that contains the module labels that would be simulated if no module were left out using `leaveOut`. This means that all columns of the matrix are equal; the columns are repeated for convenience so `allLabels` has the same dimensions as `setLabels`.
`labelOrder`	a matrix of dimensions (number of modules) times (number of sets) that contains the order in which module labels were assigned to genes in each set. The first label is assigned to genes 1...(module size of module labeled by first label), the second label to the following batch of genes etc.
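
The components of the returned list can be inspected as in the sketch below. The inputs are hypothetical (two modules, 60 genes, sets of 12 and 18 samples, random seed eigengenes), and the simulation only runs when the WGCNA package is installed.

```r
set.seed(3)
nModules <- 2          # hypothetical values throughout
nGenes   <- 60
nSamples <- c(12, 18)

eigengenes <- lapply(nSamples, function(n) {
  list(data = as.data.frame(matrix(rnorm(n * nModules), n, nModules)))
})

sim <- NULL
if (requireNamespace("WGCNA", quietly = TRUE)) {
  sim <- WGCNA::simulateMultiExpr(eigengenes, nGenes = nGenes,
                                  modProportions = c(0.3, 0.2, 0.3),
                                  verbose = 0)
  print(dim(sim$multiExpr[[1]]$data))  # samples x genes for set 1
  print(dim(sim$setLabels))            # genes x sets
  print(table(sim$allLabels[, 1]))     # genes per module label
  print(sim$labelOrder)                # order in which labels were assigned
}
```

Because `multiExpr` uses the same multi-set layout as the input `eigengenes`, the expression data for set `i` is accessed as `sim$multiExpr[[i]]$data`.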

Author(s)

Peter Langfelder

References

A short description of the simulation method can also be found in the Supplementary Material to the article

Langfelder P, Horvath S (2007) Eigengene networks for studying the relationships between co-expression modules. BMC Systems Biology 1:54.

The material is posted at http://horvath.genetics.ucla.edu/html/CoexpressionNetwork/EigengeneNetwork/SupplementSimulations.pdf.