R: Calculate individual correlation network matrices

individualTOMs {WGCNA}

R Documentation

Calculate individual correlation network matrices

Description

This function calculates correlation network matrices (adjacencies or topological overlaps), after optionally first pre-clustering input data into blocks.

Usage

individualTOMs(
   multiExpr,
   multiWeights = NULL,
   multiExpr.imputed = NULL,  

   # Data checking options
   checkMissingData = TRUE,

   # Blocking options
   blocks = NULL,
   maxBlockSize = 5000,
   blockSizePenaltyPower = 5,
   nPreclusteringCenters = NULL,
   randomSeed = 54321,

   # Network construction options
   networkOptions,

   # Save individual TOMs? 
   saveTOMs = TRUE,
   individualTOMFileNames = "individualTOM-Set%s-Block%b.RData",

   # Behaviour options
   collectGarbage = TRUE,
   verbose = 2, indent = 0)

Arguments

`multiExpr`	expression data in the multi-set format (see `checkSets`). A vector of lists, one per set. Each set must contain a component `data` that contains the expression data, with rows corresponding to samples and columns to genes or probes.
`multiWeights`	optional observation weights in the same format (and dimensions) as `multiExpr`. These weights are used for correlation calculations with data in `multiExpr`.
`multiExpr.imputed`	Optional version of `multiExpr` with missing data imputed. If not given and `multiExpr` contains missing data, they will be imputed using the function `impute.knn`.
`checkMissingData`	logical: should data be checked for excessive numbers of missing entries in genes and samples, and for genes with zero variance? See details.
`blocks`	optional specification of blocks in which hierarchical clustering and module detection should be performed. If given, must be a numeric vector with one entry per gene of `multiExpr` giving the number of the block to which the corresponding gene belongs.
`maxBlockSize`	integer giving maximum block size for module detection. Ignored if `blocks` above is non-NULL. Otherwise, if the number of genes in `multiExpr` exceeds `maxBlockSize`, genes will be pre-clustered into blocks whose size should not exceed `maxBlockSize`.
`blockSizePenaltyPower`	number specifying how strongly blocks should be penalized for exceeding the maximum size. Set to a large number or `Inf` if it is very important that the maximum block size not be exceeded.
`nPreclusteringCenters`	number of centers to be used in the preclustering. Defaults to the smaller of `nGenes/20` and `100*nGenes/maxBlockSize`, where `nGenes` is the number of genes (variables) in `multiExpr`.
`randomSeed`	integer to be used as seed for the random number generator before the function starts. If a current seed exists, it is saved and restored upon exit. If `NULL` is given, the function will not save and restore the seed.
`networkOptions`	A single list of class `NetworkOptions` giving options for network calculation for all of the networks, or a `multiData` structure containing one such list for each input data set.
`saveTOMs`	logical: should individual TOMs be saved to disk (`TRUE`) or returned directly in the return value (`FALSE`)?
`individualTOMFileNames`	character string giving the file names to save individual TOMs into. The following tags should be used to make the file names unique for each set and block: `%s` will be replaced by the set number; `%N` will be replaced by the set name (taken from `names(multiExpr)`) if it exists, otherwise by set number; `%b` will be replaced by the block number. If the file names turn out to be non-unique, an error will be generated.
`collectGarbage`	Logical: should garbage collection be called after each block calculation? This can be useful when the data are large, but could unnecessarily slow down calculation with small data.
`verbose`	Integer level of verbosity. Zero means silent; higher values make the output progressively more verbose.
`indent`	Indentation for diagnostic messages. Zero means no indentation, each unit adds two spaces.
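
The `multiExpr` layout and the file name tags described above can be illustrated in base R. This is only a sketch: the set names `liver` and `muscle` and the helper `expandTOMFileName` are hypothetical and not part of WGCNA, and the helper merely mimics the documented substitution of `%s`, `%N` and `%b`; the function's own implementation may differ.

```r
# Toy data in the multi-set layout: one list per set, each with a `data`
# component holding samples in rows and genes in columns.
set.seed(1)
nGenes <- 6
multiExpr <- list(
  liver  = list(data = matrix(rnorm(10 * nGenes), nrow = 10)),
  muscle = list(data = matrix(rnorm(8 * nGenes), nrow = 8))
)

# Hypothetical helper (not a WGCNA function): expand the documented tags.
expandTOMFileName <- function(pattern, set, block, setNames = NULL) {
  # %N falls back to the set number when no set name is available
  hasName <- !is.null(setNames) && !is.na(setNames[set]) && nzchar(setNames[set])
  setName <- if (hasName) setNames[set] else as.character(set)
  out <- gsub("%s", as.character(set), pattern, fixed = TRUE)
  out <- gsub("%N", setName, out, fixed = TRUE)
  gsub("%b", as.character(block), out, fixed = TRUE)
}

# Expand the default pattern for 2 sets and 2 blocks.
pattern <- "individualTOM-Set%s-Block%b.RData"
grid <- expand.grid(set = seq_along(multiExpr), block = 1:2)
fileNames <- mapply(expandTOMFileName, set = grid$set, block = grid$block,
                    MoreArgs = list(pattern = pattern,
                                    setNames = names(multiExpr)))

# The names must be unique across sets and blocks.
stopifnot(anyDuplicated(fileNames) == 0)
print(fileNames)
```

A pattern that omits both the set tag and the block tag would map several matrices to the same file, which is the situation the documented uniqueness error guards against.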

Details

The function starts by optionally filtering out samples that have too many missing entries and genes that have either too many missing entries or zero variance in at least one set. Genes that are filtered out are excluded from the network calculations.
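
The gene filtering rule can be sketched in base R as follows. This is an illustration under stated assumptions, not the code the function runs: the threshold `maxMissingFraction` and the helper `geneOK` are hypothetical, and sample filtering is omitted for brevity.

```r
# Assumed threshold for illustration only; not a WGCNA default.
maxMissingFraction <- 0.5

set.seed(2)
geneNames <- paste0("g", 1:5)
set1 <- matrix(rnorm(40), nrow = 8, dimnames = list(NULL, geneNames))
set2 <- matrix(rnorm(30), nrow = 6, dimnames = list(NULL, geneNames))
set1[, "g2"] <- 1        # g2 has zero variance in set 1
set2[1:5, "g4"] <- NA    # g4 is mostly missing in set 2
multiExpr <- list(list(data = set1), list(data = set2))

# Hypothetical helper: TRUE for genes that pass the checks within one set.
geneOK <- function(x) {
  fracMissing <- colMeans(is.na(x))
  variance <- apply(x, 2, var, na.rm = TRUE)
  fracMissing <= maxMissingFraction & !is.na(variance) & variance > 0
}

# A gene is kept only if it passes in every set, so failing in
# at least one set is enough to exclude it.
keepGenes <- Reduce(`&`, lapply(multiExpr, function(s) geneOK(s$data)))
print(keepGenes)
```

Here `g2` and `g4` are flagged for removal even though each fails in only one of the two sets.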

If `blocks` is not given and the number of genes (columns) in `multiExpr` exceeds `maxBlockSize`, genes are pre-clustered into blocks using the function `consensusProjectiveKMeans`; otherwise all genes are treated in a single block. Any missing data in `multiExpr` will be imputed; if imputed data are already available, they can be supplied separately.
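
As a small worked example of the blocking rules, the base R sketch below evaluates the documented default for `nPreclusteringCenters` and builds a user-supplied `blocks` vector. The gene count of 12000 and the round-robin block assignment are assumptions chosen for illustration; a real assignment would normally group related genes together.

```r
nGenes <- 12000        # assumed number of genes (columns) in multiExpr
maxBlockSize <- 5000   # default value from the Usage section

# Pre-clustering applies only when no blocks are given and nGenes is too large.
needsPreclustering <- nGenes > maxBlockSize

# Documented default: the smaller of nGenes/20 and 100*nGenes/maxBlockSize.
nCenters <- min(nGenes / 20, 100 * nGenes / maxBlockSize)

# A manual `blocks` vector has one entry per gene giving its block number.
# The round-robin assignment here is arbitrary and for illustration only.
blocks <- rep(1:3, length.out = nGenes)
stopifnot(length(blocks) == nGenes,
          max(table(blocks)) <= maxBlockSize)

print(c(needsPreclustering = needsPreclustering, nCenters = nCenters))
```

With these numbers the two candidate values are 600 and 240, so the default is 240 centers.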

For each block of genes, the network adjacency is constructed and (if requested) topological overlap is calculated in each set. The topological overlaps can be saved to disk as RData files, or returned directly within the return value (see below). Note that the matrices can be big and returning them within the return value can quickly exhaust the system's memory. In particular, if the block-wise calculation is necessary, it is usually impossible to return all matrices in the return value.
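
The memory warning can be made concrete with a rough estimate. The sketch below assumes each matrix is held as a dense square matrix of 8-byte doubles; the helper `tomMemoryGB` is hypothetical, and the actual storage used by the function may be more compact.

```r
# Hypothetical helper: approximate memory (in GiB) needed to hold one dense
# nGenes x nGenes double-precision matrix per set.
tomMemoryGB <- function(nGenes, nSets = 1, bytesPerEntry = 8) {
  nSets * nGenes^2 * bytesPerEntry / 1024^3
}

# One block at the default maxBlockSize, in a single set.
oneBlock <- tomMemoryGB(5000)

# All genes at once for 20000 genes in 3 sets.
allGenes <- tomMemoryGB(20000, nSets = 3)

print(round(c(oneBlock = oneBlock, allGenes = allGenes), 2))
```

Under these assumptions a single 5000-gene block needs about 0.19 GiB per set, while three full 20000-gene matrices need nearly 9 GiB, which is why saving to disk is the safer choice when block-wise calculation is required.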

Value

A list with the following components:

`blockwiseAdjacencies`	A `multiData` structure containing (possibly blockwise) network matrices for each input data set. The network matrices are stored as `BlockwiseData` objects.
`setNames`	A copy of `names(multiExpr)`.
`nSets`	Number of sets in `multiExpr`.
`blockInfo`	A list of class `BlockInformation`, giving information about blocks and gene and sample filtering.
`networkOptions`	The input `networkOptions`, returned as a `multiData` structure with one entry per input data set.

Author(s)

Peter Langfelder