R: Select columns with the lowest consensus number of missing...

selectFewestConsensusMissing {WGCNA}

R Documentation

Select columns with the lowest consensus number of missing data

Description

Given a multiData structure, this function calculates the consensus number of present (non-missing) data for each variable (column) across the data sets, forms the consensus and for each group selects variables whose consensus proportion of present data is at least selectFewestMissing (see usage below).

Usage

selectFewestConsensusMissing(
    mdx, 
    colID, 
    group, 
    minProportionPresent = 1, 
    consensusQuantile = 0, 
    verbose = 0,
    ...)

Arguments

`mdx`	A `multiData` structure. All sets must have the same columns.
`colID`	Character vector of column identifiers. This must include all the column names from `mdx`, but can include other values as well. Its entries must be unique (no duplicates) and no missing values are permitted.
`group`	Character vector whose components contain the group label (e.g. a character string) for each entry of `colID`. This vector must be of the same length as the vector `colID`. In gene expression applications, this vector could contain the gene symbol (or a co-expression module label).
`minProportionPresent`	A numeric value between 0 and 1 (logical values will be coerced to numeric). Denotes the minimum consensus fraction of present data in each column that will result in the column being retained.
`consensusQuantile`	A number between 0 and 1 giving the quantile probability for consensus calculation. 0 means the minimum value (true consensus) will be used.
`verbose`	Level of verbosity; 0 means silent, larger values will cause progress messages to be printed.
`...`	Other arguments that should be considered undocumented and subject to change.

Details

A 'consensus' of a vector (say 'x') is simply defined as the quantile with probability consensusQuantile of the vector x. This function calculates, for each variable in mdx, its proportion of present (i.e., non-NA and non-NaN) values in each of the data sets in mdx, and forms the consensus. Only variables whose consensus proportion of present data is at least selectFewestMissing are retained.

Value

A logical vector with one element per variable in mdx, giving TRUE for the retained variables.

Author(s)

Jeremy Miller and Peter Langfelder