u_extract_summary {eclust} | R Documentation |
Calculates cluster summaries
Description
This is a modified version of moduleEigengenes. It extracts the 1st, or the
1st and 2nd, principal component of each module in a single dataset. It can
also return the average expression and the variance explained for each
module. This function is more flexible in that the nPC argument is used to
choose the number of components; currently only nPC = 1 and nPC = 2 are
supported.
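The per-module summaries described above can be sketched with base R alone. This is a minimal illustration under stated assumptions, not the package's implementation: the simulated data, the helper name summarise_module, and the choice to average scaled values are all hypothetical, and prcomp stands in for whatever decomposition the function uses internally.

```r
# Simulated data: 20 samples (rows) by 6 genes (columns); names are illustrative.
set.seed(1)
x <- matrix(rnorm(20 * 6), nrow = 20,
            dimnames = list(NULL, paste0("gene", 1:6)))
colors <- c("blue", "blue", "blue", "red", "red", "red")  # module label per gene
nPC <- 2                                                  # 1 or 2

# Hypothetical helper: summarise one module by its leading principal components.
summarise_module <- function(module) {
  xm <- x[, colors == module, drop = FALSE]
  pc <- prcomp(xm, center = TRUE, scale. = TRUE)
  list(
    PC = pc$x[, seq_len(nPC), drop = FALSE],                    # per-sample scores
    varExplained = (pc$sdev^2 / sum(pc$sdev^2))[seq_len(nPC)],  # proportion per PC
    averageExpr = rowMeans(scale(xm))                           # mean of scaled genes
  )
}

modules <- unique(colors)
res <- setNames(lapply(modules, summarise_module), modules)
str(res$blue)
```

Each element of res holds the component scores, the proportion of variance explained, and the per-sample average for one module.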
Usage
u_extract_summary(x_train, colors, x_test, y_train, y_test, impute = TRUE,
nPC, excludeGrey = FALSE, grey = if (is.numeric(colors)) 0 else "grey",
subHubs = TRUE, trapErrors = FALSE, returnValidOnly = trapErrors,
softPower = 6, scale = TRUE, verbose = 0, indent = 0)
Arguments
x_train |
Training data for a single set in the form of a data frame where rows are samples and columns are genes (probes, cpgs, covariates). |
colors |
A vector of the same length as the number of columns (probes) in x_train, giving module color for all probes (genes). Color "grey" is reserved for unassigned genes. |
x_test |
Test set in the form of a data frame where rows are samples and columns are genes (probes, cpgs, covariates). |
y_train |
Training response numeric vector |
y_test |
Test response numeric vector |
impute |
If TRUE, expression data will be checked for the presence of NA entries and if the latter are present, numerical data will be imputed, using function impute.knn and probes from the same module as the missing datum. The function impute.knn uses a fixed random seed giving repeatable results. |
nPC |
Number of principal components and variance explained entries to be calculated. Note that only 1 or 2 is possible. |
excludeGrey |
Should the improper module consisting of 'grey' genes be excluded from the eigengenes? |
grey |
Value of colors designating the improper module. Note that if colors is a factor of numbers, the default value will be incorrect. |
subHubs |
Controls whether hub genes should be substituted for missing eigengenes. If TRUE, each missing eigengene (i.e., eigengene whose calculation failed and the error was trapped) will be replaced by a weighted average of the most connected hub genes in the corresponding module. If this calculation fails, or if subHubs==FALSE, the value of trapErrors will determine whether the offending module will be removed or whether the function will issue an error and stop. |
trapErrors |
Controls handling of errors that may arise when there are too many NA entries in expression data. If TRUE, such errors will be trapped without abnormal exit. If FALSE, errors will cause the function to stop. Note, however, that subHubs takes precedence in the sense that if subHubs==TRUE and trapErrors==FALSE, an error will be issued only if both the principal component and the hub gene calculations have failed. |
returnValidOnly |
logical; controls whether the returned data frame of module eigengenes contains columns corresponding only to modules whose eigengenes or hub genes could be calculated correctly (TRUE), or whether the data frame should have columns for each of the input color labels (FALSE). |
softPower |
The power used in soft-thresholding the adjacency matrix. Only used when the hubgene approximation is necessary because the principal component calculation failed. It must be non-negative. The default value should only be changed if there is a clear indication that it leads to incorrect results. |
scale |
logical; can be used to turn off scaling of the expression data before calculating the singular value decomposition. The scaling should only be turned off if the data has been scaled previously, in which case the function can run a bit faster. Note however that the function first imputes, then scales the expression data in each module. If the expression data contain missing values, scaling outside of the function and letting the function impute missing data may lead to slightly different results than if the data is scaled within the function. |
verbose |
Controls verbosity of printed progress messages. 0 means silent, up to (about) 5 the verbosity gradually increases. |
indent |
A single non-negative integer controlling indentation of printed messages. 0 means no indentation, each unit above that adds two spaces. |
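The note on the grey argument can be checked directly in base R. The sketch below wraps the default expression from the Usage section in a helper (default_grey is a hypothetical name used only for illustration) and shows why a factor of numbers gets the wrong default.

```r
# The default from Usage, wrapped in a hypothetical helper for illustration.
default_grey <- function(colors) if (is.numeric(colors)) 0 else "grey"

numeric_labels <- c(0, 1, 1, 2, 2)        # 0 marks unassigned genes
factor_labels  <- factor(numeric_labels)  # same labels stored as a factor

default_grey(numeric_labels)  # 0
default_grey(factor_labels)   # "grey", which matches no level of factor_labels

# One possible workaround: convert the factor back to its numeric labels.
recovered <- as.numeric(as.character(factor_labels))
default_grey(recovered)       # 0
```

Because is.numeric() returns FALSE for a factor, the default falls through to "grey" even though the labels are numbers. Passing grey = 0 explicitly, or converting as above, avoids the mismatch.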
Details
This function is called internally by the u_cluster_similarity function.
Value
A list with the following components:
- eigengenes
Module eigengenes in a dataframe, with each column corresponding to one eigengene
- averageExpr
the average expression per module in the training set
- averageExprTest
the average expression per module in the test set
- varExplained
The variance explained by the first PC in each module
- validColors
A copy of the input colors with entries corresponding to invalid modules set to grey if given, otherwise 0 if colors is numeric and "grey" otherwise.
- PC
The 1st or 1st and 2nd PC from each module in the training set
- PCTest
The 1st or 1st and 2nd PC from each module in the test set
- prcompObj
The prcomp object returned by prcomp
- nclusters
the number of modules (clusters)
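As an illustration of how the PC and PCTest components relate, the sketch below fits prcomp on training data for one module and projects test data onto the same loadings with predict. That PCTest is obtained this way is an assumption made for the example, not something this page states; the simulated matrices are hypothetical.

```r
# Simulated training and test data for a single module of 4 genes.
set.seed(2)
genes   <- paste0("g", 1:4)
x_train <- matrix(rnorm(30 * 4), nrow = 30, dimnames = list(NULL, genes))
x_test  <- matrix(rnorm(10 * 4), nrow = 10, dimnames = list(NULL, genes))

# Fit on the training set only; centering and scaling use training statistics.
fit <- prcomp(x_train, center = TRUE, scale. = TRUE)

# Training scores for the first two components.
PC <- fit$x[, 1:2, drop = FALSE]

# Test scores: apply the training center, scale and rotation to the test set.
PCTest <- predict(fit, newdata = x_test)[, 1:2, drop = FALSE]

dim(PC)
dim(PCTest)
```

Projecting the test set with the training loadings keeps the two sets of scores on the same axes, which is what allows a model fitted on PC to be evaluated on PCTest.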
References
Zhang, B. and Horvath, S. (2005), "A General Framework for Weighted Gene Co-Expression Network Analysis", Statistical Applications in Genetics and Molecular Biology: Vol. 4: No. 1, Article 17
Examples
## Not run:
#see u_cluster_similarity for examples
## End(Not run)