prepareAdjMat {netgsa} | R Documentation |
Construct adjacency matrices from graphite databases and/or user provided network information
Description
Read the network information from any of the graphite databases specified by the user and construct the adjacency matrices needed for NetGSA. This function also allows for clustering. See Details for more information.
Usage
prepareAdjMat(x, group, databases = NULL, cluster = TRUE,
file_e=NULL, file_ne=NULL, lambda_c=1, penalize_diag=TRUE, eta=0.5)
Arguments
x |
The |
group |
Vector of class indicators of length |
databases |
(Optional) Either (1) the result of a call to |
cluster |
(Optional) Logical indicating whether or not to cluster genes to estimate adjacency matrix. If not specified, set to TRUE if there are > 2,500 genes (p > 2,500). The main use of clustering is to speed up calculation time. If the dimension of the problem, or equivalently the total number of unique genes across all pathways, is large,
If clustering is set to TRUE, the 0-1 adjacency matrix is used to detect clusters of genes within the connected components. Once gene clusterings are chosen, the weighted adjacency matrices are estimated for each cluster separately using
If clustering is set to FALSE, the 0-1 adjacency matrix is used to detect connected components and the weighted adjacency matrices are estimated for each connected component.
Singleton clusters are combined into one cluster. This should not affect performance much since the gene in a singleton cluster should not have any edges to other genes. |
file_e |
(Optional) The name of the file from which the list of edges is to be read. This file is read in with
This information cannot conflict with the user specified non-edges. That is, one cannot have the same edge in |
file_ne |
(Optional) The name of the file from which the list of non-edges is to be read. This file is read in with In the case of conflicting information between |
lambda_c |
(Non-negative) a vector or constant. |
penalize_diag |
Logical. Whether or not to penalize diagonal entries when estimating weighted adjacency matrix. If TRUE a small penalty is used, otherwise no penalty is used. |
eta |
(Non-negative) a small constant needed for estimating the edge weights. By default, |
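The connected-component step described under the cluster argument can be illustrated with a small base R sketch. This is not the package's implementation: the gene names, the helper find_components and the choice of 0 as the label for the merged singleton cluster are all hypothetical, and the within-component clustering used when cluster = TRUE is not shown.

```r
# Toy symmetric 0-1 adjacency matrix for five hypothetical genes.
genes <- c("A", "B", "C", "D", "E")
adj01 <- matrix(0, 5, 5, dimnames = list(genes, genes))
adj01["A", "B"] <- adj01["B", "A"] <- 1
adj01["B", "C"] <- adj01["C", "B"] <- 1
# D and E have no edges, so each starts as a singleton component.

# Label connected components with a breadth-first search
# (hypothetical helper, not part of netgsa).
find_components <- function(adj) {
  n <- nrow(adj)
  label <- integer(n)            # 0 means "not visited yet"
  current <- 0L
  for (start in seq_len(n)) {
    if (label[start] != 0L) next
    current <- current + 1L
    label[start] <- current
    queue <- start
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      # unvisited neighbours of the current node
      nbrs <- which(adj[node, ] != 0 & label == 0L)
      label[nbrs] <- current
      queue <- c(queue, nbrs)
    }
  }
  setNames(label, rownames(adj))
}

comp <- find_components(adj01)

# Merge all singleton components into one shared cluster (labelled 0 here).
sizes <- table(comp)
singletons <- as.integer(names(sizes)[sizes == 1])
cluster_id <- ifelse(comp %in% singletons, 0L, comp)
names(cluster_id) <- names(comp)
```

In this toy case A, B and C form one component, while D and E are merged into the shared singleton cluster. Each resulting cluster would then be passed to the estimation step separately.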
Details
The function prepareAdjMat accepts network information from user-specified sources as well as a list of graphite databases to search for edges. prepareAdjMat calculates the 0-1 adjacency matrices and runs netEst.undir or netEst.dir, depending on whether the graph is undirected or directed.
When searching for network information, prepareAdjMat makes some important assumptions about edges and non-edges. As already stated, the first is that in the case of conflicting information, user-specified non-edges are given precedence.
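The precedence rule for user-specified non-edges can be sketched in base R as follows. The data frames, their column names (src, dest) and the helper pair_key are hypothetical and chosen only for illustration. They are not the format returned by obtainEdgeList, and the sketch treats pairs as undirected.

```r
# Hypothetical edge list as it might come from a database search.
db_edges <- data.frame(src  = c("A", "B", "C"),
                       dest = c("B", "C", "D"),
                       stringsAsFactors = FALSE)

# Hypothetical user-specified non-edges.
user_non_edges <- data.frame(src  = "C",
                             dest = "B",
                             stringsAsFactors = FALSE)

# Order-independent key so that B-C and C-B match
# (treating pairs as undirected is an assumption of this sketch).
pair_key <- function(a, b) paste(pmin(a, b), pmax(a, b), sep = "|")

conflict <- pair_key(db_edges$src, db_edges$dest) %in%
  pair_key(user_non_edges$src, user_non_edges$dest)

# The user non-edge wins: conflicting database edges are dropped.
final_edges <- db_edges[!conflict, ]
```

Here the database edge B-C is removed because the user declared C-B a non-edge, leaving A-B and C-D.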
prepareAdjMat uses obtainEdgeList to standardize and search the graphite databases for edges. For more information see ?obtainEdgeList. prepareAdjMat also uses database information to identify non-edges. If two genes both appear in the edges from databases but there is no edge between them, the pair is coded as a non-edge. The rationale is that if there were an edge between these two genes, it would be present in the databases.
prepareAdjMat assumes no information about genes that do not appear in the databases edge lists. That is, if the user passes gene A, but gene A is not found in any of the edges in databases, no information about gene A is assumed: gene A will have neither edges nor non-edges.
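The three possible states of a gene pair (edge, non-edge, no information) can be made concrete with a small base R sketch. The gene names, the edge list and the encoding (1 = edge, 0 = non-edge, NA = no information) are assumptions made for illustration, not the package's internal representation.

```r
# Genes passed by the user; gene "E" never appears in the database edges.
genes <- c("A", "B", "C", "E")

# Hypothetical database edge list.
db_edges <- data.frame(src  = c("A", "B"),
                       dest = c("B", "C"),
                       stringsAsFactors = FALSE)

# Genes that appear in at least one database edge.
known <- union(db_edges$src, db_edges$dest)

# Start with NA everywhere: no information about any pair.
info <- matrix(NA_integer_, length(genes), length(genes),
               dimnames = list(genes, genes))

# Pairs of database genes default to non-edge (0) ...
in_db <- intersect(genes, known)
info[in_db, in_db] <- 0L

# ... unless the databases list an edge between them (1).
info[cbind(db_edges$src, db_edges$dest)] <- 1L
info[cbind(db_edges$dest, db_edges$src)] <- 1L

# Self-pairs are not considered in this sketch.
diag(info) <- NA_integer_
```

In the result, A-B and B-C are edges, A-C is a non-edge because both genes are known to the databases, and every pair involving E stays NA.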
Once all the network and clustering information has been compiled, prepareAdjMat estimates the network. prepareAdjMat will automatically detect directed graphs, rearrange them to the correct order and use netEst.dir to estimate the network. When the graph is undirected, netEst.undir will be used. For more information on these methods see ?netEst.dir and ?netEst.undir.
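One way to picture the detect-and-reorder step is the base R sketch below. It rests on two assumptions that the text above does not confirm: that a directed graph can be recognised by an asymmetric 0-1 adjacency matrix, and that "the correct order" means a topological order of the genes. The helper topo_order and the convention that adj[i, j] = 1 denotes an edge from gene i to gene j are hypothetical.

```r
genes <- c("A", "B", "C")

# adj[i, j] = 1 is read here as an edge from gene i to gene j
# (an assumed convention for this sketch).
adj <- matrix(0, 3, 3, dimnames = list(genes, genes))
adj["C", "A"] <- 1
adj["A", "B"] <- 1

# An asymmetric 0-1 matrix is taken as the sign of a directed graph.
is_directed <- !isSymmetric(adj)

# Repeatedly peel off nodes with no incoming edges (Kahn-style ordering).
topo_order <- function(adj) {
  remaining <- rownames(adj)
  ordered <- character(0)
  while (length(remaining) > 0) {
    sub <- adj[remaining, remaining, drop = FALSE]
    roots <- remaining[colSums(sub) == 0]   # no incoming edges left
    if (length(roots) == 0) stop("graph contains a cycle")
    ordered <- c(ordered, roots)
    remaining <- setdiff(remaining, roots)
  }
  ordered
}

ord <- topo_order(adj)

# Reorder rows and columns so that every edge points "forward".
adj_sorted <- adj[ord, ord]
```

With this ordering the reordered matrix is upper triangular, which is the kind of structure a directed estimation routine typically expects. Whether netEst.dir uses this exact convention should be checked in ?netEst.dir.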
Importantly, prepareAdjMat returns the list of weighted adjacency matrices to be used as an input in NetGSA.
Value
A list with components
Adj |
A list of weighted adjacency matrices estimated from either |
invcov |
A list of inverse covariance matrices estimated from either |
lambda |
A list of values of tuning parameters used for each condition in |
Author(s)
Michael Hellstern
References
Ma, J., Shojaie, A. & Michailidis, G. (2016) Network-based pathway enrichment analysis with incomplete network information. Bioinformatics 32(20):3165–3174.
See Also
NetGSA, netEst.dir, netEst.undir
Examples
## load the data
data("breastcancer2012_subset")

## consider genes from just 2 pathways
genenames <- unique(c(pathways[[1]], pathways[[2]]))
sx <- x[match(rownames(x), genenames, nomatch = 0L) > 0L, ]

## estimate the adjacency matrices with clustering
adj_cluster <- prepareAdjMat(sx, group,
                             databases = c("kegg", "reactome"),
                             cluster = TRUE)

## estimate the adjacency matrices without clustering
adj_no_cluster <- prepareAdjMat(sx, group,
                                databases = c("kegg", "reactome"),
                                cluster = FALSE)