R: Calculations of network concepts

networkConcepts {WGCNA}

R Documentation

Calculations of network concepts

Description

This function calculates various network concepts (topological properties, network indices) of a network constructed from expression data. See Details for a full description.

Usage

networkConcepts(datExpr, power = 1, trait = NULL, networkType = "unsigned")

Arguments

`datExpr`	a data frame containing the expression data, with rows corresponding to samples and columns to genes (nodes).
`power`	soft thresholding power.
`trait`	optional specification of a sample trait. A vector of length equal to the number of samples (rows) in `datExpr`.
`networkType`	network type. Recognized values are (unique abbreviations of) `"unsigned"`, `"signed"`, and `"signed hybrid"`.
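
A call with these arguments might look like the following sketch. The simulated data and the names `nSamples`, `nGenes`, and `shared` are hypothetical and only serve to produce a data frame of the expected shape (samples in rows, genes in columns). The call is guarded so that the snippet still runs when the WGCNA package is not installed.

```r
# Simulated expression data: 30 samples (rows) by 8 genes (columns).
# All simulation settings below are illustrative assumptions.
set.seed(1)
nSamples <- 30
nGenes <- 8
shared <- rnorm(nSamples)  # common factor so that the genes are correlated
datExpr <- as.data.frame(
  sapply(seq_len(nGenes), function(j) shared + rnorm(nSamples))
)
names(datExpr) <- paste0("gene", seq_len(nGenes))
trait <- shared + rnorm(nSamples)  # one trait value per sample

if (requireNamespace("WGCNA", quietly = TRUE)) {
  nc <- WGCNA::networkConcepts(datExpr, power = 2, trait = trait,
                               networkType = "unsigned")
  print(nc$Size)     # number of nodes, i.e. ncol(datExpr)
  print(nc$Summary)  # concepts that depend only on the adjacency matrix
}
```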

Details

This function computes various network concepts (also known as network statistics, topological properties, or network indices) for a weighted correlation network. The network is constructed between the columns of the input datExpr, which are interpreted as nodes. If networkType="unsigned", the adjacency between nodes i and j is defined as A[i,j]=abs(cor(datExpr[,i],datExpr[,j]))^power. In the following, we use the terms gene and node interchangeably since these methods were originally developed for gene networks. The function computes the following 4 types of network concepts (introduced in Horvath and Dong 2008):
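
The unsigned adjacency defined above can be reproduced with base R alone. The following is a minimal sketch on simulated data (the data and the choice `power <- 2` are assumptions for illustration); it checks that the pairwise formula and the matrix formula agree.

```r
# Simulated data: 20 samples (rows), 5 genes (columns). Illustrative only.
set.seed(42)
datExpr <- as.data.frame(matrix(rnorm(20 * 5), nrow = 20, ncol = 5))
power <- 2

# Full unsigned adjacency matrix: absolute correlation raised to 'power'.
A <- abs(cor(datExpr, use = "p"))^power

# The same quantity for a single pair of nodes (i, j).
i <- 1
j <- 3
A_ij <- abs(cor(datExpr[, i], datExpr[, j]))^power

print(all.equal(A[i, j], A_ij))
```

Because the entries are absolute correlations raised to a positive power, `A` is symmetric, has a unit diagonal, and takes values between 0 and 1.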

Type I: fundamental network concepts are defined as a function of the off-diagonal elements of an adjacency matrix A and/or a node significance measure GS. These network concepts can be defined for any network (not just correlation networks). The adjacency matrix of an unsigned weighted correlation network is given by A=abs(cor(datExpr,use="p"))^power and the trait based gene significance measure is given by GS= abs(cor(datExpr,trait, use="p"))^power where datExpr, trait, power are input parameters.
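
As an illustration of Type I concepts, the sketch below computes A and GS as defined above and derives a few whole-network quantities from the off-diagonal elements of A. The formulas for Centralization and Heterogeneity follow the definitions in Horvath and Dong (2008) as best understood here (Heterogeneity uses the population form of the coefficient of variation); treat them as assumptions rather than as the package's exact code. The simulated data is hypothetical.

```r
# Simulated data: 25 samples, 6 genes, one trait. Illustrative only.
set.seed(7)
n_samples <- 25
datExpr <- as.data.frame(matrix(rnorm(n_samples * 6), nrow = n_samples))
trait <- rnorm(n_samples)
power <- 2

# Adjacency and trait-based gene significance, as defined in the text.
A <- abs(cor(datExpr, use = "p"))^power
GS <- as.vector(abs(cor(datExpr, trait, use = "p"))^power)

# Fundamental concepts use only the off-diagonal elements of A.
offdiag <- A
diag(offdiag) <- 0
n <- ncol(A)

Connectivity <- rowSums(offdiag)            # sum of adjacencies per node
Density <- sum(offdiag) / (n * (n - 1))     # mean off-diagonal adjacency
# Assumed definitions (Horvath and Dong 2008):
Centralization <- n / (n - 2) * (max(Connectivity) / (n - 1) - Density)
Heterogeneity <- sqrt(n * sum(Connectivity^2) / sum(Connectivity)^2 - 1)

print(c(Density = Density, Centralization = Centralization,
        Heterogeneity = Heterogeneity))
```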

Type II: conformity-based network concepts are functions of the off-diagonal elements of the conformity based adjacency matrix A.CF=CF*t(CF) and/or the node significance measure. These network concepts are defined for any network for which a conformity vector can be defined. Details: For any adjacency matrix A, the conformity vector CF is calculated by requiring that A[i,j] is approximately equal to CF[i]*CF[j]. Using the conformity one can define the matrix A.CF=CF*t(CF) which is the outer product of the conformity vector with itself. In general, A.CF is not an adjacency matrix since its diagonal elements are different from 1. If the off-diagonal elements of A.CF are similar to those of A according to the Frobenius matrix norm, then A is approximately factorizable. To measure the factorizability of a network, one can calculate the Factorizability, which is a number between 0 and 1 (Dong and Horvath 2007). The conformity is defined using a monotonic, iterative algorithm that maximizes the factorizability measure.
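
The relationship between CF, A.CF, and factorizability can be sketched as follows. Note that the notation CF*t(CF) denotes an outer product; in R code this is `outer(CF, CF)` or `CF %*% t(CF)`. The helper names `offdiag` and `factorizability` are hypothetical, and the measure used here (one minus the ratio of squared off-diagonal Frobenius distances) is an assumed reading of Dong and Horvath (2007). The sketch evaluates the measure for a given conformity vector; it does not implement the iterative algorithm that maximizes it.

```r
# Hypothetical conformity vector for a 4-node network.
CF <- c(0.9, 0.8, 0.6, 0.5)

# Outer product of CF with itself; its diagonal is CF^2, not 1.
A.CF <- outer(CF, CF)

# An exactly factorizable adjacency: same off-diagonal, unit diagonal.
A <- A.CF
diag(A) <- 1

# Extract the off-diagonal elements of a square matrix.
offdiag <- function(M) M[row(M) != col(M)]

# Assumed factorizability measure for a given conformity vector.
factorizability <- function(A, CF) {
  A.CF <- outer(CF, CF)
  1 - sum((offdiag(A) - offdiag(A.CF))^2) / sum(offdiag(A)^2)
}

F_exact <- factorizability(A, CF)

# Perturb one pair of adjacencies so that A is no longer factorizable by CF.
A_noisy <- A
A_noisy[1, 2] <- A_noisy[2, 1] <- 0.2
F_noisy <- factorizability(A_noisy, CF)

print(c(F_exact = F_exact, F_noisy = F_noisy))
```

For the exactly factorizable matrix the measure equals 1; for the perturbed matrix it drops below 1, which is the behaviour the text describes.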

Type III: approximate conformity based network concepts are functions of all elements of the conformity based adjacency matrix A.CF (including the diagonal) and/or the node significance measure GS. These network concepts are very useful for deriving relationships between network concepts in networks that are approximately factorizable.

Type IV: eigengene-based (also known as eigennode-based) network concepts are functions of the eigengene-based adjacency matrix A.E=ConformityE*t(ConformityE) (diagonal included) and/or the corresponding eigengene-based gene significance measure GSE. These network concepts can only be defined for correlation networks. Details: The columns (nodes) of datExpr can be summarized with the first principal component, which is referred to as the Eigengene in coexpression network analysis. In general correlation networks, it is called the eigennode. The eigengene-based conformity ConformityE[i] is defined as abs(cor(datExpr[,i], Eigengene))^power where the power corresponds to the power used for defining the weighted adjacency matrix A. The eigengene-based conformity can also be used to define an eigengene-based adjacency matrix A.E=ConformityE*t(ConformityE). The eigengene-based factorizability EF(datExpr) is a number between 0 and 1 that measures how well A.E approximates A when the power parameter equals 1. EF(datExpr) is defined with respect to the singular values of datExpr. For a trait based node significance measure GS=abs(cor(datExpr,trait))^power, one can also define an eigengene-based node significance measure GSE[i]=ConformityE[i]*EigengeneSignificance, where the eigengene significance abs(cor(Eigengene,trait))^power is a power of the absolute value of the correlation between the eigengene and the trait. Eigengene-based network concepts are very useful for providing a geometric interpretation of network concepts and for deriving relationships between network concepts. For example, the hub gene significance measure and its eigengene-based analog have been used to characterize networks where highly connected hub genes are important with regard to a trait based gene significance measure (Horvath and Dong 2008).
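
The eigengene-based quantities can be sketched with a singular value decomposition of the standardized data. This is an illustration under stated assumptions, not the package's implementation: the eigengene is taken as the first left singular vector of `scale(datExpr)`, and the eigengene-based factorizability is assumed to be the ratio of the fourth power of the largest singular value to the sum of fourth powers. The simulated data is hypothetical.

```r
# Simulated data: 40 samples, 6 correlated genes, one trait. Illustrative only.
set.seed(3)
n_samples <- 40
shared <- rnorm(n_samples)
datExpr <- as.data.frame(
  sapply(1:6, function(j) shared + rnorm(n_samples, sd = 0.7))
)
trait <- shared + rnorm(n_samples)
power <- 2

# First principal component of the standardized columns.
sv <- svd(scale(datExpr))
Eigengene <- sv$u[, 1]                      # one value per sample

# Proportion of variance explained: squares of the singular values.
VarExplained <- sv$d[1]^2 / sum(sv$d^2)
# Assumed form of the eigengene-based factorizability: fourth powers.
EF <- sv$d[1]^4 / sum(sv$d^4)

# Eigengene-based conformity, adjacency, and gene significance.
ConformityE <- as.vector(abs(cor(datExpr, Eigengene))^power)
A.E <- outer(ConformityE, ConformityE)      # diagonal included
EigengeneSignificance <- abs(cor(Eigengene, trait))^power
GSE <- ConformityE * EigengeneSignificance

print(c(VarExplained = VarExplained, EF = EF))
```

The sign of a singular vector is arbitrary, but every quantity above uses an absolute correlation, so the results do not depend on it.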

Value

A list with the following components:

`Summary`	a data frame whose rows report network concepts that depend only on the adjacency matrix: Density (mean adjacency), Centralization, Heterogeneity (coefficient of variation of the connectivity), Mean ClusterCoef, and Mean Connectivity. The columns of the data frame report the 4 types of network concepts described in Details: fundamental concepts, eigengene-based concepts, conformity-based concepts, and approximate conformity-based concepts.
`Size`	reports the network size, i.e. the number of nodes, which equals the number of columns of the input data frame `datExpr`.
`Factorizability`	a number between 0 and 1. The closer it is to 1, the better the off-diagonal elements of the conformity based network `A.CF` approximate those of `A` (according to the Frobenius norm).
`Eigengene`	the first principal component of the standardized columns of `datExpr`. The number of components of this vector equals the number of rows of `datExpr`.
`VarExplained`	the proportion of variance explained by the first principal component (the `Eigengene`). It is numerically different from the eigengene based factorizability. While `VarExplained` is based on the squares of the singular values of `datExpr`, the eigengene-based factorizability is based on fourth powers of the singular values.
`Conformity`	numerical vector giving the conformity. The number of components of the conformity vector equals the number of columns in `datExpr`. The conformity is often highly correlated with the vector of node connectivities. The conformity is computed using an iterative algorithm for maximizing the factorizability measure. The algorithm and related network concepts are described in Dong and Horvath 2007.
`ClusterCoef`	a numerical vector that reports the cluster coefficient for each node. This fundamental network concept measures the cliquishness of each node.
`Connectivity`	a numerical vector that reports the connectivity (also known as degree) of each node. This fundamental network concept is also known as whole network connectivity. One can also define the scaled connectivity `K=Connectivity/max(Connectivity)` which is used for computing the hub gene significance.
`MAR`	a numerical vector that reports the maximum adjacency ratio for each node. `MAR[i]` equals 1 if all non-zero adjacencies between node `i` and the remaining network nodes equal 1. This fundamental network concept is always 1 for nodes of an unweighted network. This is a useful measure for weighted networks since it allows one to determine whether a node has high connectivity because of many weak connections (small MAR) or because of strong (but few) connections (high MAR), see Horvath and Dong 2008.
`ConformityE`	a numerical vector that reports the eigengene-based (also known as eigennode-based) conformity for the correlation network. The number of components equals the number of columns of `datExpr`.
`GS`	a numerical vector that encodes the node (gene) significance. If a sample trait was supplied to the function (input `trait`), the i-th component equals the node significance of the i-th column of `datExpr`: `GS[i]=abs(cor(datExpr[,i], trait, use="p"))^power`.
`GSE`	a numerical vector that reports the eigengene-based gene significance measure. Its i-th component is given by `GSE[i]=ConformityE[i]*EigengeneSignificance`, where the eigengene significance `abs(cor(Eigengene,trait))^power` is a power of the absolute value of the correlation between the eigengene and the trait.
`Significance`	a data frame whose rows report network concepts that also depend on the trait based node significance measure. The rows correspond to network concepts and the columns correspond to the type of network concept (fundamental versus eigengene-based). The first row of the data frame reports the network significance. The fundamental version of this network concept is the average gene significance, mean(GS). The eigengene-based analog of this concept is defined as mean(GSE). The second row reports the hub gene significance, which is defined as the slope of the regression model without an intercept term that regresses the gene significance on the scaled network connectivity K. The third row reports the eigengene significance `abs(cor(Eigengene,trait))^power`. More details can be found in Horvath and Dong (2008).
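
Several of the node-level components listed above can be illustrated on a small, hand-written adjacency matrix. The helper `nodeConcepts` and the example values are hypothetical. The formulas for `MAR` and `ClusterCoef` are assumed from Horvath and Dong (2008) and Zhang and Horvath (2005), and the hub gene significance is computed assuming a regression of GS on K through the origin (no intercept term).

```r
# Node-level fundamental concepts from an adjacency matrix (hypothetical helper).
nodeConcepts <- function(A) {
  offdiag <- A
  diag(offdiag) <- 0
  Connectivity <- rowSums(offdiag)
  list(
    Connectivity = Connectivity,
    K = Connectivity / max(Connectivity),            # scaled connectivity
    MAR = rowSums(offdiag^2) / Connectivity,         # maximum adjacency ratio
    ClusterCoef = diag(offdiag %*% offdiag %*% offdiag) /
      (Connectivity^2 - rowSums(offdiag^2))
  )
}

# Small weighted network with 4 nodes (illustrative values).
A <- matrix(c(1.0, 0.8, 0.1, 0.2,
              0.8, 1.0, 0.3, 0.1,
              0.1, 0.3, 1.0, 0.6,
              0.2, 0.1, 0.6, 1.0), nrow = 4, byrow = TRUE)
GS <- c(0.7, 0.6, 0.2, 0.3)  # hypothetical gene significance values

nc <- nodeConcepts(A)
NetworkSignificance <- mean(GS)
# Slope of the no-intercept regression of GS on K.
HubGeneSignificance <- sum(GS * nc$K) / sum(nc$K^2)

# Unweighted complete network: MAR and ClusterCoef equal 1 for every node.
B <- matrix(1, nrow = 4, ncol = 4)
nb <- nodeConcepts(B)

print(nc$MAR)
print(HubGeneSignificance)
```

The closed-form slope agrees with `coef(lm(GS ~ 0 + K))`, which fits the same regression through the origin.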

Author(s)

Jun Dong, Steve Horvath, Peter Langfelder

References

Zhang B, Horvath S (2005) A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applications in Genetics and Molecular Biology 4(1): Article 17

Dong J, Horvath S (2007) Understanding Network Concepts in Modules. BMC Systems Biology 1:24

Horvath S, Dong J (2008) Geometric Interpretation of Gene Coexpression Network Analysis. PLoS Comput Biol 4(8): e1000117