example-data {NetRep} | R Documentation |
Example data
Description
Example gene coexpression networks inferred from two independent datasets to demonstrate the usage of package functions.
Usage
data("NetRep")
Format
- "discovery_network": a matrix with 150 columns and 150 rows containing the network edge weights encoding the interaction strength between each pair of genes in the discovery dataset.
- "discovery_data": a matrix with 150 columns (genes) and 30 rows (samples) whose entries correspond to the expression level of each gene in each sample in the discovery dataset.
- "discovery_correlation": a matrix with 150 columns and 150 rows containing the correlation coefficients between each pair of genes calculated from the "discovery_data" matrix.
- "module_labels": a named vector with 150 entries containing the module assignment for each gene as identified in the discovery dataset.
- "test_network": a matrix with 150 columns and 150 rows containing the network edge weights encoding the interaction strength between each pair of genes in the test dataset.
- "test_data": a matrix with 150 columns (genes) and 30 rows (samples) whose entries correspond to the expression level of each gene in each sample in the test dataset.
- "test_correlation": a matrix with 150 columns and 150 rows containing the correlation coefficients between each pair of genes calculated from the "test_data" matrix.
An object of class matrix (inherits from array) with 150 rows and 150 columns.
An object of class matrix (inherits from array) with 30 rows and 150 columns.
An object of class matrix (inherits from array) with 150 rows and 150 columns.
An object of class numeric of length 150.
An object of class matrix (inherits from array) with 150 rows and 150 columns.
An object of class matrix (inherits from array) with 30 rows and 150 columns.
An object of class matrix (inherits from array) with 150 rows and 150 columns.
Details
The preservation of network modules in a second
dataset is quantified by measuring the preservation of topological
properties between the discovery and test datasets. These
properties are calculated not only from the interaction networks inferred
in each dataset, but also from the data used to infer those networks (e.g.
gene expression data) as well as the correlation structure between
variables/nodes. Thus, all functions in the NetRep package have the
following arguments:
- network: a list of interaction networks, one for each dataset.
- data: a list of data matrices used to infer those networks, one for each dataset.
- correlation: a list of matrices containing the pairwise correlation coefficients between variables/nodes in each dataset.
- moduleAssignments: a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.
- modules: a list of vectors, one vector for each discovery dataset, containing the names of the modules from that dataset to analyse.
- discovery: a vector indicating the names or indices of the previous arguments' lists to use as the discovery dataset(s) for the analyses.
- test: a list of vectors, one vector for each discovery dataset, containing the names or indices of the network, data, and correlation argument lists to use as the test dataset(s) for the analysis of each discovery dataset.
These data are used to provide concrete examples of how to use these arguments in each package function.
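As an illustration of the argument structure described above, the following is a minimal sketch of how such lists might be assembled. To stay runnable without the package, it uses small random stand-in matrices; the names `make_toy`, `toy_disc`, `toy_test`, `toy_labels`, and `test_arg` are hypothetical and not part of NetRep.

```r
# Stand-in data: 10 samples x 6 genes per dataset (toy values only,
# not the NetRep example data).
set.seed(1)
genes <- paste0("gene", 1:6)

make_toy <- function() {
  x <- matrix(rnorm(10 * 6), nrow = 10, dimnames = list(NULL, genes))
  corr <- cor(x)
  list(data = x, correlation = corr, network = abs(corr)^5)
}
toy_disc <- make_toy()
toy_test <- make_toy()

# Module assignment for each gene in the discovery dataset (0 = background).
toy_labels <- setNames(c(1, 1, 1, 2, 2, 0), genes)

# One list per argument, using the same element names across lists.
network_list     <- list(discovery = toy_disc$network, test = toy_test$network)
data_list        <- list(discovery = toy_disc$data, test = toy_test$data)
correlation_list <- list(discovery = toy_disc$correlation,
                         test = toy_test$correlation)
labels_list      <- list(discovery = toy_labels)

# 'discovery' and 'test' refer to elements of those lists by name.
discovery <- "discovery"
test_arg  <- list(discovery = "test")
```

With the package installed, the objects loaded by data("NetRep") (for example discovery_network and test_network) would take the place of the toy matrices, and the lists would be supplied to the network, data, correlation, moduleAssignments, discovery, and test arguments.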
Simulation details
The discovery gene expression dataset ("discovery_data"), containing 30
samples and 150 genes, was simulated to contain four distinct modules of
sizes 20, 25, 30, and 35 genes. Data for each module were simulated as:
G^{(w)}_{simulated} = E^{(w)} r_i + \sqrt{1 - r^2_i} \epsilon
where E^{(w)} is the simulated module's summary vector, r is the vector of
the simulated module's node contributions (with entry r_i for gene i), and
\epsilon is the error term drawn from a standard normal distribution.
E^{(w)} and r were simulated by bootstrapping (sampling with replacement)
samples and genes from the corresponding vectors in modules 63, 51, 57,
and 50 discovered in the liver tissue gene expression data from a publicly
available mouse dataset (see reference (1) for details on the dataset and
network discovery). The remaining 40 genes that were not part of any
module were simulated by randomly selecting 40 liver genes, bootstrapping
30 samples, and adding the noise term \epsilon. A vector of module
assignments was created ("module_labels") in which each gene was labelled
with a number 1-4 corresponding to the module it was simulated to be
coexpressed with, or with a label of 0 for the 40 "background" genes not
participating in any module. The correlation structure
("discovery_correlation") was calculated as the Pearson correlation
coefficient between genes (cor(discovery_data)). Edge weights in the
interaction network ("discovery_network") were calculated as the absolute
value of the correlation coefficient raised to the power of 5
(abs(discovery_correlation)^5).
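The module equation and the two derived matrices can be sketched in base R as follows. This is an illustration only and does not reproduce the package data: the summary vector `E` and node contributions `r` are drawn from simple distributions (an assumption) rather than bootstrapped from the mouse liver modules, and the names `G`, `toy_correlation`, and `toy_network` are hypothetical.

```r
set.seed(1)  # arbitrary seed; not the seed used for the package data

n_samples <- 30
n_genes   <- 20

# Assumption: stand-ins for the bootstrapped summary vector and contributions.
E <- rnorm(n_samples)                      # module summary vector E^(w)
r <- runif(n_genes, min = 0.5, max = 0.9)  # node contribution r_i per gene

# G[s, i] = E[s] * r[i] + sqrt(1 - r[i]^2) * epsilon[s, i]
epsilon <- matrix(rnorm(n_samples * n_genes), nrow = n_samples)
G <- outer(E, r) + sweep(epsilon, 2, sqrt(1 - r^2), FUN = "*")
colnames(G) <- paste0("gene", seq_len(n_genes))

# Correlation structure and edge weights, computed as described in the text.
toy_correlation <- cor(G)
toy_network     <- abs(toy_correlation)^5
```

Raising the absolute correlation to the power of 5 keeps every edge weight between 0 and 1 while shrinking weak correlations towards 0 much more strongly than strong ones.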
An independent test dataset ("test_data") containing the same 150
genes as the discovery dataset but 30 different samples was
simulated as above. Modules 1 and 4 (containing 20 and 35 genes
respectively) were simulated to be preserved using the same equation
as above, where the summary vector E^{(w)} was bootstrapped from
the same liver modules (modules 63 and 50) as in the discovery dataset,
with node contributions r identical to those in the discovery dataset.
Genes in modules 2 and 3 were simulated as "background" genes as
described above, i.e. they were not preserved. The
correlation structure between genes in the test dataset
("test_correlation") and the interaction network
("test_network") were calculated the same way as in the
discovery dataset.
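The contrast between a preserved module and a non-preserved one can be sketched in base R as below. This is a toy illustration under stated assumptions, not the procedure used to build the package data: summary vectors are drawn with rnorm instead of being bootstrapped, the "background" genes are pure noise, and `simulate_module`, `edge_weights`, and the `toy_*` objects are hypothetical names.

```r
set.seed(2)  # arbitrary seed; not the seed used for the package data
n_samples <- 30
n_genes   <- 20

# Simulate one module from a summary vector E and node contributions r.
simulate_module <- function(E, r) {
  eps <- matrix(rnorm(length(E) * length(r)), nrow = length(E))
  outer(E, r) + sweep(eps, 2, sqrt(1 - r^2), FUN = "*")
}

# Preserved module: new samples (a new E), identical node contributions r.
r <- runif(n_genes, min = 0.5, max = 0.9)
toy_discovery <- simulate_module(rnorm(n_samples), r)
toy_test      <- simulate_module(rnorm(n_samples), r)

# Non-preserved module: genes share no summary vector (assumption: pure noise).
toy_background <- matrix(rnorm(n_samples * n_genes), nrow = n_samples)

# Edge weights between distinct gene pairs, as abs(cor(x))^5.
edge_weights <- function(x) {
  w <- abs(cor(x))^5
  w[upper.tri(w)]
}
mean_preserved  <- mean(edge_weights(toy_test))
mean_background <- mean(edge_weights(toy_background))
```

In this sketch the preserved module keeps noticeably stronger edge weights in the test data than the background genes, which is the kind of difference the preservation analysis is designed to detect.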
The random seed used for the simulations was 37.
References
1. Ritchie, S.C., et al., A scalable permutation approach reveals replication and preservation patterns of network modules in large datasets. Cell Systems. 3, 71-82 (2016).
See Also
modulePreservation, plotModule, and networkProperties.