graph_two_GOspecies {GOCompare} R Documentation

Undirected network representation for the results of functional enrichment analysis to compare two species and a series of categories

Description

graph_two_GOspecies is a function that creates undirected graphs.

graph_two_GOspecies is an analog of the graphGOspecies function and has the same options ("Categories" and "GO"). However, the edge and node weights are calculated slightly differently. Since two species are compared, three graphs are possible: \({G}_1\), \({G}_2\), and \({G}_3\). \({G}_1\) and \({G}_2\) represent each of the species analyzed, and \({G}_3\) is a subgraph of \({G}_1\) and \({G}_2\) that contains the GO terms or categories co-occurring between both species.

Categories option: (Weight): The nodes \((V)\) represent groups of gene lists (categories), and the edges \((E)\) represent GO terms co-occurring between pairs of categories. The node weight measures how a GO term is conserved between two species and across a series of categories, but it is biased toward categories.

\[\widehat{K}_w(u)=\sum_{v \in V_1} w(u,v) + \sum_{v \in V_2} w(u,v) \tag{5}\]
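As a minimal sketch of equation 5, the node weight can be obtained by summing, for each node, the weights of its incident edges in each species graph and then adding the two sums. The edge lists, the column names, and the helper names `node_strength` and `node_weight` below are hypothetical and only illustrate the arithmetic; they are not the package's internal implementation.

```r
# Hypothetical edge lists for two species (values made up for illustration).
edges_sp1 <- data.frame(
  SOURCE = c("A", "A", "B"),
  TARGET = c("B", "C", "C"),
  WEIGHT = c(0.5, 0.25, 0.125),
  stringsAsFactors = FALSE
)
edges_sp2 <- data.frame(
  SOURCE = c("A", "B"),
  TARGET = c("B", "C"),
  WEIGHT = c(0.25, 0.5),
  stringsAsFactors = FALSE
)

# Sum of the weights of all edges incident to each node of one undirected graph.
node_strength <- function(edges) {
  ends <- c(edges$SOURCE, edges$TARGET)
  w <- c(edges$WEIGHT, edges$WEIGHT)
  s <- tapply(w, ends, sum)
  setNames(as.numeric(s), names(s))
}

# Equation 5: add the per-species sums node by node.
node_weight <- function(e1, e2) {
  k1 <- node_strength(e1)
  k2 <- node_strength(e2)
  nodes <- sort(union(names(k1), names(k2)))
  k <- setNames(numeric(length(nodes)), nodes)
  k[names(k1)] <- k[names(k1)] + k1
  k[names(k2)] <- k[names(k2)] + k2
  k
}

K_w <- node_weight(edges_sp1, edges_sp2)
print(K_w)
```

A node that appears in only one species simply contributes zero for the other species.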

(shared weight): The nodes \((V)\) represent groups of gene lists (categories), and the edges \((E)\) represent GO terms co-occurring between pairs of categories that are shared between species. The node weight \({K}_s\) is computed from the shared edge weight \(s\), where \(N_1\) and \(N_2\) are the sets of GO terms associated with the edge \(e = (u,v)\) for species 1 and 2, respectively. The node shared weight \({K}_s(u)\) is therefore the sum of \(s\) over the edges incident to \(u\).

\[s(e) = \frac{\mid N_1 \cap N_2 \mid}{\mid N_1 \cup N_2 \mid} \tag{6}\]

\[{K}_s(u)=\sum_{v \in (V_1 \cup V_2)} s(u,v) \tag{7}\]
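The shared edge weight in equation 6 is the Jaccard index of the two GO term sets, and equation 7 sums it over the edges touching a node. The sketch below works through this on made-up data; the `"u|v"` edge keys, the GO identifiers, and the helper names are assumptions for illustration only.

```r
# Hypothetical GO term sets per edge, keyed as "u|v" (made up for illustration).
go_sp1 <- list("A|B" = c("GO:1", "GO:2", "GO:3"), "A|C" = c("GO:4"))
go_sp2 <- list("A|B" = c("GO:2", "GO:3", "GO:5"), "B|C" = c("GO:6"))

# Return the GO terms of an edge, or an empty set if the edge is absent.
get_terms <- function(lst, key) {
  if (is.null(lst[[key]])) character(0) else lst[[key]]
}

# Equation 6: size of the intersection over size of the union.
shared_edge_weight <- function(n1, n2) {
  u <- union(n1, n2)
  if (length(u) == 0) return(0)
  length(intersect(n1, n2)) / length(u)
}

edge_keys <- union(names(go_sp1), names(go_sp2))
s <- vapply(
  edge_keys,
  function(k) shared_edge_weight(get_terms(go_sp1, k), get_terms(go_sp2, k)),
  numeric(1)
)

# Equation 7: sum the shared edge weights over the edges incident to each node.
ends <- strsplit(edge_keys, "|", fixed = TRUE)
nodes <- sort(unique(unlist(ends)))
K_s <- vapply(
  nodes,
  function(u) sum(s[vapply(ends, function(e) u %in% e, logical(1))]),
  numeric(1)
)
print(s)
print(K_s)
```

An edge present in only one species has an empty intersection, so its shared weight is 0.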

(combined weight): The node weight \({K}_c(u)\) combines the weight and the shared weight. Its purpose is to find categories with more frequently co-occurring GO terms, so that functional similarities between two species can be observed while balancing GO terms co-occurring among gene lists (categories) and between the two species. This node weight ranges from -1 (categories with GO terms found in only one species and in few categories) to 1 (categories with GO terms shared widely between species and among other categories). The combined node weight \({K}_c\) is defined as the sum of the min-max normalized weights \(\widehat{K}_w\) and \({K}_s\) minus 1.

\[\operatorname{minmax}(y)=\frac{y-\min(y)}{\max(y)-\min(y)} \tag{8}\]

\[{K}_c(u)= \operatorname{minmax}(\widehat{K}_w(u)) + \operatorname{minmax}({K}_s(u)) - 1 \tag{9}\]
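Equations 8 and 9 can be checked on a small example. The sketch below uses hypothetical node weights; the handling of a constant input in `minmax` (returning zeros to avoid division by zero) is an assumption of this sketch and is not stated in the documentation.

```r
# Equation 8: min-max normalization to the interval [0, 1].
minmax <- function(y) {
  rng <- range(y)
  # Assumption of this sketch: a constant vector is mapped to 0.
  if (rng[1] == rng[2]) return(rep(0, length(y)))
  (y - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical node weights for three categories.
K_w <- c(A = 1.0, B = 1.375, C = 0.875)
K_s <- c(A = 0.5, B = 0.5, C = 0)

# Equation 9: sum of the normalized weights minus 1, giving values in [-1, 1].
K_c <- minmax(K_w) + minmax(K_s) - 1
print(K_c)
```

Here category B has the largest weight and the largest shared weight, so it reaches the maximum of 1, while C is lowest on both and reaches -1.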

GO option: As with the Categories option, three graphs are possible: \({G}_1\), \({G}_2\), and \({G}_3\). \({G}_1\) and \({G}_2\) represent each of the species analyzed, and \({G}_3\) is a subgraph of \({G}_1\) and \({G}_2\) that contains the GO terms or categories co-occurring between both species. In this case, nodes are GO terms and edges are categories in which the GO terms co-occur. The node weight is similar to the GO weight calculated by the graphGOspecies function and is computed as in equation 5.

\[\widehat{K}_w(u)=\sum_{v \in V_1} w(u,v) + \sum_{v \in V_2} w(u,v) \tag{5}\]
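To illustrate the GO option, where nodes are GO terms and edges link terms that co-occur in a category, the following sketch counts in how many categories each pair of GO terms appears together for a single species. The input list and the `"term1|term2"` pair keys are hypothetical, and this is not the package's implementation.

```r
# Hypothetical GO terms enriched in each category for one species.
go_by_cat <- list(
  cat1 = c("GO:1", "GO:2", "GO:3"),
  cat2 = c("GO:1", "GO:2"),
  cat3 = c("GO:3")
)

# Enumerate every unordered pair of GO terms found together in a category.
pairs <- unlist(lapply(go_by_cat, function(terms) {
  if (length(terms) < 2) return(character(0))  # a single term forms no edge
  combn(sort(terms), 2, FUN = paste, collapse = "|")
}), use.names = FALSE)

# Number of categories in which each pair of GO terms co-occurs.
feature <- table(pairs)
print(feature)
```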

Usage

graph_two_GOspecies(
  x,
  species1,
  species2,
  GOterm_field,
  saveGraph = FALSE,
  option = "Categories",
  numCores = 2,
  outdir = NULL,
  filename = NULL
)

Arguments

x

A list obtained as output of the compareGOspecies function

species1

A string with the species name for species 1 (e.g., "H. sapiens")

species2

A string with the species name for species 2 (e.g., "A. thaliana")

GOterm_field

A string with the column name of the GO terms (e.g., "Functional_Category")

saveGraph

logical. If TRUE, the function saves the graph in GraphML format

option

(values: "Categories" or "GO"). This option creates either a graph where nodes are categories and edges are GO terms co-occurring between pairs of categories ("Categories"), or a graph where nodes are GO terms and edges are categories in which the GO terms co-occur, with species membership as an edge attribute ("GO") (default value = "Categories")

numCores

numeric. Number of cores to use for the process (default value numCores=2). In the example below, only one core is used

outdir

The folder where the graph file is saved (e.g., "D:"). This parameter only works when saveGraph=TRUE

filename

The name of the graph file to be saved in the outdir specified by the user. This parameter only works when saveGraph=TRUE

Value

This function returns a list with two slots: edges and nodes.

(Categories):

Edges list columns:

Column             Description
SOURCE and TARGET  The source and target categories (nodes in the edge)
GO_N               The number of GO terms between the categories
WEIGHT             Edge weight
GO                 GO terms available for both nodes
SP1                Number of GO terms for species 1
SP2                Number of GO terms for species 2
SHARED             Number of GO terms shared or co-occurring between the categories
SHARED_WEIGHT      Shared weight for the edge

Node list columns:

Column           Description
CAT              Category name
CAT_WEIGHT       Node weight
SHARED_WEIGHT    Shared weight for the node
COMBINED_WEIGHT  Combined weight for the node

(GO):

Edges list columns:

Column             Description
SOURCE and TARGET  The source and target GO terms (nodes in the edge)
FEATURE            The number of categories where both GO terms were found
SP                 Species where the GO terms were found (Species 1, Species 2 or Shared)
WEIGHT             Edge weight

Node list columns:

Column     Description
GO         GO term node name
GO_WEIGHT  Node weight
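As a sketch of how the returned list might be inspected for the "Categories" option, the code below builds a small mock object with the slot and column names listed above and ranks the categories by combined weight. The object `x_graph_mock` and all of its values are made up for illustration; a real result comes from calling graph_two_GOspecies.

```r
# Mock result mimicking the documented structure (values are hypothetical).
x_graph_mock <- list(
  edges = data.frame(
    SOURCE = c("A", "A", "B"),
    TARGET = c("B", "C", "C"),
    SHARED_WEIGHT = c(0.5, 0, 0),
    stringsAsFactors = FALSE
  ),
  nodes = data.frame(
    CAT = c("A", "B", "C"),
    CAT_WEIGHT = c(1.0, 1.375, 0.875),
    SHARED_WEIGHT = c(0.5, 0.5, 0),
    COMBINED_WEIGHT = c(0.25, 1, -1),
    stringsAsFactors = FALSE
  )
)

# Rank categories from most to least conserved according to the combined weight.
nodes <- x_graph_mock$nodes
ranked <- nodes[order(-nodes$COMBINED_WEIGHT), ]
print(ranked)

# Keep only edges whose GO terms are at least partly shared between species.
shared_edges <- subset(x_graph_mock$edges, SHARED_WEIGHT > 0)
print(shared_edges)
```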

Examples


library(GOCompare)

# Column that holds the GO terms
GOterm_field <- "Functional_Category"
# Load the example comparison data
data(comparison_ex_compress_CH)
# Defining the species names
species1 <- "H. sapiens"
species2 <- "A. thaliana"
# Build the "Categories" graph on a single core without saving it to disk
x_graph <- graph_two_GOspecies(x = comparison_ex_compress_CH,
                               species1 = species1,
                               species2 = species2,
                               GOterm_field = GOterm_field,
                               numCores = 1,
                               saveGraph = FALSE,
                               option = "Categories",
                               outdir = NULL,
                               filename = NULL)


[Package GOCompare version 1.0.2.1 Index]