graph_modul_compar {graph4lg}R Documentation

Compare the partition into modules of two graphs

Description

The function computes the Adjusted Rand Index (ARI) to compare two graphs' partitions into modules or clusters more generally. Both graphs must have the same number of nodes, but not necessarily the same number of links. They must also have the same node names and in the same order.

Usage

graph_modul_compar(
  x,
  y,
  mode = "graph",
  nb_modul = NULL,
  algo = "fast_greedy",
  node_inter = "distance",
  data = NULL
)

Arguments

x

The first graph object

  • If mode = 'graph' (default), x is a graph object of class igraph. Then, its nodes must have the same names as in graph y.

  • If mode = 'data.frame', x refers to a column of the data.frame 'data'. Then x must be a character string indicating the name of the column of 'data' with the modules' labels of the nodes in the first graph. In that case, the column can be of class numeric, character or factor but will be converted into a numeric vector in any case.

  • If mode = 'vector', x is a vector of class character, factor or numeric. In that case, it must have the same length as vector y and will be converted into a numeric vector.

y

The second graph object Same classes possible as for x. Must be of the same format as x

mode

A character string indicating whether x and y are igraph objects, vectors or columns from a data.frame. mode can be 'graph', 'data.frame' or 'vector'.

nb_modul

(if x and y are igraph objects) A numeric or integer value or a numeric vector with 2 elements indicating the number of modules to create in both graphs.

  • If nb_modul is a numeric value, then the same number of modules are created in both graphs.

  • If nb_modul is a numeric vector of length 2, then the numbers of modules created in graphs x and y are the first and second elements of nb_modul, respectively.

algo

(if x and y are igraph objects) A character string indicating the algorithm used to create the modules with igraph.

  • If algo = 'fast_greedy' (default), function cluster_fast_greedy from igraph is used (Clauset et al., 2004).

  • If algo = 'walktrap' (default), function cluster_walktrap from igraph is used (Pons et Latapy, 2006) with 4 steps (default options).

  • If algo = 'louvain', function cluster_louvain from igraph is used (Blondel et al., 2008). In that case, the number of modules created in each graph is imposed.

  • If algo = 'optimal', function cluster_optimal from igraph is used (Brandes et al., 2008) (can be very long). In that case, the number of modules created in each graph is imposed.

node_inter

(optional, if x and y are igraph objects, default is 'none') A character string indicating whether the links of the graph are weighted by distances or by similarity indices. It is only used to compute the modularity index. It can be:

  • 'distance': Link weights correspond to distances. Nodes that are close to each other will more likely be in the same module.

  • 'similarity': Link weights correspond to similarity indices. Nodes that are similar to each other will more likely be in the same module. Inverse link weights are then used to compute the modularity index.

  • 'none': Links are not weighted for the computation, which is only based on graph topology.

Two different weightings can be used to create the modules of the two graphs.

  • If node_inter is a character string, then the same link weighting is used for both graphs.

  • If node_inter is a character vector of length 2, then the link weighting used by the algorithm to create the modules of graphs x and y is determined by the first and second elements of node_inter, respectively.

data

(if x and y are columns from a data.frame) An object of class data.frame with at least two columns and as many rows as there are nodes in the graphs compared. The columns indicate the modules of each node in 2 different classifications.

Details

This index takes values between -1 and 1. It measures how often pairs of nodes pertaining to the same module in one graph also pertain to the same module in the other graph. Therefore, large values indicate that both partitions are similar. The Rand Index can be defined as the frequency of agreement between two classifications into discrete classes. It is the number of times a pair of elements are classified into the same class or in two different classes in both compared classifications, divided by the total number of possible pairs of elements. The Rand Index is between 0 and 1 but its maximum value depends on the number of elements. Thus, another 'adjusted' index was created, the Adjusted Rand Index. According to the Hubert et Arabie's formula, the ARI is computed as follows: ARI=\frac{Index - Expected index}{Maximum index - Expected index} where the values of Index, Expected index and Maximum index are computed from a contingency table. This function uses adjustedRandIndex from package mclust which applies the Hubert and Arabie's formula for the ARI. This function works for undirected graphs only.

Value

The value of the ARI

Author(s)

P. Savary

References

Dyer RJ, Nason JD (2004). “Population graphs: the graph theoretic shape of genetic structure.” Molecular ecology, 13(7), 1713–1727. Hubert L, Arabie P (1985). “Comparing partitions.” Journal of classification, 2(1), 193–218. Clauset A, Newman ME, Moore C (2004). “Finding community structure in very large networks.” Physical review E, 70(6). Blondel VD, Guillaume J, Lambiotte R, Lefebvre E (2008). “Fast unfolding of communities in large networks.” Journal of Statistical Mechanics - Theory and Experiment, 10. Brandes U, Delling D, Gaertler M, Gorke R, Hoefer M, Nikoloski Z, Wagner D (2008). “On modularity clustering.” IEEE transactions on knowledge and data engineering, 20(2), 172–188. Pons P, Latapy M (2006). “Computing communities in large networks using random walks.” J. Graph Algorithms Appl., 10(2), 191–218.

Examples

data(data_ex_genind)
data(pts_pop_ex)
mat_dist <- suppressWarnings(graph4lg::mat_geo_dist(data=pts_pop_ex,
      ID = "ID",
      x = "x",
      y = "y"))
mat_dist <- mat_dist[order(as.character(row.names(mat_dist))),
                      order(as.character(colnames(mat_dist)))]
graph_obs <- gen_graph_thr(mat_w = mat_dist, mat_thr = mat_dist,
                            thr = 24000, mode = "larger")
mat_gen <- mat_gen_dist(x = data_ex_genind, dist = "DPS")
graph_pred <- gen_graph_topo(mat_w = mat_gen, mat_topo = mat_dist,
                            topo = "gabriel")
ARI <- graph_modul_compar(x = graph_obs, y = graph_pred)

[Package graph4lg version 1.8.0 Index]