R: OSLOM community finding

netclu_oslom {bioregion}

R Documentation

OSLOM community finding

Description

This function finds communities in a (un)weighted (un)directed network based on the OSLOM algorithm (http://oslom.org/, version 2.4).

Usage

netclu_oslom(
  net,
  weight = TRUE,
  cut_weight = 0,
  index = names(net)[3],
  seed = NULL,
  reassign = "no",
  r = 10,
  hr = 50,
  t = 0.1,
  cp = 0.5,
  directed = FALSE,
  bipartite = FALSE,
  site_col = 1,
  species_col = 2,
  return_node_type = "both",
  binpath = "tempdir",
  path_temp = "oslom_temp",
  delete_temp = TRUE
)

Arguments

`net`	the output object from `similarity()` or `dissimilarity_to_similarity()`. If a `data.frame` is used, the first two columns represent pairs of sites (or any pair of nodes), and the next column(s) are the similarity indices.
`weight`	a `boolean` indicating if the weights should be considered if there are more than two columns.
`cut_weight`	a minimal weight value. If `weight` is TRUE, the links between sites with a weight strictly lower than this value will not be considered (0 by default).
`index`	name or number of the column to use as weight. By default, the third column name of `net` is used.
`seed`	the seed for the random number generator (NULL, the default, for a random seed).
`reassign`	a `character` indicating if the nodes belonging to several communities should be reassigned and which method should be used (see Note).
`r`	the number of runs for the first hierarchical level (10 by default).
`hr`	the number of runs for the higher hierarchical level (50 by default, 0 if you are not interested in hierarchies).
`t`	the p-value threshold (0.1 by default); increase this value to get more modules.
`cp`	a kind of resolution parameter used to decide between taking some modules or their union (0.5 by default; larger values lead to larger clusters).
`directed`	a `boolean` indicating if the network is directed (from column 1 to column 2).
`bipartite`	a `boolean` indicating if the network is bipartite (see Details).
`site_col`	name or number for the column of site nodes (i.e. primary nodes).
`species_col`	name or number for the column of species nodes (i.e. feature nodes).
`return_node_type`	a `character` indicating what types of nodes (`site`, `species` or `both`) should be returned in the output (`return_node_type = "both"` by default).
`binpath`	a `character` indicating the path to the bin folder (see install_binaries and Details).
`path_temp`	a `character` indicating the path to the temporary folder (see Details).
`delete_temp`	a `boolean` indicating if the temporary folder should be removed (see Details).
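
To make the expected shape of `net` and the effect of `cut_weight` concrete, here is a minimal base R sketch. It follows a literal reading of the argument descriptions above and is not the package's internal code; the toy data, the column names and the threshold of 0.2 are assumptions chosen for illustration.

```r
# Toy similarity network: two node columns followed by one weight column.
# Column names and values are illustrative only.
net <- data.frame(
  Site1 = c("A", "A", "B", "C"),
  Site2 = c("B", "C", "C", "D"),
  Simpson = c(0.80, 0.10, 0.45, 0.00),
  stringsAsFactors = FALSE
)

index <- names(net)[3]  # weight column, mirroring the default `index`
cut_weight <- 0.2       # illustrative threshold (the documented default is 0)

# Drop links whose weight is strictly lower than cut_weight.
net_kept <- net[net[[index]] >= cut_weight, ]
print(net_kept)
```

Only the links A-B and B-C remain in `net_kept`; the other two fall below the threshold.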

Details

OSLOM is a network community detection algorithm proposed by Lancichinetti et al. (2011) that finds statistically significant (overlapping) communities in (un)weighted and (un)directed networks.

This function is based on the 2.4 C++ version of OSLOM (http://www.oslom.org/software.htm). It requires the OSLOM binary files to run; they can be installed with install_binaries.

If you changed the default path to the bin folder while running install_binaries PLEASE MAKE SURE to set binpath accordingly.

The C++ version of OSLOM generates temporary folders and/or files that are stored in the path_temp folder (by default, a folder named "oslom_temp" with a unique timestamp, located in the bin folder given by binpath). This temporary folder is removed by default (delete_temp = TRUE).

Value

A list of class bioregion.clusters with five slots:

name: character containing the name of the algorithm
args: list of input arguments as provided by the user
inputs: list of characteristics of the clustering process
algorithm: list of all objects associated with the clustering procedure, such as original cluster objects
clusters: data.frame containing the clustering results

In the algorithm slot, users can find the following elements:

cmd: the command line used to run OSLOM
version: the OSLOM version
web: the OSLOM website

Note

Although this algorithm was not primarily designed to deal with bipartite networks, it is possible to treat a bipartite network as a unipartite network (bipartite = TRUE). Do not forget to indicate which of the first two columns is dedicated to the site nodes (i.e. primary nodes) and which to the species nodes (i.e. feature nodes) using the arguments site_col and species_col. The type of nodes returned in the output can be chosen with the argument return_node_type: both to keep both types of nodes, site to keep only the site nodes and species to keep only the species nodes.
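
The roles of site_col, species_col and return_node_type can be illustrated with a small base R sketch. It does not call OSLOM: the bipartite table, the `clusters` result and the helper `filter_nodes()` are all hypothetical, and the accepted values ("both", "site", "species") are taken from the Arguments section.

```r
# Toy bipartite site-species network (names and values are illustrative).
net_bip <- data.frame(
  Site = c("S1", "S1", "S2", "S3"),
  Species = c("sp_a", "sp_b", "sp_b", "sp_c"),
  Abundance = c(3, 1, 2, 5),
  stringsAsFactors = FALSE
)
site_col <- 1     # column holding the site (primary) nodes
species_col <- 2  # column holding the species (feature) nodes

site_nodes <- unique(net_bip[[site_col]])
species_nodes <- unique(net_bip[[species_col]])

# Hypothetical clustering result covering both node types.
clusters <- data.frame(
  ID = c(site_nodes, species_nodes),
  K = c(1, 1, 2, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking the effect of return_node_type.
filter_nodes <- function(clusters, type = c("both", "site", "species")) {
  type <- match.arg(type)
  switch(type,
    both = clusters,
    site = clusters[clusters$ID %in% site_nodes, ],
    species = clusters[clusters$ID %in% species_nodes, ]
  )
}

sites_only <- filter_nodes(clusters, "site")
species_only <- filter_nodes(clusters, "species")
print(sites_only)
```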

Since OSLOM potentially returns overlapping communities, we propose two methods to reassign the 'overlapping' nodes: randomly (reassign = "random") or based on the closest candidate community (reassign = "simil"; only for weighted networks, in which case the closest candidate community is determined by the average similarity). By default, reassign = "no" and all the information is provided. The number of partitions then depends on the number of overlapping modules (up to three). The suffixes _semel, _bis and _ter are added to the column names. The first partition (_semel) assigns a module to each node. A value of NA in the second (_bis) and third (_ter) columns indicates that no overlapping module was found for this node (i.e. a non-overlapping node).
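
The two reassignment strategies can be sketched in base R as follows. This is one possible reading of the description above, not the package's implementation: the column names `K_semel`, `K_bis` and `K_ter`, the toy data and the helpers `pick_random()`, `pick_simil()` and `avg_simil()` are hypothetical. In particular, computing the average similarity against the non-overlapping members of each candidate module is an assumption.

```r
set.seed(1)

# Hypothetical output with reassign = "no": up to three candidate modules per node.
overlap <- data.frame(
  ID = c("A", "B", "C", "D"),
  K_semel = c(1, 1, 2, 2),
  K_bis = c(NA, 2, NA, NA),
  K_ter = rep(NA_real_, 4),
  stringsAsFactors = FALSE
)

# Toy weighted links between nodes (illustrative values).
net <- data.frame(
  Site1 = c("A", "B", "B", "C"),
  Site2 = c("B", "C", "D", "D"),
  Simil = c(0.9, 0.2, 0.3, 0.8),
  stringsAsFactors = FALSE
)

# Candidate modules of the node in row i (NA entries dropped).
candidates <- function(i) {
  k <- unlist(overlap[i, c("K_semel", "K_bis", "K_ter")])
  unname(k[!is.na(k)])
}

# "random"-style: pick one candidate module uniformly at random.
pick_random <- function(i) {
  k <- candidates(i)
  k[sample.int(length(k), 1)]
}

# Average similarity between a node and the non-overlapping members of a module.
avg_simil <- function(node, module) {
  members <- overlap$ID[overlap$K_semel == module & is.na(overlap$K_bis)]
  members <- setdiff(members, node)
  links <- net[(net$Site1 == node & net$Site2 %in% members) |
                 (net$Site2 == node & net$Site1 %in% members), ]
  if (nrow(links) == 0) return(0)
  mean(links$Simil)
}

# "simil"-style: pick the candidate module with the highest average similarity.
pick_simil <- function(i) {
  k <- candidates(i)
  if (length(k) == 1) return(k)
  scores <- vapply(k, function(m) avg_simil(overlap$ID[i], m), numeric(1))
  k[which.max(scores)]
}

rows <- seq_len(nrow(overlap))
random_partition <- vapply(rows, pick_random, numeric(1))
simil_partition <- vapply(rows, pick_simil, numeric(1))
print(data.frame(ID = overlap$ID, random = random_partition, simil = simil_partition))
```

In this toy case only node B overlaps. Its average similarity is 0.9 to module 1 (via A) and 0.25 to module 2 (via C and D), so the similarity-based rule assigns it to module 1.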

Author(s)

Maxime Lenormand (maxime.lenormand@inrae.fr), Pierre Denelle (pierre.denelle@gmail.com) and Boris Leroy (leroy.boris@gmail.com)

References

Lancichinetti A, Radicchi F, Ramasco JJ, Fortunato S (2011). “Finding statistically significant communities in networks.” PLoS ONE, 6(4).

Examples

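# Simulated site-by-species abundance matrix (5 sites, 10 species)
comat <- matrix(sample(1000, 50), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)

# Pairwise Simpson similarity between sites
net <- similarity(comat, metric = "Simpson")
# Requires the OSLOM binaries (see install_binaries)
com <- netclu_oslom(net)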

[Package bioregion version 1.1.1 Index]