R: Compute Cluster-Level Network Properties

addClusterStats {NAIR}

R Documentation

Compute Cluster-Level Network Properties

Description

Given a list of network objects returned by buildRepSeqNetwork() or generateNetworkObjects(), computes cluster-level network properties, performing clustering first if needed. The list of network objects is returned with the cluster properties added as a data frame.

Usage

addClusterStats(
  net,
  cluster_id_name = "cluster_id",
  seq_col = NULL,
  count_col = NULL,
  degree_col = "degree",
  cluster_fun = "fast_greedy",
  overwrite = FALSE,
  verbose = FALSE,
  ...
)

Arguments

`net`	A `list` of network objects conforming to the output of `buildRepSeqNetwork()` or `generateNetworkObjects()`. See details.
`cluster_id_name`	A character string specifying the name of the cluster membership variable in `net$node_data` that identifies the cluster to which each node belongs. If the variable does not exist, it will be added by calling `addClusterMembership()`. If the variable does exist, its values will be used unless `overwrite = TRUE`, in which case its values will be overwritten and the new values used.
`seq_col`	Specifies the column(s) of `net$node_data` containing the receptor sequences upon whose similarity the network is based. Accepts a character or numeric vector of length 1 or 2, containing either column names or column indices. If provided, related cluster-level properties will be computed. The default `NULL` will use the value contained in `net$details$seq_col` if it exists and is valid.
`count_col`	Specifies the column of `net$node_data` containing a measure of abundance (such as clone count or UMI count). Accepts a character string containing the column name or a numeric scalar containing the column index. If provided, related cluster-level properties will be computed.
`degree_col`	Specifies the column of `net$node_data` containing the network degree of each node. Accepts a character string containing the column name. If the column does not exist, it will be added.
`cluster_fun`	A character string specifying the clustering algorithm to use when adding or overwriting the cluster membership variable in `net$node_data` specified by `cluster_id_name`. Passed to `addClusterMembership()`.
`overwrite`	Logical. If `TRUE` and `net` already contains an element named `cluster_data`, it will be overwritten. Similarly, if `overwrite = TRUE` and `net$node_data` contains a variable whose name matches the value of `cluster_id_name`, then its values will be overwritten with new cluster membership values (obtained using `addClusterMembership()` with the specified value of `cluster_fun`), and cluster properties will be computed based on the new values.
`verbose`	Logical. If `TRUE`, generates messages about the tasks performed and their progress, as well as relevant properties of intermediate outputs. Messages are sent to `stderr()`.
`...`	Named optional arguments to the function specified by `cluster_fun`.

Details

The list net must contain the named elements igraph (of class igraph), adjacency_matrix (a matrix or dgCMatrix encoding edge connections), and node_data (a data.frame containing node metadata), all corresponding to the same network. The lists returned by buildRepSeqNetwork() and generateNetworkObjects() are examples of valid inputs for the net argument.
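The structural requirements above can be checked before calling addClusterStats(). The following is a minimal sketch, not part of NAIR: the helper is_valid_net() and the toy list toy_net are hypothetical names introduced for illustration, and the row-count comparison is an assumed consistency check rather than a documented rule.

```r
library(igraph)

# Hypothetical helper (not part of NAIR): checks for the three required elements.
is_valid_net <- function(net) {
  is.list(net) &&
    inherits(net$igraph, "igraph") &&
    (is.matrix(net$adjacency_matrix) ||
       inherits(net$adjacency_matrix, "dgCMatrix")) &&
    is.data.frame(net$node_data) &&
    # assumed consistency check: one row of node metadata per graph node
    nrow(net$node_data) == vcount(net$igraph)
}

# Toy network: a 5-node ring with made-up node metadata.
g <- make_ring(5)
toy_net <- list(
  igraph = g,
  adjacency_matrix = as_adjacency_matrix(g, sparse = FALSE),
  node_data = data.frame(
    CloneSeq = c("CASSL", "CASSF", "CASSA", "CASSG", "CASSP")
  )
)

is_valid_net(toy_net)                              # TRUE
is_valid_net(list(node_data = toy_net$node_data))  # FALSE: missing elements
```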

If the network graph has previously been partitioned into clusters using addClusterMembership() and the user wishes to compute network properties for these clusters, the name of the cluster membership variable in net$node_data should be provided to the cluster_id_name argument.

If the value of cluster_id_name is not the name of a variable in net$node_data, then clustering is performed using addClusterMembership() with the specified value of cluster_fun, and the cluster membership values are written to net$node_data using the value of cluster_id_name as the variable name. If overwrite = TRUE, this is done even if this variable already exists.
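The rule described above for when cluster membership is (re)computed can be summarized as a small predicate. This is only a sketch of the documented behavior: clustering_needed() is a hypothetical helper, not a NAIR function, and it covers the membership variable only, not the handling of an existing net$cluster_data.

```r
# Hypothetical helper (not part of NAIR) mirroring the documented rule:
# cluster when the membership variable is absent, or when overwrite = TRUE.
clustering_needed <- function(node_data, cluster_id_name, overwrite = FALSE) {
  !(cluster_id_name %in% names(node_data)) || isTRUE(overwrite)
}

# Made-up node metadata that already has a "cluster_id" variable.
node_data <- data.frame(
  CloneSeq = c("CASSL", "CASSF"),
  cluster_id = c(1, 1)
)

clustering_needed(node_data, "cluster_id")                    # FALSE
clustering_needed(node_data, "cluster_id", overwrite = TRUE)  # TRUE
clustering_needed(node_data, "cluster_id_louvain")            # TRUE
```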

Value

A modified copy of net, with cluster properties contained in the element cluster_data. This is a data.frame containing one row for each cluster in the network and the following variables:

`cluster_id`	The cluster ID number.
`node_count`	The number of nodes in the cluster.
`mean_seq_length`	The mean sequence length in the cluster. Only present when `length(seq_col) == 1`.
`A_mean_seq_length`	The mean first sequence length in the cluster. Only present when `length(seq_col) == 2`.
`B_mean_seq_length`	The mean second sequence length in the cluster. Only present when `length(seq_col) == 2`.
`mean_degree`	The mean network degree in the cluster.
`max_degree`	The maximum network degree in the cluster.
`seq_w_max_degree`	The receptor sequence possessing the maximum degree within the cluster. Only present when `length(seq_col) == 1`.
`A_seq_w_max_degree`	The first sequence of the node possessing the maximum degree within the cluster. Only present when `length(seq_col) == 2`.
`B_seq_w_max_degree`	The second sequence of the node possessing the maximum degree within the cluster. Only present when `length(seq_col) == 2`.
`agg_count`	The aggregate count among all nodes in the cluster (based on the counts in `count_col`).
`max_count`	The maximum count among all nodes in the cluster (based on the counts in `count_col`).
`seq_w_max_count`	The receptor sequence possessing the maximum count within the cluster. Only present when `length(seq_col) == 1`.
`A_seq_w_max_count`	The first sequence of the node possessing the maximum count within the cluster. Only present when `length(seq_col) == 2`.
`B_seq_w_max_count`	The second sequence of the node possessing the maximum count within the cluster. Only present when `length(seq_col) == 2`.
`diameter_length`	The length of the longest geodesic in the cluster, computed as the length of the vertex sequence returned by `get_diameter()` (i.e., the number of nodes on that path).
`assortativity`	The assortativity coefficient of the cluster's graph, based on the degree (minus one) of each node in the cluster (with the degree computed based only upon the nodes within the cluster). Computed using `assortativity_degree()`.
`global_transitivity`	The transitivity (i.e., clustering coefficient) for the cluster's graph, which estimates the probability that the adjacent vertices of a vertex are connected (i.e., that two neighbors of a node are themselves neighbors). Computed using `transitivity()` with `type = "global"`.
`edge_density`	The number of edges in the cluster as a fraction of the maximum possible number of edges. Computed using `edge_density()`.
`degree_centrality_index`	The centrality index of the cluster's graph based on within-cluster network degree. Computed as the `centralization` element of the output from `centr_degree()`.
`closeness_centrality_index`	The centrality index of the cluster's graph based on closeness, i.e., distance to other nodes in the cluster. Computed using `centralization()`.
`eigen_centrality_index`	The centrality index of the cluster's graph based on the eigenvector centrality scores, i.e., values of the first eigenvector of the adjacency matrix for the cluster. Computed as the `centralization` element of the output from `centr_eigen()`.
`eigen_centrality_eigenvalue`	The eigenvalue corresponding to the first eigenvector of the adjacency matrix for the cluster. Computed as the `value` element of the output from `eigen_centrality()`.
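To illustrate how several of the variables above relate to the underlying data, the following sketch computes a few of them by hand using igraph and base R. It is not NAIR's internal implementation. The graph is igraph's built-in Zachary example, and the CloneSeq and CloneCount columns are randomly generated stand-ins.

```r
library(igraph)

g <- make_graph("Zachary")  # built-in 34-node example graph
set.seed(42)

# Made-up node metadata: random sequences of length 9 to 15 and random counts.
node_data <- data.frame(
  CloneSeq = vapply(seq_len(vcount(g)), function(i) {
    paste(sample(c("A", "C", "G", "T"), sample(9:15, 1), replace = TRUE),
          collapse = "")
  }, character(1)),
  CloneCount = sample(1:100, vcount(g), replace = TRUE)
)
node_data$cluster_id <- as.integer(membership(cluster_fast_greedy(g)))

# One row of summary statistics per cluster.
cluster_stats <- do.call(rbind, lapply(
  sort(unique(node_data$cluster_id)),
  function(id) {
    in_cluster <- node_data$cluster_id == id
    sub <- induced_subgraph(g, which(in_cluster))  # the cluster's own graph
    nodes <- node_data[in_cluster, ]
    data.frame(
      cluster_id = id,
      node_count = nrow(nodes),
      mean_seq_length = mean(nchar(nodes$CloneSeq)),
      agg_count = sum(nodes$CloneCount),
      max_count = max(nodes$CloneCount),
      seq_w_max_count = nodes$CloneSeq[which.max(nodes$CloneCount)],
      diameter_length = length(get_diameter(sub)),
      global_transitivity = transitivity(sub, type = "global"),
      edge_density = edge_density(sub)
    )
  }
))

cluster_stats
```

Each graph-level property is computed on the subgraph induced by the cluster's nodes, so edges to nodes outside the cluster do not contribute.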

If net$node_data did not previously contain a variable whose name matches the value of cluster_id_name, then this variable will be added to net$node_data in the returned list and will contain values for cluster membership, obtained through a call to addClusterMembership() using the clustering algorithm specified by cluster_fun.

If net$node_data did previously contain a variable whose name matches the value of cluster_id_name and overwrite = TRUE, then the values of this variable will be overwritten with new values for cluster membership, obtained as above based on cluster_fun.

If net$node_data did not previously contain a variable whose name matches the value of degree_col, then this variable will be added to net$node_data in the returned list and will contain values for network degree.

Additionally, if net contains a list named details, then the following elements will be added to net$details, or overwritten if they already exist:

`cluster_data_goes_with`	A character string containing the value of `cluster_id_name`. When `net$node_data` contains multiple cluster membership variables (e.g., from applying different clustering methods), `cluster_data_goes_with` allows the user to distinguish which of these variables corresponds to `net$cluster_data`.
`count_col_for_cluster_data`	A character string containing the value of `count_col`. If `net$node_data` contains multiple count variables, this allows the user to distinguish which of these variables corresponds to the count-related properties in `net$cluster_data`, such as `max_count`. If `count_col = NULL`, then the value will be `NA`.
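As a brief illustration of how these elements might be used, the sketch below looks up the membership variable that matches the cluster data. The list net_stub is a hand-built stand-in containing only the relevant parts of a network list, and its values are made up.

```r
# Hand-built stand-in for the relevant parts of a NAIR network list.
net_stub <- list(
  node_data = data.frame(
    CloneSeq = c("CASSL", "CASSF", "CASSA", "CASSG"),
    cluster_id = c(1, 1, 2, 2),
    cluster_id_louvain = c(1, 2, 2, 2)
  ),
  details = list(
    cluster_data_goes_with = "cluster_id_louvain",
    count_col_for_cluster_data = NA
  )
)

# Name of the membership variable that the cluster data corresponds to.
id_var <- net_stub$details$cluster_data_goes_with

# Tabulate cluster sizes using that variable rather than a hard-coded name.
cluster_sizes <- table(net_stub$node_data[[id_var]])
cluster_sizes
```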

Author(s)

Brian Neal (Brian.Neal@ucsf.edu)

References

Hai Yang, Jason Cham, Brian Neal, Zenghua Fan, Tao He and Li Zhang. (2023). NAIR: Network Analysis of Immune Repertoire. Frontiers in Immunology, vol. 14. doi: 10.3389/fimmu.2023.1181825

Webpage for the NAIR package

Examples

set.seed(42)
toy_data <- simulateToyData()

net <- generateNetworkObjects(
  toy_data, "CloneSeq"
)

net <- addClusterStats(
  net,
  count_col = "CloneCount"
)

head(net$cluster_data)
net$details

# has no effect, since net$cluster_data already exists and overwrite = FALSE
net <- addClusterStats(
  net,
  count_col = "CloneCount",
  cluster_fun = "leiden",
  verbose = TRUE
)

# overwrites values in net$cluster_data
# and cluster membership values in net$node_data$cluster_id
# with values obtained using "cluster_leiden" algorithm
net <- addClusterStats(
  net,
  count_col = "CloneCount",
  cluster_fun = "leiden",
  overwrite = TRUE
)

net$details

# overwrites existing values in net$cluster_data
# with values obtained using "cluster_louvain" algorithm
# saves cluster membership values to net$node_data$cluster_id_louvain
# (net$node_data$cluster_id retains membership values from "cluster_leiden")
net <- addClusterStats(
  net,
  count_col = "CloneCount",
  cluster_fun = "louvain",
  cluster_id_name = "cluster_id_louvain",
  overwrite = TRUE
)

net$details

# perform clustering using "cluster_fast_greedy" algorithm,
# save cluster membership values to net$node_data$cluster_id_greedy
net <- addClusterMembership(
  net,
  cluster_fun = "fast_greedy",
  cluster_id_name = "cluster_id_greedy"
)

# compute cluster properties for the clusters from previous step
# overwrites values in net$cluster_data
net <- addClusterStats(
  net,
  cluster_id_name = "cluster_id_greedy",
  overwrite = TRUE
)

net$details

[Package NAIR version 1.0.4 Index]