R: Compute Graph Adjacency Matrix for Immune Repertoire Network

generateAdjacencyMatrix {NAIR}

R Documentation

Compute Graph Adjacency Matrix for Immune Repertoire Network

Description

Given a list of receptor sequences, computes the adjacency matrix for the network graph based on sequence similarity.

sparseAdjacencyMatFromSeqs() is a deprecated equivalent of generateAdjacencyMatrix().

Usage

generateAdjacencyMatrix(
  seqs,
  dist_type = "hamming",
  dist_cutoff = 1,
  drop_isolated_nodes = TRUE,
  method = "default",
  verbose = FALSE
)

# Deprecated equivalent:
sparseAdjacencyMatFromSeqs(
  seqs,
  dist_type = "hamming",
  dist_cutoff = 1,
  drop_isolated_nodes = TRUE,
  method = "default",
  verbose = FALSE,
  max_dist = deprecated()
)

Arguments

`seqs`	A character vector containing the receptor sequences.
`dist_type`	Specifies the function used to quantify the similarity between sequences. The similarity between two sequences determines the pairwise distance between their respective nodes in the network graph, with greater similarity corresponding to shorter distance. Valid options are `"hamming"` (the default), which uses `hamDistBounded`, and `"levenshtein"`, which uses `levDistBounded`.
`dist_cutoff`	A nonnegative scalar. Specifies the maximum pairwise distance (based on `dist_type`) for an edge connection to exist between two nodes. Pairs of nodes whose distance is less than or equal to this value will be joined by an edge connection in the network graph. Controls the stringency of the network construction and affects the number and density of edges in the network. A lower cutoff value requires greater similarity between sequences in order for their respective nodes to be joined by an edge connection. A value of `0` requires two sequences to be identical in order for their nodes to be joined by an edge.
`drop_isolated_nodes`	Logical. When `TRUE`, removes each node that is not joined by an edge connection to any other node in the network graph.
`method`	A character string specifying the algorithm to use. Choices are `"default"` and `"pattern"`. `"pattern"` is only valid when `dist_cutoff < 3`, but tends to be faster than `"default"` for sparsely connected networks, at the cost of greater memory usage (can cause crashes for large or densely-connected networks, particularly for `dist_cutoff = 2`). The default algorithm tends to be faster for densely-connected networks or long sequences.
`verbose`	Logical. If `TRUE`, generates messages about the tasks performed and their progress, as well as relevant properties of intermediate outputs. Messages are sent to `stderr`.
`max_dist`	Equivalent to `dist_cutoff`.

Details

The adjacency matrix of a graph with n nodes is the symmetric n \times n matrix for which entry (i,j) is equal to 1 if nodes i and j are connected by an edge in the network graph and 0 otherwise.

To construct the graph of the immune repertoire network, each receptor sequence is modeled as a node. The similarity between receptor sequences, as measured using either the Hamming or Levenshtein distance, determines the distance between nodes in the network graph. The more similar two sequences are, the shorter the distance between their respective nodes. Two nodes in the graph are joined by an edge if the distance between them is sufficiently small, i.e., if their receptor sequences are sufficiently similar.

Value

A sparse matrix of class dgCMatrix (see dgCMatrix-class).

If drop_isolated_nodes = TRUE, the row and column names of the matrix indicate which receptor sequences in the seqs vector correspond to each row and column of the matrix. The row and column names can be accessed using dimnames. This returns a list containing two character vectors, one for the row names and one for the column names. The name of the ith matrix row is the index of the seqs vector corresponding to the ith row and ith column of the matrix. The name of the jth matrix column is the receptor sequence corresponding to the jth row and jth column of the matrix.

Author(s)

Brian Neal (Brian.Neal@ucsf.edu)

References

Hai Yang, Jason Cham, Brian Neal, Zenghua Fan, Tao He and Li Zhang. (2023). NAIR: Network Analysis of Immune Repertoire. Frontiers in Immunology, vol. 14. doi: 10.3389/fimmu.2023.1181825

Webpage for the NAIR package

Examples

generateAdjacencyMatrix(
  c("fee", "fie", "foe", "fum", "foo")
)

# No edge connections exist based on a Hamming distance of 1
# (returns a 0x0 sparse matrix)
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar")
)

# Same as the above example, but keeping all nodes
# (returns a 4x4 sparse matrix)
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar"),
  drop_isolated_nodes = FALSE
)

# Relaxing the edge criteria using a Hamming distance of 2
# (still results in no edge connections)
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar"),
  dist_cutoff = 2
)

# Using a Levenshtein distance of 2, however,
# does result in edge connections
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar"),
  dist_type = "levenshtein",
  dist_cutoff = 2
)

# Using a Hamming distance of 3
# also results in (different) edge connections
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar"),
  dist_cutoff = 3
)

[Package NAIR version 1.0.4 Index]