generateAdjacencyMatrix {NAIR} | R Documentation |
Compute Graph Adjacency Matrix for Immune Repertoire Network
Description
Given a list of receptor sequences, computes the adjacency matrix for the network graph based on sequence similarity.
sparseAdjacencyMatFromSeqs() is a deprecated equivalent of generateAdjacencyMatrix().
Usage
generateAdjacencyMatrix(
  seqs,
  dist_type = "hamming",
  dist_cutoff = 1,
  drop_isolated_nodes = TRUE,
  method = "default",
  verbose = FALSE
)

# Deprecated equivalent:
sparseAdjacencyMatFromSeqs(
  seqs,
  dist_type = "hamming",
  dist_cutoff = 1,
  drop_isolated_nodes = TRUE,
  method = "default",
  verbose = FALSE,
  max_dist = deprecated()
)
Arguments
seqs |
A character vector containing the receptor sequences. |
dist_type |
Specifies the function used to quantify the similarity between sequences. The
similarity between two sequences determines the pairwise distance between their
respective nodes in the network graph, with greater similarity corresponding to
shorter distance. Valid options are |
dist_cutoff |
A nonnegative scalar. Specifies the maximum pairwise distance (based on
|
drop_isolated_nodes |
Logical. When |
method |
A character string specifying the algorithm to use. Choices are |
verbose |
Logical. If |
max_dist |
Details
The adjacency matrix of a graph with n nodes is the symmetric n × n matrix for which entry (i,j) is equal to 1 if nodes i and j are connected by an edge in the network graph and 0 otherwise.
To construct the graph of the immune repertoire network, each receptor sequence is modeled as a node. The similarity between receptor sequences, as measured using either the Hamming or Levenshtein distance, determines the distance between nodes in the network graph. The more similar two sequences are, the shorter the distance between their respective nodes. Two nodes in the graph are joined by an edge if the distance between them is sufficiently small, i.e., if their receptor sequences are sufficiently similar.
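The construction described above can be sketched in base R. This is an illustrative sketch only, not the package's implementation: the helper names hamming_sketch() and adjacency_sketch() are hypothetical, sequences of unequal length are assumed to count each extra character as a mismatch, and the diagonal is set to 0 here, which may differ from what generateAdjacencyMatrix() returns.

```r
# Hamming-style distance between two strings (hypothetical helper).
# Assumption: when lengths differ, each extra character of the longer
# sequence counts as one mismatch.
hamming_sketch <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- min(length(x), length(y))
  sum(x[seq_len(n)] != y[seq_len(n)]) + abs(length(x) - length(y))
}

# Dense 0/1 adjacency matrix: nodes i and j share an edge when their
# distance is at most dist_cutoff (hypothetical helper).
adjacency_sketch <- function(seqs, dist_cutoff = 1) {
  n <- length(seqs)
  adj <- matrix(0L, nrow = n, ncol = n, dimnames = list(seqs, seqs))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # The diagonal is left at 0 in this sketch (an assumption).
      if (i != j && hamming_sketch(seqs[i], seqs[j]) <= dist_cutoff) {
        adj[i, j] <- 1L
      }
    }
  }
  adj
}

seqs <- c("fee", "fie", "foe", "fum", "foo")
adj <- adjacency_sketch(seqs)

isSymmetric(adj)  # the matrix is symmetric by construction
rowSums(adj)      # number of neighbors of each node; "fum" has none
```

A node whose row sums to 0 in this sketch has no neighbors within the cutoff. Such isolated nodes are the ones that drop_isolated_nodes = TRUE removes.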
Value
A sparse matrix of class dgCMatrix (see dgCMatrix-class).
If drop_isolated_nodes = TRUE, the row and column names of the matrix indicate which receptor sequences in the seqs vector correspond to each row and column of the matrix. The row and column names can be accessed using dimnames(), which returns a list containing two character vectors, one for the row names and one for the column names. The name of the ith matrix row is the index in the seqs vector of the sequence corresponding to the ith row and ith column of the matrix. The name of the jth matrix column is the receptor sequence corresponding to the jth row and jth column of the matrix.
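The naming convention above can be used to map the retained rows back to the input vector. The following sketch uses a hypothetical stand-in, a base R matrix rather than a dgCMatrix, with "fum" assumed to have been dropped as an isolated node. The entries are placeholders and only the dimnames matter here.

```r
seqs <- c("fee", "fie", "foe", "fum", "foo")

# Hypothetical stand-in for the returned matrix: row names are indices
# into seqs, column names are the sequences themselves.
kept <- c(1L, 2L, 3L, 5L)  # assumed retained nodes (index 4 dropped)
adj <- matrix(0, nrow = 4, ncol = 4,
              dimnames = list(as.character(kept), seqs[kept]))

nms <- dimnames(adj)             # list(row names, column names)
row_idx <- as.integer(nms[[1]])  # positions of retained sequences in seqs
col_seqs <- nms[[2]]             # the retained sequences

# Row names and column names describe the same nodes
stopifnot(identical(seqs[row_idx], col_seqs))

# Sequences that were dropped as isolated nodes
dropped <- setdiff(seq_along(seqs), row_idx)
seqs[dropped]
```

Converting the row names back to integers makes it possible to subset other per-sequence data, such as a data frame with one row per element of seqs, down to the nodes that remain in the network.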
Author(s)
Brian Neal (Brian.Neal@ucsf.edu)
References
Hai Yang, Jason Cham, Brian Neal, Zenghua Fan, Tao He and Li Zhang. (2023). NAIR: Network Analysis of Immune Repertoire. Frontiers in Immunology, vol. 14. doi: 10.3389/fimmu.2023.1181825
Examples
library(NAIR)

# Default settings: Hamming distance with a cutoff of 1
generateAdjacencyMatrix(
  c("fee", "fie", "foe", "fum", "foo")
)

# No edge connections exist with the default Hamming distance cutoff of 1
# (returns a 0x0 sparse matrix)
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar")
)

# Same as the above example, but keeping all nodes
# (returns a 4x4 sparse matrix)
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar"),
  drop_isolated_nodes = FALSE
)

# Relaxing the edge criterion to a Hamming distance cutoff of 2
# (still results in no edge connections)
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar"),
  dist_cutoff = 2
)

# Using a Levenshtein distance cutoff of 2, however,
# does result in edge connections
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar"),
  dist_type = "levenshtein",
  dist_cutoff = 2
)

# Using a Hamming distance cutoff of 3
# also results in (different) edge connections
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar"),
  dist_cutoff = 3
)