buildRepSeqNetwork {NAIR} | R Documentation |
Network Analysis of Immune Repertoire
Description
Given Adaptive Immune Receptor Repertoire Sequencing (AIRR-Seq) data, builds the network graph for the immune repertoire based on sequence similarity, computes specified network properties and generates customized visualizations.
buildNet() is identical to buildRepSeqNetwork(), existing as an alias for convenience.
Usage
buildRepSeqNetwork(
  ## Input ##
  data,
  seq_col,
  count_col = NULL,
  subset_cols = NULL,
  min_seq_length = 3,
  drop_matches = NULL,
  ## Network ##
  dist_type = "hamming",
  dist_cutoff = 1,
  drop_isolated_nodes = TRUE,
  node_stats = FALSE,
  stats_to_include = chooseNodeStats(),
  cluster_stats = FALSE,
  cluster_fun = "fast_greedy",
  cluster_id_name = "cluster_id",
  ## Visualization ##
  plots = TRUE,
  print_plots = FALSE,
  plot_title = "auto",
  plot_subtitle = "auto",
  color_nodes_by = "auto",
  ...,
  ## Output ##
  output_dir = NULL,
  output_type = "rds",
  output_name = "MyRepSeqNetwork",
  pdf_width = 12,
  pdf_height = 10,
  verbose = FALSE
)
# Alias for buildRepSeqNetwork()
buildNet(
  data,
  seq_col,
  count_col = NULL,
  subset_cols = NULL,
  min_seq_length = 3,
  drop_matches = NULL,
  dist_type = "hamming",
  dist_cutoff = 1,
  drop_isolated_nodes = TRUE,
  node_stats = FALSE,
  stats_to_include = chooseNodeStats(),
  cluster_stats = FALSE,
  cluster_fun = "fast_greedy",
  cluster_id_name = "cluster_id",
  plots = TRUE,
  print_plots = FALSE,
  plot_title = "auto",
  plot_subtitle = "auto",
  color_nodes_by = "auto",
  ...,
  output_dir = NULL,
  output_type = "rds",
  output_name = "MyRepSeqNetwork",
  pdf_width = 12,
  pdf_height = 10,
  verbose = FALSE
)
Arguments
data |
A data frame containing the AIRR-Seq data, with variables indexed by column and observations (e.g., clones or cells) indexed by row. |
seq_col |
Specifies the column(s) of |
count_col |
Optional. Specifies the column of |
subset_cols |
Specifies which columns of the AIRR-Seq data are included in the output.
Accepts a vector of column names or a vector of column indices. The default
|
min_seq_length |
A numeric scalar, or |
drop_matches |
Optional. Passed to |
dist_type |
Specifies the function used to quantify the similarity between sequences.
The similarity between two sequences determines the pairwise distance between
their respective nodes in the network graph, with greater similarity corresponding
to shorter distance. Valid options are |
dist_cutoff |
A nonnegative scalar. Specifies the maximum pairwise distance (based on
|
drop_isolated_nodes |
A logical scalar. When |
node_stats |
A logical scalar. Specifies whether node-level network properties are computed. |
stats_to_include |
A named logical vector returned by
|
cluster_stats |
A logical scalar. Specifies whether to compute cluster-level network properties. |
cluster_fun |
Passed to |
cluster_id_name |
Passed to |
plots |
A logical scalar. Specifies whether to generate plots of the network graph. |
print_plots |
A logical scalar. If |
plot_title |
A character string or |
plot_subtitle |
A character string or |
color_nodes_by |
Optional. Specifies a variable to be used as metadata for coloring the nodes
in the network graph plot. Accepts a character string. This can be a column
name of |
... |
Other named arguments to |
output_dir |
A file path specifying the directory for saving the output. The directory will
be created if it does not exist. If |
output_type |
A character string specifying the file format to use when saving the output.
The default value |
output_name |
A character string. All files saved will have file names beginning with this value. |
pdf_width |
Sets the width of each plot when writing to pdf.
Passed to |
pdf_height |
Sets the height of each plot when writing to pdf.
Passed to |
verbose |
Logical. If |
Details
To construct the immune repertoire network, each TCR/BCR clone (bulk data) or cell (single-cell data) is modeled as a node in the network graph, corresponding to a single row of the AIRR-Seq data. For each node, the corresponding receptor sequence is considered. Both nucleotide and amino acid sequences are supported for this purpose. The receptor sequence is used as the basis of similarity and distance between nodes in the network.
Similarity between sequences is measured using either the Hamming distance or Levenshtein (edit) distance. The similarity determines the pairwise distance between nodes in the network graph. The more similar two sequences are, the shorter the distance between their respective nodes. Two nodes in the graph are joined by an edge if the distance between them is sufficiently small, i.e., if their receptor sequences are sufficiently similar.
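The edge rule described above can be illustrated with a small base R sketch. This is not the NAIR implementation: the sequences are made up, hamming() is a hypothetical helper that assumes equal-length sequences, and utils::adist() stands in for the Levenshtein distance.

```r
# Illustrative sketch only; not how NAIR computes the network internally.
seqs <- c("CASSLGT", "CASSLGA", "CASSPGA", "CATTTTT")  # made-up sequences

# Hypothetical helper: Hamming distance, assuming equal-length strings.
hamming <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Pairwise distance matrices under each distance type.
ham_dist <- outer(seqs, seqs, Vectorize(hamming))
lev_dist <- utils::adist(seqs)  # Levenshtein (edit) distance

# Two distinct nodes are joined by an edge when their distance
# does not exceed the cutoff.
dist_cutoff <- 1
adjacency <- (ham_dist <= dist_cutoff) * 1L
diag(adjacency) <- 0L
dimnames(adjacency) <- list(seqs, seqs)

# Nodes with no edges are isolated; dropping them leaves the connected part.
isolated <- rowSums(adjacency) == 0
adjacency_connected <- adjacency[!isolated, !isolated, drop = FALSE]
```

In this toy input, the fourth sequence differs from every other sequence in more than one position, so it has no edges and is removed in the last step.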
For single-cell data, edge connections between nodes can be based on similarity in both the alpha chain and beta chain sequences. This is done by providing a vector of length 2 to seq_col specifying the two sequence columns in data. The distance between two nodes is then the greater of the two distances between sequences in corresponding chains. Two nodes will be joined by an edge only if their alpha chain sequences are sufficiently similar and their beta chain sequences are sufficiently similar.
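The two-chain rule can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's own code: the data frame cells and its column names alpha and beta are hypothetical, and utils::adist() is used for the Levenshtein distance.

```r
# Illustrative sketch only; data and column names are made up.
cells <- data.frame(
  alpha = c("CAVSD", "CAVSE", "CAVSD"),
  beta  = c("CASSL", "CASSL", "CATTT"),
  stringsAsFactors = FALSE
)
dist_cutoff <- 1

# Levenshtein distance matrix for each chain.
d_alpha <- utils::adist(cells$alpha)
d_beta  <- utils::adist(cells$beta)

# The distance between two nodes is the greater of the two chain distances,
# so an edge requires both chains to fall within the cutoff.
d_node <- pmax(d_alpha, d_beta)

adjacency <- (d_node <= dist_cutoff) * 1L
diag(adjacency) <- 0L
```

Here cells 1 and 3 share an identical alpha chain but have dissimilar beta chains, so they are not joined; only cells 1 and 2 are connected.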
See the buildRepSeqNetwork package vignette for more details. The vignette can be accessed offline using vignette("buildRepSeqNetwork").
Value
If the constructed network contains no nodes, the function will return NULL, invisibly, with a warning. Otherwise, the function invisibly returns a list containing the following items:
details |
A list containing information about the network and the settings used during its construction. |
igraph |
An object of class |
adjacency_matrix |
The network graph adjacency matrix, stored as a sparse
matrix of class |
node_data |
A data frame containing metadata for the network
nodes, where each row corresponds to a node in the network graph. This data
frame contains all variables from |
cluster_data |
A data frame containing network properties for the clusters,
where each row corresponds to a cluster in the network graph. Only included if
|
plots |
A list containing one element for each plot generated
as well as an additional element for the matrix that specifies the graph layout.
Each plot is an object of class |
Author(s)
Brian Neal (Brian.Neal@ucsf.edu)
References
Hai Yang, Jason Cham, Brian Neal, Zenghua Fan, Tao He and Li Zhang. (2023). NAIR: Network Analysis of Immune Repertoire. Frontiers in Immunology, vol. 14. doi: 10.3389/fimmu.2023.1181825
Examples
set.seed(42)
toy_data <- simulateToyData()
# Simple call
network <- buildNet(
  toy_data,
  seq_col = "CloneSeq",
  print_plots = TRUE
)
# Customized:
network <- buildNet(
  toy_data, "CloneSeq",
  dist_type = "levenshtein",
  node_stats = TRUE,
  cluster_stats = TRUE,
  cluster_fun = "louvain",
  cluster_id_name = "cluster_membership",
  count_col = "CloneCount",
  color_nodes_by = c("SampleID", "cluster_membership", "coreness"),
  color_scheme = c("default", "Viridis", "plasma-1"),
  size_nodes_by = "degree",
  node_size_limits = c(0.1, 1.5),
  plot_title = NULL,
  plot_subtitle = NULL,
  print_plots = TRUE,
  verbose = TRUE
)
typeof(network)
names(network)
network$details
head(network$node_data)
head(network$cluster_data)