R: Build Global Network of Associated TCR/BCR Clusters

buildAssociatedClusterNetwork {NAIR}

R Documentation

Build Global Network of Associated TCR/BCR Clusters

Description

Part of the workflow Searching for Associated TCR/BCR Clusters. Intended for use following findAssociatedClones().

Given data containing a neighborhood of similar clones around each associated sequence, combines the data into a global network and performs network analysis and cluster analysis.

Usage

buildAssociatedClusterNetwork(
  file_list,
  input_type = "rds",
  data_symbols = "data", header = TRUE, sep,
  read.args = list(row.names = 1),
  seq_col,
  min_seq_length = NULL,
  drop_matches = NULL,
  drop_isolated_nodes = FALSE,
  node_stats = TRUE,
  stats_to_include =
    chooseNodeStats(cluster_id = TRUE),
  cluster_stats = TRUE,
  color_nodes_by = "GroupID",
  output_name = "AssociatedClusterNetwork",
  verbose = FALSE,
  ...
)

Arguments

`file_list`	A character vector of file paths, or a list containing `connections` and file paths. Each element corresponds to a single file containing the data for a single sample. Passed to `loadDataFromFileList()`.
`input_type`	A character string specifying the file format of the neighborhood data files. Options are `"table"`, `"txt"`, `"tsv"`, `"csv"`, `"rds"` and `"rda"`. Passed to `loadDataFromFileList()`.
`data_symbols`	Used when `input_type = "rda"`. Specifies the name of each neighborhood's data frame within its respective Rdata file. Passed to `loadDataFromFileList()`.
`header`	For values of `input_type` other than `"rds"` and `"rda"`, this argument is used to specify the value of the `header` argument to `read.table()`, `read.csv()`, etc.
`sep`	For values of `input_type` other than `"rds"` and `"rda"`, this argument can be used to specify a non-default value of the `sep` argument to `read.table()`, `read.csv()`, etc.
`read.args`	For values of `input_type` other than `"rds"` and `"rda"`, this argument is used to specify values of optional arguments to `read.table()`, `read.csv()`, etc. Accepts a named list of argument values. Values of `header` and `sep` in this list take precedence over values specified via the `header` and `sep` arguments.
`seq_col`	Specifies the column of each neighborhood's data frame containing the TCR/BCR sequences. Accepts a character string containing the column name or a numeric scalar containing the column index.
`min_seq_length`	Passed to `buildRepSeqNetwork()` when constructing the global network.
`drop_matches`	Passed to `buildRepSeqNetwork()` when constructing the global network.
`drop_isolated_nodes`	Passed to `buildRepSeqNetwork()` when constructing the global network.
`node_stats`	Passed to `buildRepSeqNetwork()` when constructing the global network.
`stats_to_include`	Passed to `buildRepSeqNetwork()` when constructing the global network.
`cluster_stats`	Passed to `buildRepSeqNetwork()` when constructing the global network.
`color_nodes_by`	Passed to `buildRepSeqNetwork()` when constructing the global network.
`output_name`	Passed to `buildRepSeqNetwork()` when constructing the global network.
`verbose`	Logical. If `TRUE`, generates messages about the tasks performed and their progress, as well as relevant properties of intermediate outputs. Messages are sent to `stderr()`.
`...`	Other arguments to `buildRepSeqNetwork()` when constructing the global network.

Details

Each associated sequence's neighborhood contains clones (from all samples) with TCR/BCR sequences similar to the associated sequence. The neighborhoods are assumed to have been previously identified using findAssociatedClones().

The neighborhood data for all associated sequences are used to construct a single global network. Cluster analysis is used to partition the global network into clusters, which are considered as the associated TCR/BCR clusters. Network properties for the nodes and clusters are computed and returned as metadata. A plot of the global network graph is produced, with the nodes colored according to the binary variable of interest.

See the Searching for Associated TCR/BCR Clusters article on the package website for more details.

Value

A list of network objects as returned by buildRepSeqNetwork(). The list is returned invisibly. If the input data contains a combined total of fewer than two rows, or if the global network contains no nodes, then the function returns NULL, invisibly, with a warning.

Author(s)

Brian Neal (Brian.Neal@ucsf.edu)

References

Hai Yang, Jason Cham, Brian Neal, Zenghua Fan, Tao He and Li Zhang. (2023). NAIR: Network Analysis of Immune Repertoire. Frontiers in Immunology, vol. 14. doi: 10.3389/fimmu.2023.1181825

Webpage for the NAIR package

Searching for Associated TCR/BCR Clusters article on package website

Examples

set.seed(42)

## Simulate 30 samples from two groups (treatment/control) ##
n_control <- n_treatment <- 15
n_samples <- n_control + n_treatment
sample_size <- 30 # (seqs per sample)
base_seqs <- # first five are associated with treatment
  c("CASSGAYEQYF", "CSVDLGKGNNEQFF", "CASSIEGQLSTDTQYF",
    "CASSEEGQLSTDTQYF", "CASSPEGQLSTDTQYF",
    "RASSLAGNTEAFF", "CASSHRGTDTQYF", "CASDAGVFQPQHF")
# Relative generation probabilities by control/treatment group
pgen_c <- matrix(rep(c(rep(1, 5), rep(30, 3)), times = n_control),
                 nrow = n_control, byrow = TRUE)
pgen_t <- matrix(rep(c(1, 1, rep(1/3, 3), rep(2, 3)), times = n_treatment),
                 nrow = n_treatment, byrow = TRUE)
pgen <- rbind(pgen_c, pgen_t)
simulateToyData(
  samples = n_samples,
  sample_size = sample_size,
  prefix_length = 1,
  prefix_chars = c("", ""),
  prefix_probs = cbind(rep(1, n_samples), rep(0, n_samples)),
  affixes = base_seqs,
  affix_probs = pgen,
  num_edits = 0,
  output_dir = tempdir(),
  no_return = TRUE
)

## Step 1: Find Associated Sequences ##
sample_files <-
  file.path(tempdir(),
            paste0("Sample", 1:n_samples, ".rds")
  )
group_labels <- c(rep("reference", n_control),
                  rep("comparison", n_treatment))
associated_seqs <-
  findAssociatedSeqs(
    file_list = sample_files,
    input_type = "rds",
    group_ids = group_labels,
    seq_col = "CloneSeq",
    min_seq_length = NULL,
    drop_matches = NULL,
    min_sample_membership = 0,
    pval_cutoff = 0.1
  )
head(associated_seqs[, 1:5])

## Step 2: Find Associated Clones ##
dir_step2 <- tempfile()
findAssociatedClones(
  file_list = sample_files,
  input_type = "rds",
  group_ids = group_labels,
  seq_col = "CloneSeq",
  assoc_seqs = associated_seqs$ReceptorSeq,
  min_seq_length = NULL,
  drop_matches = NULL,
  output_dir = dir_step2
)

## Step 3: Global Network of Associated Clusters ##
associated_clusters <-
  buildAssociatedClusterNetwork(
    file_list = list.files(dir_step2,
                           full.names = TRUE
    ),
    seq_col = "CloneSeq",
    size_nodes_by = 1.5,
    print_plots = TRUE
  )

[Package NAIR version 1.0.4 Index]