aggregateIdenticalClones {NAIR} | R Documentation |
Aggregate Counts/Frequencies for Clones With Identical Receptor Sequences
Description
Given bulk Adaptive Immune Receptor Repertoire Sequencing (AIRR-Seq) data with clones indexed by row, returns a data frame containing one row for each unique receptor sequence. Includes the number of clones sharing each sequence, as well as aggregate values for clone count and clone frequency across all clones sharing each sequence. Clones can be grouped according to metadata, in which case aggregation is performed within (but not across) groups.
Usage
aggregateIdenticalClones(
data,
clone_col,
count_col,
freq_col,
grouping_cols = NULL,
verbose = FALSE
)
Arguments
data |
A data frame containing the bulk AIRR-Seq data, with clones indexed by row. |
clone_col |
Specifies the column of data containing the receptor sequences. |
count_col |
Specifies the column of data containing the clone counts. |
freq_col |
Specifies the column of data containing the clone frequencies. |
grouping_cols |
An optional character vector of column names or numeric vector of column indices, specifying one or more columns of data used to assign clones to groups. |
verbose |
Logical. If |
Details
If grouping_cols is left unspecified, the returned data frame will contain one row for each unique receptor sequence appearing in data.
If one or more columns of data are specified using the grouping_cols argument, then each clone (row) in data is assigned to a group based on its combination of values in these columns. If two clones share the same receptor sequence but belong to different groups, their receptor sequence will appear multiple times in the returned data frame, with one row for each group in which the sequence appears. In each such row, the aggregate clone count, aggregate clone frequency, and number of clones sharing the sequence are reported within the group for that row.
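The grouping behavior described above can be illustrated with a minimal base R sketch. This is not the NAIR implementation. It only mimics the described result using aggregate() and merge(), and the toy data frame and its values are hypothetical. The output column names follow those listed under Value.

```r
# Hypothetical toy data: "ACAC" appears twice at t_0 and once at t_1
toy <- data.frame(
  clone_seq   = c("ACAC", "ACAC", "ACAC", "GGGG"),
  clone_count = c(1, 2, 3, 4),
  clone_freq  = c(0.1, 0.2, 0.3, 0.4),
  time_point  = c("t_0", "t_0", "t_1", "t_1")
)

# Sum counts and frequencies within each (sequence, group) combination
sums <- aggregate(
  cbind(clone_count, clone_freq) ~ clone_seq + time_point,
  data = toy, FUN = sum
)
names(sums) <- c("clone_seq", "time_point",
                 "AggregatedCloneCount", "AggregatedCloneFrequency")

# Count the clones (rows) within each (sequence, group) combination
n_clones <- aggregate(
  clone_count ~ clone_seq + time_point,
  data = toy, FUN = length
)
names(n_clones)[3] <- "UniqueCloneCount"

# One row per sequence per group in which the sequence appears
sketch <- merge(sums, n_clones, by = c("clone_seq", "time_point"))
sketch
```

Because "ACAC" occurs in both time points, it occupies two rows of the sketch, each reporting values within its own group only.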
Value
A data frame whose first column contains the receptor sequences and has the same name as the column of data specified by clone_col. One additional column will be present for each column of data that is specified using the grouping_cols argument, with each having the same column name. The remaining columns are as follows:
AggregatedCloneCount |
The aggregate clone count across all clones (within the same group, if applicable) that share the receptor sequence in that row. |
AggregatedCloneFrequency |
The aggregate clone frequency across all clones (within the same group, if applicable) that share the receptor sequence in that row. |
UniqueCloneCount |
The number of clones (rows) in data (within the same group, if applicable) that share the receptor sequence in that row. |
Author(s)
Brian Neal (Brian.Neal@ucsf.edu)
References
Hai Yang, Jason Cham, Brian Neal, Zenghua Fan, Tao He and Li Zhang. (2023). NAIR: Network Analysis of Immune Repertoire. Frontiers in Immunology, vol. 14. doi: 10.3389/fimmu.2023.1181825
Examples
my_data <- data.frame(
clone_seq = c("ATCG", rep("ACAC", 2), rep("GGGG", 4)),
clone_count = rep(1, 7),
clone_freq = rep(1/7, 7),
time_point = c("t_0", rep(c("t_0", "t_1"), 3)),
subject_id = c(rep(1, 5), rep(2, 2))
)
my_data
aggregateIdenticalClones(
my_data,
"clone_seq",
"clone_count",
"clone_freq"
)
# group clones by time point
aggregateIdenticalClones(
my_data,
"clone_seq",
"clone_count",
"clone_freq",
grouping_cols = "time_point"
)
# group clones by subject ID
aggregateIdenticalClones(
my_data,
"clone_seq",
"clone_count",
"clone_freq",
grouping_cols = "subject_id"
)
# group clones by time point and subject ID
aggregateIdenticalClones(
my_data,
"clone_seq",
"clone_count",
"clone_freq",
grouping_cols = c("subject_id", "time_point")
)
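As a rough sanity check on the ungrouped example, the expected totals can be computed with base R alone and, if the NAIR package happens to be installed, compared against the function's output. The check below is a sketch rather than part of the package's own examples. It assumes only the output column names listed under Value, and the variable names expected_counts, expected_n and out are illustrative.

```r
my_data <- data.frame(
  clone_seq = c("ATCG", rep("ACAC", 2), rep("GGGG", 4)),
  clone_count = rep(1, 7),
  clone_freq = rep(1/7, 7),
  time_point = c("t_0", rep(c("t_0", "t_1"), 3)),
  subject_id = c(rep(1, 5), rep(2, 2))
)

# Expected ungrouped values, computed with base R only
expected_counts <- tapply(my_data$clone_count, my_data$clone_seq, sum)
expected_n <- table(my_data$clone_seq)
expected_counts
expected_n

# Compare with the package output only when NAIR is available
if (requireNamespace("NAIR", quietly = TRUE)) {
  out <- NAIR::aggregateIdenticalClones(
    my_data, "clone_seq", "clone_count", "clone_freq"
  )
  stopifnot(
    nrow(out) == length(expected_counts),
    sum(out$AggregatedCloneCount) == sum(my_data$clone_count),
    isTRUE(all.equal(sum(out$AggregatedCloneFrequency),
                     sum(my_data$clone_freq)))
  )
}
```

Aggregation without grouping should preserve the overall totals, so the summed aggregate count and frequency are compared with the column sums of the input.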