filterInputData {NAIR} | R Documentation |
Filter Data Rows and Subset Data Columns
Description
Given a data frame with a column containing receptor sequences, filter data rows by sequence length and sequence content. Keep all data columns or choose which columns to keep.
Usage
filterInputData(
  data,
  seq_col,
  min_seq_length = NULL,
  drop_matches = NULL,
  subset_cols = NULL,
  count_col = deprecated(),
  verbose = FALSE
)
Arguments
data |
A data frame. |
seq_col |
Specifies the column(s) of data containing the receptor sequences. |
min_seq_length |
Observations whose receptor sequences have fewer than min_seq_length characters are dropped. |
drop_matches |
Accepts a character string containing a regular expression
(see regex). Rows whose receptor sequences match the expression are dropped. |
subset_cols |
Specifies which columns of the AIRR-Seq data are included in the output.
Accepts a character vector of column names
or a numeric vector of column indices.
The default NULL includes all columns. |
count_col |
Deprecated. |
verbose |
Logical. If TRUE, generates messages about the tasks performed. |
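The row and column filters described by the arguments above can be approximated in base R. The following is a minimal sketch under stated assumptions, not the package's actual implementation: the toy data frame, its column names, and the intermediate variable names are hypothetical, and the package's handling of edge cases such as missing values may differ.

```r
# Hypothetical toy data; column names are illustrative only.
toy <- data.frame(
  CloneSeq   = c("TTGAGGAAATTCG", "GGGGAGAAATTGG", "ACGT"),
  CloneCount = c(10, 5, 3),
  SampleID   = c("S1", "S1", "S2"),
  stringsAsFactors = FALSE
)

min_seq_length <- 13
drop_matches   <- "GGGG"
subset_cols    <- c("CloneSeq", "SampleID")

seqs <- as.character(toy$CloneSeq)

# Keep rows whose sequence has at least min_seq_length characters ...
keep <- nchar(seqs) >= min_seq_length
# ... and whose sequence does not match the regular expression.
keep <- keep & !grepl(drop_matches, seqs)

# Apply the row filter, then keep only the requested columns.
filtered <- toy[keep, subset_cols, drop = FALSE]
print(filtered)
```

Only the first toy row survives: the second contains "GGGG" and the third is shorter than 13 characters.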
Value
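A data frame containing the rows of data that pass the filters, restricted to the columns specified by subset_cols.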
Author(s)
Brian Neal (Brian.Neal@ucsf.edu)
References
Hai Yang, Jason Cham, Brian Neal, Zenghua Fan, Tao He and Li Zhang. (2023). NAIR: Network Analysis of Immune Repertoire. Frontiers in Immunology, vol. 14. doi: 10.3389/fimmu.2023.1181825
Examples
set.seed(42)
raw_data <- simulateToyData()
# Remove sequences shorter than 13 characters,
# as well as sequences containing the subsequence "GGGG".
# Keep variables for clone sequence, clone frequency and sample ID
filterInputData(
  raw_data,
  seq_col = "CloneSeq",
  min_seq_length = 13,
  drop_matches = "GGGG",
  subset_cols = c("CloneSeq", "CloneFrequency", "SampleID"),
  verbose = TRUE
)
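As a follow-up, the result of the call above can be stored and checked against the filters that were requested. This is only a sketch: it assumes the NAIR package is installed (the block is skipped otherwise), and the variable names filtered and seqs are introduced here for illustration.

```r
# Assumes the NAIR package is installed; the block is skipped otherwise.
if (requireNamespace("NAIR", quietly = TRUE)) {
  set.seed(42)
  raw_data <- NAIR::simulateToyData()

  filtered <- NAIR::filterInputData(
    raw_data,
    seq_col = "CloneSeq",
    min_seq_length = 13,
    drop_matches = "GGGG",
    subset_cols = c("CloneSeq", "CloneFrequency", "SampleID")
  )

  seqs <- as.character(filtered$CloneSeq)

  # Under the documented behavior, both checks are expected to be TRUE.
  print(all(nchar(seqs) >= 13))
  print(!any(grepl("GGGG", seqs)))

  # Columns retained in the output.
  print(names(filtered))
}
```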