subsampleDb {tigger} | R Documentation |
Subsample repertoire
Description
subsampleDb will sample the same number of sequences for each gene, family or allele (specified with mode) in data. Samples or subjects can be subsampled independently by setting group.
Usage
subsampleDb(
data,
gene = "v_call",
mode = c("gene", "allele", "family"),
min_n = 1,
max_n = NULL,
group = NULL
)
Arguments
data |
|
gene |
name of the column in |
mode |
one of |
min_n |
minimum number of observations to sample from each group. A group with fewer observations than the minimum is excluded. |
max_n |
maximum number of observations to sample for all |
group |
columns containing additional grouping variables, e.g. sample_id.
These groups will be subsampled independently. If
|
Details
data will be split into gene, allele or family subsets (mode), from which the same number of sequences will be subsampled. If mode = "gene", for each gene in the field gene from data, a maximum of max_n sequences will be subsampled. Input sequences that have multiple gene calls (ties) can be subsampled from any of their calls, but these duplicated samplings will be removed, and the final subsampled data will contain unique rows.
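The behaviour described above for gene mode can be illustrated with a minimal base R sketch. This is not the tigger implementation: the helper subsample_by_gene, its seed argument, the toy data frame db and its sequence_id column are all hypothetical. It also assumes that ties are comma-separated in the call column and that allele names follow the gene name after a "*".

```r
# Illustrative sketch only; NOT the tigger implementation.
# Assumes ties are comma-separated and alleles are written as GENE*ALLELE.
subsample_by_gene <- function(data, gene = "v_call", max_n = 2, seed = 1) {
  set.seed(seed)
  # Drop allele suffixes (e.g. "*01") and split tied calls on commas.
  calls <- strsplit(gsub("\\*[^,]*", "", data[[gene]]), ",")
  # One (row, gene) pair per call, so a tied row is listed under each gene.
  long <- unique(data.frame(
    row = rep(seq_len(nrow(data)), lengths(calls)),
    gene = unlist(calls),
    stringsAsFactors = FALSE
  ))
  # Sample at most max_n row indices for each gene.
  picked <- lapply(split(long$row, long$gene), function(idx) {
    idx[sample.int(length(idx), min(length(idx), max_n))]
  })
  # A tied row may be picked under several genes; keep it only once.
  keep <- sort(unique(unlist(picked)))
  data[keep, , drop = FALSE]
}

# Hypothetical toy repertoire: row s4 is a tie between two genes.
db <- data.frame(
  sequence_id = paste0("s", 1:7),
  v_call = c("IGHV1-2*01", "IGHV1-2*02", "IGHV1-2*01",
             "IGHV1-2*01,IGHV1-3*01",
             "IGHV1-3*01", "IGHV1-3*02", "IGHV1-3*01"),
  stringsAsFactors = FALSE
)

sub <- subsample_by_gene(db, max_n = 2)
print(sub)
```

With two genes and max_n = 2, the sketch returns three or four rows: four when the picks are all distinct, three when the tied row s4 is drawn under both genes and then deduplicated.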
Value
Subsampled version of the input data.
See Also
Examples
library(tigger)
subsampleDb(AIRRDb)