subsetContigs {SQMtools}

R Documentation

Select contigs

Description

Create an SQM object containing only the requested contigs, the ORFs contained in them, and the bins that contain them.

Usage

subsetContigs(
  SQM,
  contigs,
  trusted_functions_only = FALSE,
  ignore_unclassified_functions = FALSE,
  rescale_tpm = FALSE,
  rescale_copy_number = FALSE
)

Arguments

`SQM`	SQM object to be subsetted.
`contigs`	character. Vector with the names of the contigs to be selected.
`trusted_functions_only`	logical. If `TRUE`, only highly trusted functional annotations (best hit + best average) will be considered when generating aggregated function tables. If `FALSE`, best hit annotations will be used (default `FALSE`).
`ignore_unclassified_functions`	logical. If `FALSE`, ORFs with no functional classification will be aggregated together into an "Unclassified" category. If `TRUE`, they will be ignored (default `FALSE`).
`rescale_tpm`	logical. If `TRUE`, TPMs for KEGGs, COGs, and PFAMs will be recalculated (so that the TPMs in the subset actually add up to 1 million). Otherwise, per-function TPMs will be calculated by aggregating the TPMs of the ORFs annotated with that function, and will thus keep the scaling present in the parent object (default `FALSE`).
`rescale_copy_number`	logical. If `TRUE`, copy numbers will be recalculated using the RecA/RadA coverages in the subset. Otherwise, RecA/RadA coverages will be taken from the parent object, so the returned copy number of each function represents the average copy number of that function per genome in the parent object (default `FALSE`).
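
The effect of the two rescaling arguments can be illustrated with a toy calculation in base R. This is only a sketch and not the SQMtools implementation: the ORF names, function identifiers and numbers are made up, and it assumes that a copy number is the coverage of a function divided by the RecA/RadA coverage.

```r
# Hypothetical ORF-level TPMs in the parent object (they add up to 1 million).
orf_tpm <- c(orf1 = 400000, orf2 = 300000, orf3 = 200000, orf4 = 100000)
# Hypothetical functional annotation of each ORF.
orf_fun <- c(orf1 = "K00001", orf2 = "K00002", orf3 = "K00001", orf4 = "K00003")

# Pretend that only these ORFs lie in the selected contigs.
keep <- c("orf1", "orf2")

# Analogue of rescale_tpm = FALSE: aggregate the ORF TPMs per function,
# keeping the scaling of the parent object.
tpm_parent_scale <- tapply(orf_tpm[keep], orf_fun[keep], sum)

# Analogue of rescale_tpm = TRUE: renormalize so that the subset adds up to 1 million.
tpm_rescaled <- tpm_parent_scale / sum(tpm_parent_scale) * 1e6

# Hypothetical per-function coverages in the subset and RecA/RadA coverages.
fun_cov_subset  <- c(K00001 = 30, K00002 = 12)
reca_cov_parent <- 20
reca_cov_subset <- 6

# Analogue of rescale_copy_number = FALSE: copies per genome of the parent object.
cn_parent_scale <- fun_cov_subset / reca_cov_parent
# Analogue of rescale_copy_number = TRUE: copies per genome within the subset.
cn_subset_scale <- fun_cov_subset / reca_cov_subset
```

With the defaults, values from different subsets of the same parent object stay directly comparable, since they share the parent's scaling. Rescaling is the better fit when the subset is to be treated as a dataset in its own right.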

Value

SQM object containing only the selected contigs.

Examples

data(Hadza)
# Which contigs have a GC content below 40%?
lowGCcontigNames = rownames(Hadza$contigs$table[Hadza$contigs$table[,"GC perc"]<40,])
# Create a new SQM object containing only those contigs.
lowGCcontigs = subsetContigs(Hadza, lowGCcontigNames)
# Check the GC content distribution of the selected contigs.
hist(lowGCcontigs$contigs$table[,"GC perc"])
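
A further sketch shows how the rescaling arguments might be combined with a contig selection. The median-based threshold and the variable names are illustrative choices rather than part of the package, and the example assumes that per-function TPMs are stored in `$functions$KEGG$tpm`, as in other SQM objects.

```r
library(SQMtools)
data(Hadza)

# Select the contigs whose GC content is above the median (illustrative threshold).
gc <- Hadza$contigs$table[, "GC perc"]
highGCcontigNames <- rownames(Hadza$contigs$table)[which(gc > median(gc, na.rm = TRUE))]

# Subset and recalculate TPMs and copy numbers within the subset.
highGCcontigs <- subsetContigs(Hadza, highGCcontigNames,
                               rescale_tpm = TRUE,
                               rescale_copy_number = TRUE)

# Number of contigs before and after subsetting.
nrow(Hadza$contigs$table)
nrow(highGCcontigs$contigs$table)

# Per-sample totals of the KEGG TPMs in the subset.
colSums(highGCcontigs$functions$KEGG$tpm)
```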