scale_features_rlm {SIPmg} | R Documentation |
Scale feature coverage values to estimate their absolute abundance
Description
Calculates global scaling factors for features (contigs or bins), based on linear regression of sequin coverage. Options include log-transformation of coverage values and filtering of features based on the limit of detection. This function must be called first, before the feature abundance table, feature detection table, and plots are retrieved.
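As a rough illustration of the idea, the sketch below fits a robust linear model of known sequin concentration against observed sequin coverage on a log10 scale, then uses the fit to convert feature coverage into an abundance estimate. It assumes that the "rlm" in the function name refers to a robust linear model such as MASS::rlm. The data values and object names (sequin_fit, features) are made up for illustration, and this is not the package's actual implementation.

```r
library(MASS)  # provides rlm(); MASS is a recommended package shipped with R

# Hypothetical sequin data for one sample: known concentration vs observed coverage
sequin_fit <- data.frame(
  concentration = c(1, 10, 100, 1000, 10000),
  coverage      = c(0.5, 4.8, 52, 490, 5100)
)

# Robust linear fit on the log10 scale (mirrors the idea behind log_trans = TRUE)
fit <- rlm(log10(concentration) ~ log10(coverage), data = sequin_fit)

# Hypothetical feature coverages in the same sample
features <- data.frame(Feature = c("bin_1", "bin_2"), coverage = c(12, 250))

# Predict log10 abundance from coverage, then back-transform to the original scale
features$abundance <- 10^predict(fit, newdata = features)
print(features)
```

A robust fit down-weights sequins whose coverage deviates strongly from the trend, so a few poorly recovered standards have less influence on the scaling than in ordinary least squares.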
Usage
scale_features_rlm(
f_tibble,
sequin_meta,
seq_dilution,
log_trans = TRUE,
coe_of_variation = 250,
lod_limit = 0,
save_plots = TRUE,
plot_dir = tempdir()
)
Arguments
f_tibble |
Either (1) a tibble whose first column, "Feature", contains bin IDs and whose remaining columns hold the bins' coverage values, one column per sample, or (2) a tibble as output by the "checkm coverage" program from CheckM. See the CheckM documentation (https://github.com/Ecogenomics/CheckM) for usage of the "checkm coverage" program |
sequin_meta |
tibble containing sequin names ("Feature" column) and concentrations in attomoles/uL ("Concentration" column). |
seq_dilution |
tibble with first column "Sample" with same sample names as in f_tibble, and a second column "Dilution" showing ratio of sequins added to final sample volume (e.g. a value of 0.01 for a dilution of 1 volume sequin to 99 volumes sample) |
log_trans |
Boolean (TRUE or FALSE), should coverages and sequin concentrations be log-scaled? Default = TRUE |
coe_of_variation |
Acceptable coefficient of variation for coverage and detection, given in percent (e.g. 20 for a 20% threshold). Coverages above the threshold value will be flagged in the plots. Default = 250 |
lod_limit |
(Decimal range 0-1) Threshold for the minimum proportion of sequins detected per concentration group. Default = 0 |
save_plots |
Boolean (TRUE or FALSE), should sequin scaling plots be saved? Default = TRUE |
plot_dir |
Directory where plots are to be saved. Will create a directory "sequin_scaling_plots_rlm" if it does not exist. |
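The input tables described above could be assembled along the following lines. This is a minimal sketch: the sequin names, sample names, and numeric values are hypothetical, and only the column names ("Feature", "Concentration", "Sample", "Dilution") come from the argument descriptions.

```r
library(tibble)

# Hypothetical sequin metadata: names and concentrations (attomoles/uL)
sequin_meta <- tibble(
  Feature       = c("sequin_A", "sequin_B", "sequin_C"),
  Concentration = c(0.1, 1, 10)
)

# Dilution = sequin volume / final volume.
# 1 volume sequin + 99 volumes sample -> 1 / (1 + 99) = 0.01
# 1 volume sequin + 49 volumes sample -> 1 / (1 + 49) = 0.02
seq_dilution <- tibble(
  Sample   = c("sample_1", "sample_2"),
  Dilution = c(1 / (1 + 99), 1 / (1 + 49))
)
```

The sample names in "Sample" need to match the sample column names in f_tibble.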
Value
a list of tibbles containing
mag_tab: a tibble with first column "Feature" that contains bin (or contig) IDs, and the rest of the columns represent samples with features' scaled abundances (attomoles/uL)
mag_det: a tibble with first column "Feature" that contains bin (or contig) IDs,
plots: linear regression plots for scaling MAG coverage values to absolute abundance (optional)
scale_fac: a master tibble with all of the intermediate values in above calculations
Examples
library(SIPmg)
data(f_tibble, sequins, seq_dil)
# Scale feature coverage values using the sequin standards
scaled_features_rlm <- scale_features_rlm(f_tibble, sequins, seq_dil)
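The returned list can then be inspected element by element. The sketch below assumes the element names listed under Value (mag_tab, mag_det, scale_fac) and passes save_plots = FALSE so that no plot files are written. The guard around the call is only there so the snippet does nothing when SIPmg is not installed.

```r
if (requireNamespace("SIPmg", quietly = TRUE)) {
  library(SIPmg)
  data(f_tibble, sequins, seq_dil)

  # Same call as above, but without writing plots to disk
  scaled <- scale_features_rlm(f_tibble, sequins, seq_dil, save_plots = FALSE)

  # Element names as documented under Value
  abundance_table <- scaled$mag_tab    # scaled abundances per feature and sample
  detection_table <- scaled$mag_det    # feature detection table
  scaling_details <- scaled$scale_fac  # intermediate values from the scaling

  print(names(scaled))
  print(head(abundance_table))
}
```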