
scale_features_lm {SIPmg}

R Documentation

Scale feature coverage values to estimate their absolute abundance

Description

Calculates global scaling factors for features (contigs or bins), based on a linear regression of sequin coverage against known sequin concentrations. Options include log-transforming coverages and concentrations, as well as filtering features based on a limit of detection. This function must be called first, before the feature abundance table, feature detection table, and plots are retrieved.
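The regression idea can be sketched in base R. This is a minimal illustration under stated assumptions, not SIPmg's internal code: it uses a single sample with invented toy coverages, log10 scaling (as with log_trans = TRUE), and recovers abundance by inverting the fitted line. How the package applies seq_dilution is not shown. The names sequin_conc, sequin_cov, feature_cov, and estimate_abundance are hypothetical.

```r
# Hypothetical sketch (not the SIPmg implementation): scale feature coverage
# to absolute abundance with a regression fitted on sequins of known
# concentration. All values below are invented toy data for one sample.

# Sequin concentrations (attomoles/uL) and their observed coverage
sequin_conc <- c(0.5, 1, 2, 4, 8, 16, 32, 64)
sequin_cov  <- c(1.1, 2.0, 4.2, 7.9, 16.5, 31.0, 65.0, 127.0)

# Fit log10(coverage) ~ log10(concentration), mirroring log_trans = TRUE
fit <- lm(log10(coverage) ~ log10(conc),
          data = data.frame(conc = sequin_conc, coverage = sequin_cov))
intercept <- unname(coef(fit)[1])
slope     <- unname(coef(fit)[2])

# Invert the fitted line: log10(cov) = a + b * log10(conc)
#   =>  conc = 10^((log10(cov) - a) / b)
estimate_abundance <- function(coverage) {
  10^((log10(coverage) - intercept) / slope)
}

# Hypothetical feature (bin) coverages in the same sample
feature_cov   <- c(bin_1 = 3.0, bin_2 = 50.0)
feature_abund <- estimate_abundance(feature_cov)
print(round(feature_abund, 2))
```

Fitting on the log scale keeps sequins spanning several orders of magnitude from being dominated by the most concentrated spike-ins.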

Usage

scale_features_lm(
  f_tibble,
  sequin_meta,
  seq_dilution,
  log_trans = TRUE,
  coe_of_variation = 250,
  lod_limit = 0,
  save_plots = TRUE,
  plot_dir = tempdir(),
  cook_filtering = TRUE
)

Arguments

`f_tibble`	Either (1) a tibble whose first column, "Feature", contains bin IDs and whose remaining columns are samples holding the bins' coverage values, or (2) a tibble as output by the "checkm coverage" program of the tool CheckM. See the CheckM documentation (https://github.com/Ecogenomics/CheckM) for usage of the "checkm coverage" program.
`sequin_meta`	Tibble containing sequin names (column "Feature") and concentrations in attomoles/uL (column "Concentration").
`seq_dilution`	Tibble whose first column, "Sample", holds the same sample names as in f_tibble, and whose second column, "Dilution", gives the ratio of sequins added to the final sample volume (e.g. a value of 0.01 for 1 volume of sequins added to 99 volumes of sample).
`log_trans`	Boolean (TRUE or FALSE); should coverages and sequin concentrations be log-scaled? Default = TRUE
`coe_of_variation`	Acceptable coefficient of variation (in %) for coverage and detection (e.g. 20 for a 20% threshold). Coverage values whose coefficient of variation exceeds this threshold are flagged in the plots. Default = 250
`lod_limit`	(Decimal, range 0-1) Threshold for the minimum fraction of sequins detected per concentration group. Default = 0
`save_plots`	Boolean (TRUE or FALSE); should the sequin scaling plots be saved? Default = TRUE
`plot_dir`	Directory where plots are saved. A directory "sequin_scaling_plots_lm" is created if it does not exist. Default = tempdir()
`cook_filtering`	Boolean (TRUE or FALSE); should data points be filtered based on Cook's distance? Cook's distance helps detect influential outliers in an ordinary least squares regression model, which can distort the fit. Any data point with Cook's distance > 4/n (where n is the sample size) is filtered out; 4/n is a commonly used threshold for detecting outliers. Default = TRUE
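Two of the filters described above can be sketched in base R: a coefficient of variation (100 × sd / mean) compared with a percentage threshold, and the Cook's distance 4/n rule using stats::cooks.distance(). Computing the CV per concentration group, the toy data with one injected outlier, and the names cv_threshold, flagged_groups, and keep are assumptions for illustration; SIPmg's own implementation may differ.

```r
# Hypothetical sketch (not the SIPmg implementation) of CV flagging and
# Cook's distance filtering on invented sequin data.
set.seed(1)

# Toy sequins: three replicates at each concentration (attomoles/uL)
conc     <- rep(c(1, 4, 16, 64), each = 3)
coverage <- 2 * conc * exp(rnorm(length(conc), sd = 0.05))
coverage[5] <- coverage[5] * 8   # inject one gross outlier (conc = 4 group)
d <- data.frame(conc = conc, coverage = coverage)

# 1) Coefficient of variation (%) of coverage within each concentration
#    group; groups above the threshold are flagged (cf. coe_of_variation)
cv_pct <- tapply(d$coverage, d$conc, function(x) sd(x) / mean(x) * 100)
cv_threshold   <- 20
flagged_groups <- names(cv_pct)[cv_pct > cv_threshold]

# 2) Cook's distance filtering with the 4/n rule (cf. cook_filtering)
fit  <- lm(log10(coverage) ~ log10(conc), data = d)
cd   <- cooks.distance(fit)
n    <- nrow(d)
keep <- cd <= 4 / n                  # drop points with Cook's distance > 4/n
d_filtered   <- d[keep, ]

# Refit on the retained points
fit_filtered <- lm(log10(coverage) ~ log10(conc), data = d_filtered)
print(flagged_groups)
print(coef(fit_filtered))
```

Refitting after removal lets the slope return close to the underlying trend, which the single influential point would otherwise pull away from it.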

Value

A list containing:

mag_tab: a tibble with first column "Feature" that contains bin (or contig) IDs, and the rest of the columns represent samples with features' scaled abundances (attomoles/uL)
mag_det: the feature detection table, a tibble with first column "Feature" that contains bin (or contig) IDs
plots: linear regression plots for scaling MAG coverage values to absolute abundance
scale_fac: a master tibble with all of the intermediate values in above calculations

Examples

library(SIPmg)
data(f_tibble, sequins, seq_dil)

### scaling sequins from coverage values
# the loaded 'sequins' data is passed as sequin_meta, 'seq_dil' as seq_dilution
scaled_features_lm <- scale_features_lm(f_tibble, sequins, seq_dil)

[Package SIPmg version 1.4.1 Index]