filter_cv {protti} | R Documentation |
Data filtering based on coefficients of variation (CV)
Description
Filters the input data based on precursor, peptide or protein intensity coefficients of variation (CVs). Use the function to ensure that only robust measurements and quantifications enter the data analysis. It is advised to apply it after inspecting raw values (quality control) and after median normalisation. The function calculates the CV of each precursor, peptide or protein within each condition and removes those whose CV is below the cutoff in fewer than the user-defined required number of conditions. Because this required number of conditions is fixed and does not depend on how many conditions have detected values, the function might bias towards data completeness.
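The filtering rule described above can be sketched in base R. This is an illustrative sketch, not the package's implementation: the helper names `cv_linear` and `cv_filter_sketch`, the toy data and its column names are hypothetical, and it assumes that CVs are computed on back-transformed (linear-scale) intensities and that "below the cutoff" means strictly less than `cv_limit`.

```r
# Hypothetical toy data: two peptides, two conditions, three replicates each.
toy <- data.frame(
  peptide = rep(c("pepA", "pepB"), each = 6),
  condition = rep(rep(c("ctrl", "treated"), each = 3), times = 2),
  log2_intensity = c(
    20.0, 20.1, 19.9, # pepA, ctrl: tight replicates
    21.0, 21.1, 20.9, # pepA, treated: tight replicates
    20.0, 21.5, 18.5, # pepB, ctrl: noisy replicates
    21.0, 21.1, NA    # pepB, treated: tight, one missing value
  )
)

# CV on the linear scale (assumption: log2 values are back-transformed first).
cv_linear <- function(log2_x) {
  x <- 2^log2_x
  sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
}

# Keep peptides whose CV is below cv_limit in at least min_conditions conditions.
cv_filter_sketch <- function(data, cv_limit = 0.25, min_conditions) {
  # One CV per peptide and condition; na.pass keeps missing values so that
  # cv_linear() handles them itself via na.rm = TRUE.
  cvs <- aggregate(
    log2_intensity ~ peptide + condition,
    data = data,
    FUN = cv_linear,
    na.action = na.pass
  )
  names(cvs)[names(cvs) == "log2_intensity"] <- "cv"

  # Conditions in which each peptide passes the cutoff (undefined CVs never pass).
  passing <- cvs[!is.na(cvs$cv) & cvs$cv < cv_limit, ]
  n_pass <- table(passing$peptide)
  keep <- names(n_pass)[n_pass >= min_conditions]

  data[data$peptide %in% keep, ]
}

toy_filtered <- cv_filter_sketch(toy, cv_limit = 0.25, min_conditions = 2)
unique(toy_filtered$peptide)
```

In this toy data pepB is noisy in the ctrl condition, so it passes the cutoff in only one condition and is dropped when `min_conditions = 2`. The same happens to any peptide detected in a single condition, however precise its measurements, which illustrates the bias towards data completeness mentioned above.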
Usage
filter_cv(
data,
grouping,
condition,
log2_intensity,
cv_limit = 0.25,
min_conditions,
silent = FALSE
)
Arguments
data |
a data frame that contains at least the input variables. |
grouping |
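a character column in the data data frame that contains the grouping variable, which can be precursors, peptides or proteins. |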
condition |
a character or numeric column in the data data frame that contains the sample conditions. |
log2_intensity |
a numeric column in the data data frame that contains log2 transformed intensities. |
cv_limit |
optional, a numeric value that specifies the CV cutoff that will be applied. Default is 0.25. |
min_conditions |
a numeric value that specifies the minimum number of conditions for which grouping CVs should be below the cutoff. |
silent |
a logical value that specifies if a message with the number of filtered out conditions should be returned. Default is FALSE. |
Value
The CV filtered data frame.
Examples
library(protti)

set.seed(123) # Makes example reproducible

# Create synthetic data
data <- create_synthetic_data(
  n_proteins = 50,
  frac_change = 0.05,
  n_replicates = 3,
  n_conditions = 2,
  method = "effect_random",
  additional_metadata = FALSE
)

# Keep only peptides with a CV below 0.25 in both conditions
data_filtered <- filter_cv(
  data = data,
  grouping = peptide,
  condition = condition,
  log2_intensity = peptide_intensity_missing,
  cv_limit = 0.25,
  min_conditions = 2
)