calculate_diff_abundance {protti} | R Documentation |
Calculate differential abundance between conditions
Description
Performs differential abundance calculations and statistical hypothesis tests on data frames with protein, peptide or precursor data. Different methods for statistical testing are available.
Usage
calculate_diff_abundance(
  data,
  sample,
  condition,
  grouping,
  intensity_log2,
  missingness = missingness,
  comparison = comparison,
  mean = NULL,
  sd = NULL,
  n_samples = NULL,
  ref_condition = "all",
  filter_NA_missingness = TRUE,
  method = c("moderated_t-test", "t-test", "t-test_mean_sd", "proDA"),
  p_adj_method = "BH",
  retain_columns = NULL
)
Arguments
data |
a data frame containing at least the input variables that are required for the
selected method. Ideally the output of |
sample |
a character column in the |
condition |
a character or numeric column in the |
grouping |
a character column in the |
intensity_log2 |
a numeric column in the |
missingness |
a character column in the |
comparison |
a character column in the |
mean |
a numeric column in the |
sd |
a numeric column in the |
n_samples |
a numeric column in the |
ref_condition |
optional, character value providing the condition that is used as a
reference for differential abundance calculation. Only required for |
filter_NA_missingness |
a logical value, default is |
method |
a character value, specifies the method used for statistical hypothesis testing.
Methods include Welch test ( |
p_adj_method |
a character value, specifies the p-value correction method. Possible
methods are c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"). Default
method is |
retain_columns |
a vector indicating which additional columns should be retained from the input
data frame. By default, no additional columns are retained. |
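The mean, sd and n_samples arguments carry per-condition summary statistics, which is all a Welch test needs. The following base R sketch shows how a difference, standard error, t-statistic and p-value can be derived from such values. It is an illustration only and not necessarily how protti computes them; the helper name welch_from_summary, the input numbers and the sign convention (treated minus control) are assumptions.

```r
# Hypothetical helper: Welch's t-test from summary statistics only.
# Group 1 is taken as "treated", group 2 as "control" (assumed convention).
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1                      # squared standard error of mean 1
  v2 <- s2^2 / n2                      # squared standard error of mean 2
  diff <- m1 - m2                      # difference of the group means
  std_error <- sqrt(v1 + v2)           # standard error of the difference
  t_statistic <- diff / std_error
  # Welch-Satterthwaite approximation of the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  pval <- 2 * pt(-abs(t_statistic), df)  # two-sided p-value
  data.frame(
    diff = diff,
    std_error = std_error,
    t_statistic = t_statistic,
    df = df,
    pval = pval
  )
}

# Made-up log2 intensities for one peptide in two conditions
treated <- c(21.0, 22.0, 20.5, 23.0)
control <- c(19.0, 20.0, 21.5, 20.0)

res <- welch_from_summary(
  mean(treated), sd(treated), length(treated),
  mean(control), sd(control), length(control)
)
print(res)
```

The result can be cross-checked against t.test(treated, control), which performs a Welch test by default when the raw values are available.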
Value
A data frame that contains differential abundances (diff), p-values (pval) and adjusted p-values (adj_pval) for each protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair. Depending on the method the data frame contains additional columns:

"t-test": The std_error column contains the standard error of the differential abundances. n_obs contains the number of observations for the specific protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair.

"t-test_mean_sd": Columns labeled as control refer to the second condition of the comparison pairs. Treated refers to the first condition. The mean_control and mean_treated columns contain the means for the reference and treatment condition, respectively. The sd_control and sd_treated columns contain the standard deviations for the reference and treatment condition, respectively. The n_control and n_treated columns contain the numbers of samples for the reference and treatment condition, respectively. The std_error column contains the standard error of the differential abundances. t_statistic contains the t-statistic of the t-test.

"moderated_t-test": CI_2.5 and CI_97.5 contain the 2.5% and 97.5% confidence interval borders for differential abundances. avg_abundance contains average abundances for treatment/reference pairs (mean of the two group means). t_statistic contains the t-statistic of the t-test. B contains the B-statistic, the log-odds that the protein, peptide or precursor (depending on grouping) has a differential abundance between the two groups. Suppose B = 1.5. The odds of differential abundance are exp(1.5) = 4.48, i.e., about four and a half to one. The probability that there is a differential abundance is 4.48/(1+4.48) = 0.82, i.e., the probability is about 82% that this group is differentially abundant. A B-statistic of zero corresponds to a 50-50 chance that the group is differentially abundant. n_obs contains the number of observations for the specific protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair.

"proDA": The std_error column contains the standard error of the differential abundances. avg_abundance contains average abundances for treatment/reference pairs (mean of the two group means). t_statistic contains the t-statistic of the t-test. n_obs contains the number of observations for the specific protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair.
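The B-statistic arithmetic described for "moderated_t-test" can be reproduced in a few lines of base R. This sketch only restates the conversion from log-odds to odds and probability given in the text; the function name b_to_probability is hypothetical and not part of protti.

```r
# Worked example from the text: B = 1.5
B <- 1.5
odds <- exp(B)             # odds of differential abundance, about 4.48
prob <- odds / (1 + odds)  # probability of differential abundance, about 0.82

# Hypothetical convenience wrapper: plogis() is the logistic function,
# i.e. exp(B) / (1 + exp(B)), and works on whole columns of B values.
b_to_probability <- function(B) plogis(B)

print(round(c(odds = odds, probability = prob), 2))
print(b_to_probability(c(-2, 0, 1.5, 4)))
```

A B of zero maps to a probability of 0.5, matching the 50-50 statement above.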
For all methods except "proDA", the p-value adjustment is performed only on the portion of the data that contains a p-value that is not NA. For "proDA" the p-value adjustment is performed either on the complete dataset (filter_NA_missingness = TRUE) or on the subset of the dataset with missingness that is not NA (filter_NA_missingness = FALSE).
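Restricting the adjustment to non-NA p-values matters because the number of tests enters the correction. The sketch below illustrates the idea with base R's p.adjust() on made-up p-values; it is not taken from the protti source, and the variable names are assumptions.

```r
# Made-up p-values; NA marks a comparison for which no test was possible
pval <- c(0.001, 0.02, NA, 0.04, 0.3)

# Adjust only the entries that carry a p-value, keep NA elsewhere
has_pval <- !is.na(pval)
adj_pval <- rep(NA_real_, length(pval))
adj_pval[has_pval] <- p.adjust(pval[has_pval], method = "BH")

print(data.frame(pval = pval, adj_pval = adj_pval))

# For contrast: counting the NA entry as a fifth test gives larger values
adj_all_counted <- p.adjust(pval[has_pval], method = "BH", n = length(pval))
print(adj_all_counted)
```

With four tested entries the Benjamini-Hochberg correction multiplies the smallest p-value by 4, while counting the NA entry as well would multiply it by 5.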
Examples
library(protti)

set.seed(123) # Makes example reproducible

# Create synthetic data
data <- create_synthetic_data(
  n_proteins = 10,
  frac_change = 0.5,
  n_replicates = 4,
  n_conditions = 2,
  method = "effect_random",
  additional_metadata = FALSE
)

# Assign missingness information
data_missing <- assign_missingness(
  data,
  sample = sample,
  condition = condition,
  grouping = peptide,
  intensity = peptide_intensity_missing,
  ref_condition = "all",
  retain_columns = c(protein, change_peptide)
)

# Calculate differential abundances
# Using "moderated_t-test" and "proDA" improves
# true positive recovery progressively
diff <- calculate_diff_abundance(
  data = data_missing,
  sample = sample,
  condition = condition,
  grouping = peptide,
  intensity_log2 = peptide_intensity_missing,
  missingness = missingness,
  comparison = comparison,
  method = "t-test",
  retain_columns = c(protein, change_peptide)
)

head(diff, n = 10)
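A common next step is to select candidates from the returned data frame. The sketch below uses a small hand-written data frame that only mimics the documented diff, pval and adj_pval columns, so it runs without protti. The values and the cutoffs (adjusted p-value below 0.05, absolute log2 difference above 1) are arbitrary assumptions, not recommendations from the package.

```r
# Made-up rows mimicking the documented output columns
result <- data.frame(
  peptide = c("pep_1", "pep_2", "pep_3", "pep_4"),
  diff = c(1.8, -0.2, -2.4, 0.9),
  pval = c(0.0004, 0.61, 0.0011, NA),
  adj_pval = c(0.0012, 0.61, 0.00165, NA),
  stringsAsFactors = FALSE
)

# Assumed cutoffs, chosen for illustration only
max_adj_pval <- 0.05
min_abs_diff <- 1

# Keep rows with a non-NA adjusted p-value below the cutoff
# and a sufficiently large absolute difference
hits <- subset(
  result,
  !is.na(adj_pval) & adj_pval < max_adj_pval & abs(diff) > min_abs_diff
)

print(hits)
```

Rows without a p-value are dropped explicitly, so an NA never counts as a hit.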