correct_lip_for_abundance {protti}    R Documentation

Protein abundance correction for LiP-data

Description

Corrects the fold changes of LiP-peptides for changes in protein abundance and calculates their significance using a t-test. This function was implemented based on the MSstatsLiP package developed by the Vitek lab.

Usage

correct_lip_for_abundance(
  lip_data,
  trp_data,
  protein_id,
  grouping,
  comparison = comparison,
  diff = diff,
  n_obs = n_obs,
  std_error = std_error,
  p_adj_method = "BH",
  retain_columns = NULL,
  method = c("satterthwaite", "no_df_approximation")
)

Arguments

lip_data

a data frame containing at least the input variables. Ideally, the result from the calculate_diff_abundance function is used.

trp_data

a data frame containing at least the input variables minus the grouping column. Ideally, the result from the calculate_diff_abundance function is used.

protein_id

a character column in the lip_data and trp_data data frames that contains protein identifiers.

grouping

a character column in the lip_data data frame that contains precursor or peptide identifiers.

comparison

a character column in the lip_data and trp_data data frames that contains the comparisons between conditions.

diff

a numeric column in the lip_data and trp_data data frames that contains log2-fold changes for peptide or protein quantities.

n_obs

a numeric column in the lip_data and trp_data data frames containing the number of observations used to calculate fold changes.

std_error

a numeric column in the lip_data and trp_data data frames containing the standard error of fold changes.

p_adj_method

a character value, specifies the p-value correction method. Possible methods are c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"). Default method is "BH".

retain_columns

a vector indicating if certain columns should be retained from the input data frame. By default, no additional columns are retained (retain_columns = NULL). Specific columns can be retained by providing their names (not in quotation marks, just like other column names, but in a vector). Please note that if you retain columns that have multiple rows per grouped variable, there will be duplicated rows in the output.

method

a character value, specifies the method used to estimate the degrees of freedom. Possible methods are c("satterthwaite", "no_df_approximation"). satterthwaite uses the Welch-Satterthwaite equation to estimate the pooled degrees of freedom, as described in https://doi.org/10.1016/j.mcpro.2022.100477 and implemented in the MSstatsLiP package. This approach respects the number of protein measurements for the degrees of freedom. no_df_approximation just takes the number of peptides into account when calculating the degrees of freedom.
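The arithmetic behind the satterthwaite option can be sketched in base R for a single peptide and its parent protein. This is a minimal illustration, not necessarily the exact protti or MSstatsLiP implementation. All input values are hypothetical, and the per-component degrees of freedom (number of observations minus 2, as in a two-group t-test) are an assumption.

```r
# Hypothetical summary statistics for one LiP peptide and its protein
lip_diff <- 1.5 # log2 fold change of the peptide (LiP data)
lip_se <- 0.3   # standard error of the peptide fold change
lip_n <- 8      # observations behind the peptide fold change
trp_diff <- 0.5 # log2 fold change of the protein (tryptic data)
trp_se <- 0.4   # standard error of the protein fold change
trp_n <- 8      # observations behind the protein fold change

# Abundance-corrected fold change and its combined standard error
adj_diff <- lip_diff - trp_diff
adj_std_error <- sqrt(lip_se^2 + trp_se^2)

# Assumption: each component contributes n - 2 degrees of freedom
df_lip <- lip_n - 2
df_trp <- trp_n - 2

# Welch-Satterthwaite approximation of the pooled degrees of freedom
df <- (lip_se^2 + trp_se^2)^2 / (lip_se^4 / df_lip + trp_se^4 / df_trp)

# Two-sided t-test on the corrected fold change
t_stat <- adj_diff / adj_std_error
pval <- 2 * stats::pt(abs(t_stat), df = df, lower.tail = FALSE)

data.frame(adj_diff, adj_std_error, df, pval)
```

In this sketch the pooled degrees of freedom fall between the smaller component value and the sum of both, so a noisy protein measurement lowers the confidence in the corrected peptide fold change.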

Value

a data frame containing corrected differential abundances (adj_diff), adjusted standard errors (adj_std_error), degrees of freedom (df), p-values (pval) and adjusted p-values (adj_pval).
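The method names accepted by p_adj_method match those of base R's stats::p.adjust. Assuming that function is the underlying routine, the following sketch shows how the choice of method changes the adjusted p-values. The raw p-values are hypothetical and do not come from real data.

```r
# Hypothetical raw p-values for four peptides
pval <- c(0.001, 0.01, 0.03, 0.5)

# Benjamini-Hochberg controls the false discovery rate
adj_bh <- stats::p.adjust(pval, method = "BH")

# Bonferroni controls the family-wise error rate and is more conservative
adj_bonferroni <- stats::p.adjust(pval, method = "bonferroni")

data.frame(pval, adj_bh, adj_bonferroni)
```

With "none", the adjusted values would equal the raw p-values.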

Author(s)

Aaron Fehr

Examples


# Load libraries

library(dplyr)

# Load example data and simulate tryptic data by summing up precursors

data <- rapamycin_10uM

data_trp <- data %>%
  dplyr::group_by(pg_protein_accessions, r_file_name) %>%
  dplyr::mutate(pg_quantity = sum(fg_quantity)) %>%
  dplyr::distinct(
    r_condition,
    r_file_name,
    pg_protein_accessions,
    pg_quantity
  )


# Calculate differential abundances for LiP and Trp data

diff_lip <- data %>%
  dplyr::mutate(fg_intensity_log2 = log2(fg_quantity)) %>%
  assign_missingness(
    sample = r_file_name,
    condition = r_condition,
    intensity = fg_intensity_log2,
    grouping = eg_precursor_id,
    ref_condition = "control",
    retain_columns = "pg_protein_accessions"
  ) %>%
  calculate_diff_abundance(
    sample = r_file_name,
    condition = r_condition,
    grouping = eg_precursor_id,
    intensity_log2 = fg_intensity_log2,
    comparison = comparison,
    method = "t-test",
    retain_columns = "pg_protein_accessions"
  )


diff_trp <- data_trp %>%
  dplyr::mutate(pg_intensity_log2 = log2(pg_quantity)) %>%
  assign_missingness(
    sample = r_file_name,
    condition = r_condition,
    intensity = pg_intensity_log2,
    grouping = pg_protein_accessions,
    ref_condition = "control"
  ) %>%
  calculate_diff_abundance(
    sample = r_file_name,
    condition = r_condition,
    grouping = pg_protein_accessions,
    intensity_log2 = pg_intensity_log2,
    comparison = comparison,
    method = "t-test"
  )

# Correct for abundance changes

corrected <- correct_lip_for_abundance(
  lip_data = diff_lip,
  trp_data = diff_trp,
  protein_id = pg_protein_accessions,
  grouping = eg_precursor_id,
  retain_columns = c("missingness"),
  method = "satterthwaite"
)

head(corrected, n = 10)

[Package protti version 0.8.0 Index]