R: Protein abundance correction for LiP-data

correct_lip_for_abundance {protti}

R Documentation

Protein abundance correction for LiP-data

Description

Performs the correction of LiP-peptides for changes in protein abundance and calculates their significance using a t-test. This function was implemented based on the MSstatsLiP package developed by the Vitek lab.

Usage

correct_lip_for_abundance(
  lip_data,
  trp_data,
  protein_id,
  grouping,
  comparison = comparison,
  diff = diff,
  n_obs = n_obs,
  std_error = std_error,
  p_adj_method = "BH",
  retain_columns = NULL,
  method = c("satterthwaite", "no_df_approximation")
)

Arguments

`lip_data`	a data frame containing at least the input variables. Ideally, the result from the `calculate_diff_abundance` function is used.
`trp_data`	a data frame containing at least the input variables minus the grouping column. Ideally, the result from the `calculate_diff_abundance` function is used.
`protein_id`	a character column in the `lip_data` and `trp_data` data frames that contains protein identifiers.
`grouping`	a character column in the `lip_data` data frame that contains precursor or peptide identifiers.
`comparison`	a character column in the `lip_data` and `trp_data` data frames that contains the comparisons between conditions.
`diff`	a numeric column in the `lip_data` and `trp_data` data frames that contains log2-fold changes for peptide or protein quantities.
`n_obs`	a numeric column in the `lip_data` and `trp_data` data frames containing the number of observations used to calculate fold changes.
`std_error`	a numeric column in the `lip_data` and `trp_data` data frames containing the standard error of fold changes.
`p_adj_method`	a character value, specifies the p-value correction method. Possible methods are c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"). Default method is `"BH"`.
`retain_columns`	a vector indicating if certain columns should be retained from the input data frame. By default no additional columns are retained (`retain_columns = NULL`). Specific columns can be retained by providing their names (not in quotation marks, just like other column names, but in a vector). Please note that if you retain columns that have multiple rows per grouped variable there will be duplicated rows in the output.
`method`	a character value, specifies the method used to estimate the degrees of freedom. Possible methods are c("satterthwaite", "no_df_approximation"). `satterthwaite` uses the Welch-Satterthwaite equation to estimate the pooled degrees of freedom, as described in https://doi.org/10.1016/j.mcpro.2022.100477 and implemented in the MSstatsLiP package. This approach respects the number of protein measurements for the degrees of freedom. `no_df_approximation` just takes the number of peptides into account when calculating the degrees of freedom.
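
To make the `satterthwaite` option more concrete, the following is a minimal sketch of the arithmetic, not the package's actual implementation. It assumes the correction subtracts the protein-level fold change from the LiP fold change, combines the two standard errors in quadrature, and applies the Welch-Satterthwaite equation to the two variance terms. The helper `correct_one()` and its arguments `df_lip` and `df_trp` are hypothetical; how the function derives the per-term degrees of freedom from `n_obs` is not shown here.

```r
# Hypothetical helper, not part of protti: corrects LiP fold changes for the
# matching protein-level fold changes and tests the result with a t-test.
correct_one <- function(diff_lip, se_lip, df_lip, diff_trp, se_trp, df_trp) {
  # Assumed correction: subtract the protein-level log2 fold change
  adj_diff <- diff_lip - diff_trp
  # Assumed error propagation: standard errors add in quadrature
  adj_std_error <- sqrt(se_lip^2 + se_trp^2)
  # Welch-Satterthwaite estimate of the combined degrees of freedom
  df <- (se_lip^2 + se_trp^2)^2 / (se_lip^4 / df_lip + se_trp^4 / df_trp)
  # Two-sided t-test on the corrected fold change
  tval <- adj_diff / adj_std_error
  pval <- 2 * stats::pt(abs(tval), df = df, lower.tail = FALSE)
  data.frame(adj_diff, adj_std_error, df, pval)
}

# Made-up numbers for two peptides, for illustration only
res <- correct_one(
  diff_lip = c(1.5, -0.2), se_lip = c(0.20, 0.30), df_lip = c(4, 4),
  diff_trp = c(0.5, -0.1), se_trp = c(0.10, 0.25), df_trp = c(4, 4)
)

# Multiple-testing correction, as selected by `p_adj_method`
res$adj_pval <- stats::p.adjust(res$pval, method = "BH")
res
```

The estimated degrees of freedom always fall between the smaller of the two input values and their sum, so a noisy protein-level estimate lowers the confidence in the corrected peptide change.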

Value

a data frame containing corrected differential abundances (adj_diff), adjusted standard errors (adj_std_error), degrees of freedom (df), p-values (pval) and adjusted p-values (adj_pval).
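
As a rough illustration of how the returned columns might be used downstream, the sketch below filters a toy data frame that mimics the output. The values, the object name `corrected_toy`, and the cutoffs (adjusted p-value below 0.05, absolute corrected log2 fold change above 1) are assumptions chosen for the example, not recommendations from the package.

```r
# Toy stand-in for the function output; all values are made up
corrected_toy <- data.frame(
  eg_precursor_id = c("pep_A", "pep_B", "pep_C"),
  adj_diff = c(1.8, 0.1, -1.2),
  adj_pval = c(0.003, 0.60, 0.04),
  stringsAsFactors = FALSE
)

# Keep precursors that pass both (hypothetical) cutoffs
keep <- corrected_toy$adj_pval < 0.05 & abs(corrected_toy$adj_diff) > 1
hits <- corrected_toy[keep, ]

# Order the hits by adjusted p-value, most significant first
hits <- hits[order(hits$adj_pval), ]
hits$eg_precursor_id
```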

Author(s)

Aaron Fehr

Examples


# Load libraries

library(dplyr)

# Load example data and simulate tryptic data by summing up precursors

data <- rapamycin_10uM

data_trp <- data %>%
  dplyr::group_by(pg_protein_accessions, r_file_name) %>%
  dplyr::mutate(pg_quantity = sum(fg_quantity)) %>%
  dplyr::distinct(
    r_condition,
    r_file_name,
    pg_protein_accessions,
    pg_quantity
  )


# Calculate differential abundances for LiP and Trp data

diff_lip <- data %>%
  dplyr::mutate(fg_intensity_log2 = log2(fg_quantity)) %>%
  assign_missingness(
    sample = r_file_name,
    condition = r_condition,
    intensity = fg_intensity_log2,
    grouping = eg_precursor_id,
    ref_condition = "control",
    retain_columns = "pg_protein_accessions"
  ) %>%
  calculate_diff_abundance(
    sample = r_file_name,
    condition = r_condition,
    grouping = eg_precursor_id,
    intensity_log2 = fg_intensity_log2,
    comparison = comparison,
    method = "t-test",
    retain_columns = "pg_protein_accessions"
  )


diff_trp <- data_trp %>%
  dplyr::mutate(pg_intensity_log2 = log2(pg_quantity)) %>%
  assign_missingness(
    sample = r_file_name,
    condition = r_condition,
    intensity = pg_intensity_log2,
    grouping = pg_protein_accessions,
    ref_condition = "control"
  ) %>%
  calculate_diff_abundance(
    sample = r_file_name,
    condition = r_condition,
    grouping = pg_protein_accessions,
    intensity_log2 = pg_intensity_log2,
    comparison = comparison,
    method = "t-test"
  )

# Correct for abundance changes

corrected <- correct_lip_for_abundance(
  lip_data = diff_lip,
  trp_data = diff_trp,
  protein_id = pg_protein_accessions,
  grouping = eg_precursor_id,
  retain_columns = c("missingness"),
  method = "satterthwaite"
)

head(corrected, n = 10)

[Package protti version 0.9.0 Index]