olink_normalization {OlinkAnalyze} | R Documentation |
Normalization of all proteins (by OlinkID).
Description
Normalizes an NPX dataframe to another dataframe or to reference medians. If two dataframes are normalized to one another, Olink's default is to use the older dataframe as the reference.
The function handles three different types of normalization:
Bridging normalization: One of the dataframes is adjusted to another using overlapping samples (bridge samples).
The overlapping samples must have the same names in both dataframes. The adjustment is made using the median of the paired differences between the bridge samples in the two dataframes.
The two dataframes are the inputs df1 and df2. The dataframe being adjusted to is specified in reference_project, and the overlapping samples are specified in overlapping_samples_df1.
Only overlapping_samples_df1 should be input, no matter which dataframe is used as reference_project.
Subset normalization: One of the dataframes is adjusted to another dataframe
using a sample subset. Adjustment is made using the differences in median
between the subsets from the two dataframes. Both overlapping_samples_df1 and
overlapping_samples_df2 need to be input. The samples do not need to be
named the same.
A special case of subset normalization is to use all samples (except control
samples and samples with a QC warning) from df1 as input in overlapping_samples_df1
and all such samples from df2 as input in overlapping_samples_df2.
Reference median normalization: Works on a single dataframe. This is effectively subset normalization, but the subset medians are compared with pre-recorded median values instead of with a second dataframe.
df1, overlapping_samples_df1 and reference_medians need to be specified. df1 is adjusted using the differences between the medians of the overlapping samples and the reference medians.
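The three adjustment factors described above can be illustrated for a single assay with a small base R sketch. This is not the package's implementation: the data frames `ref` and `new`, the value `reference_npx`, and all NPX values are made up, and the sign convention (reference minus adjusted data, added to the adjusted data) is an assumption chosen so that the adjusted values line up with the reference.

```r
# Toy NPX values for one assay (one OlinkID); all numbers are made up.
ref <- data.frame(SampleID = c("A", "B", "C", "D"),
                  NPX = c(5.0, 6.0, 7.0, 9.0))
new <- data.frame(SampleID = c("A", "B", "C", "E"),
                  NPX = c(4.5, 5.4, 6.6, 8.0))

# Bridging: median of the paired differences over the shared (bridge) samples.
bridge <- merge(ref, new, by = "SampleID", suffixes = c("_ref", "_new"))
adj_bridge <- median(bridge$NPX_ref - bridge$NPX_new)

# Subset: difference between the medians of two independently chosen subsets
# (here, for simplicity, all samples of each toy data frame).
adj_subset <- median(ref$NPX) - median(new$NPX)

# Reference median: difference between a pre-recorded median (hypothetical
# value) and the median of the selected samples in the data being adjusted.
reference_npx <- 6.2
adj_refmed <- reference_npx - median(new$NPX)

# The data being adjusted are shifted by the adjustment factor.
new$NPX_bridged <- new$NPX + adj_bridge
```

In the real function this calculation is carried out separately for every OlinkID, and the factor that was applied is reported in the Adj_factor column of the result.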
Usage
olink_normalization(
df1,
df2 = NULL,
overlapping_samples_df1,
overlapping_samples_df2 = NULL,
df1_project_nr = "P1",
df2_project_nr = "P2",
reference_project = "P1",
reference_medians = NULL
)
Arguments
df1 |
First dataframe to be used in normalization (required). |
df2 |
Second dataframe to be used in normalization (required for bridging and subset normalization). |
overlapping_samples_df1 |
Samples to be used for adjustment factor calculation in df1 (required). |
overlapping_samples_df2 |
Samples to be used for adjustment factor calculation in df2 (required for subset normalization). |
df1_project_nr |
Project name of first dataset. |
df2_project_nr |
Project name of second dataset. |
reference_project |
Project name of the reference project, that is, the project to which the other project is adjusted. Needs to be the same as either df1_project_nr or df2_project_nr. |
reference_medians |
Dataframe that must contain the columns "OlinkID" and "Reference_NPX". Used for reference median normalization. |
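As an illustration of the shape reference_medians is expected to have, the following base R sketch derives one median per assay from an earlier dataset. The data frame `previous_npx`, its OlinkID values, and its NPX values are hypothetical; only the two column names "OlinkID" and "Reference_NPX" come from the argument description above.

```r
# Hypothetical earlier dataset in long format; all values are made up.
previous_npx <- data.frame(
  OlinkID = rep(c("OID00001", "OID00002"), each = 3),
  NPX     = c(4.0, 5.0, 9.0, 2.0, 2.5, 3.5)
)

# One median per assay, stored under the column names the argument expects.
ref_medians_from_previous <- aggregate(NPX ~ OlinkID,
                                       data = previous_npx,
                                       FUN = median)
names(ref_medians_from_previous)[
  names(ref_medians_from_previous) == "NPX"
] <- "Reference_NPX"
```

A data frame built this way has one row per OlinkID and could be passed as reference_medians, provided every assay in df1 has a matching row.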
Value
A "tibble" of NPX data in long format containing the normalized NPX values. It has the same columns as df1/df2, plus an additional column, Adj_factor, which holds the adjustment factor applied in the normalization.
Examples
library(dplyr)
npx_df1 <- npx_data1 %>% dplyr::mutate(Project = 'P1')
npx_df2 <- npx_data2 %>% dplyr::mutate(Project = 'P2')
# Bridging normalization:
# Find overlapping samples, but exclude Olink controls
overlap_samples <- intersect(
  (npx_df1 %>%
     dplyr::filter(!grepl("control", SampleID, ignore.case = TRUE)))$SampleID,
  (npx_df2 %>%
     dplyr::filter(!grepl("control", SampleID, ignore.case = TRUE)))$SampleID
)
# Normalize
olink_normalization(df1 = npx_df1,
                    df2 = npx_df2,
                    overlapping_samples_df1 = overlap_samples,
                    df1_project_nr = 'P1',
                    df2_project_nr = 'P2',
                    reference_project = 'P1')
# Subset normalization:
# Find a suitable subset of samples from both projects, but exclude Olink controls
# and samples which do not pass QC.
df1_sampleIDs <- npx_df1 %>%
  dplyr::group_by(SampleID) %>%
  dplyr::filter(all(QC_Warning == 'Pass')) %>%
  dplyr::filter(!stringr::str_detect(SampleID, 'CONTROL_SAMPLE')) %>%
  dplyr::select(SampleID) %>%
  unique() %>%
  dplyr::pull(SampleID)
df2_sampleIDs <- npx_df2 %>%
  dplyr::group_by(SampleID) %>%
  dplyr::filter(all(QC_Warning == 'Pass')) %>%
  dplyr::filter(!stringr::str_detect(SampleID, 'CONTROL_SAMPLE')) %>%
  dplyr::select(SampleID) %>%
  unique() %>%
  dplyr::pull(SampleID)
# Draw 16 random samples from each project (the selection differs between runs)
some_samples_df1 <- sample(df1_sampleIDs, 16)
some_samples_df2 <- sample(df2_sampleIDs, 16)
olink_normalization(df1 = npx_df1,
                    df2 = npx_df2,
                    overlapping_samples_df1 = some_samples_df1,
                    overlapping_samples_df2 = some_samples_df2)
# Special case of subset normalization when using all samples.
olink_normalization(df1 = npx_df1,
                    df2 = npx_df2,
                    overlapping_samples_df1 = df1_sampleIDs,
                    overlapping_samples_df2 = df2_sampleIDs)
# Reference median normalization:
# For the sake of this example, set the reference median to 1
ref_median_df <- npx_df1 %>%
  dplyr::select(OlinkID) %>%
  dplyr::distinct() %>%
  dplyr::mutate(Reference_NPX = 1)
# Normalize
olink_normalization(df1 = npx_df1,
                    overlapping_samples_df1 = some_samples_df1,
                    reference_medians = ref_median_df)