pairedStat {NewmanOmics} | R Documentation |
Paired Newman Statistic
Description
The Paired Newman Statistic is used for one-to-one comparison of paired individual samples. Commonly used to find differential expression between tumor-normal pairs or before-after treatment pairs.
Usage
pairedStat(baseData, perturbedData = NULL, pairing = NULL)
Arguments
baseData |
Either a list or a matrix. May contain data for just the base condition (for example, normal samples or samples before treatment) or for both the base condition and the perturbed condition (for example, tumor samples or samples after treatment). See details. |
perturbedData |
An optional matrix containing data for the "perturbed"
samples. May be NULL if the |
pairing |
An optional vector indicating the pairing between base and perturbed samples. Entries must be integers. Positive integers indicate perturbed samples and negative integers with the same absolute value indicate the paired base samples. See details. |
Details
In the simplest case, we have gene expression data on one "base" sample and one "perturbed" sample, and the goal is to identify genes whose expression changes between the two states. Our primary assumption is that the standard deviation (SD) of gene expression varies as a smooth function of the mean; fitting such a curve allows us to detect individual genes whose difference is large compared to the smoothed SD.
Note that this assumption is most useful on the log-transformed scale (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4120293/). If your data is on a raw scale, then we recommend transforming it before computing the Newman paired statistic.
The input arguments to the pairedStats
function are moderately
complicated in order to allow users to choose a convenient method for
supplying data when they have multiple paired samples. The first
posssibility is to have all the base samples in one matrix and all the
perturbed samples in a second matrix. In this case, we assume (without
checking) that the columns in the two matrices correspond to the
paired samples, and that the genes-rows are in the same order.
The second possibility is to put the data for both the base samples
and the perturbed samples in the same matrix. In this case, the user
must supply a pairing
vector to explain how the samples should
be matched. If the column order is ("base1", "perturbed1", "base2",
"perturbed2", ...), then the pairiing vector should be written as
c(-1, 1, -2, 2, -3, 3, ...)
.
The third possibility is to provide the paired samples in a list, each of whose entries is a matrix with two columns,with the first column being the base state and the second column being the perturbed state.
This flexibility means that there are three equivalent ways to input
the data even if you have only one base sample (with data in the
one-column matrix B) and one perturbed sample (with data in the
one-column matrix P). If we let BP <- cbind(B, P)
, then we can
choose (1) pairedStats(B, P)
, or (2)
pairedStats(list(BP))
, or (3) pairedStats(BP,
pairing = c(-1,1))
.
Value
A list containing two marices: the nu.statistics
and the p.values
.
Examples
data(GSE6631)
Normal <- GSE6631[, c(1,3)]
Tumor <- GSE6631[, c(2,4)]
### input two separate matrices
ps1 <- pairedStat(Normal, Tumor)
summary(ps1@nu.statistics)
summary(ps1@p.values)
### input one combined matrix and a pairing vector
ps2 <- pairedStat(GSE6631, pairing=c(-1, 1, -2, 2))
summary(ps2@nu.statistics)
summary(ps2@p.values)
### input a list of matrix-pairs
ps3 <- pairedStat(list(One = GSE6631[, 1:2],
Two = GSE6631[, 3:4]))
summary(ps3@nu.statistics)
summary(ps3@p.values)