R: Averaging of multiple reference genes

refmean {qpcR}

R Documentation

Averaging of multiple reference genes

Description

This function averages the expression of several reference genes before calculation of gene expression ratios by ratiocalc or ratiobatch. The method is similar to that described in Vandesompele et al. (2002), but uses arithmetic averaging of threshold cycles/efficiencies and not geometric averaging of relative expression values. This is equivalent, as discussed in 'Details' and as shown in 'Examples'. An essential extension of this method is, that if replicates for the reference genes are supplied, the threshold cycles and efficiencies are subjected to error propagation prior to ratio calculation. The propagated error is then included in the calculation of the gene expression ratios, as advocated in Nordgard et al. (2006).

Usage

refmean(data, group, which.eff = c("sig", "sli", "exp", "mak", "ext"),
        type.eff = c("individual", "mean.single"), 
        which.cp = c("cpD2", "cpD1", "cpE", "cpR", "cpT", "Cy0", "ext"),
        verbose = TRUE, ...)

Arguments

`data`	multiple qPCR data generated by `pcrbatch`.
`group`	a character vector defining the replicates (if any) of control/treatment samples and reference genes/genes-of-interest. See 'Details'
`which.eff`	efficiency calculated by which method. Defaults to sigmoidal fit. Can also be a value such as 1.8, as shown in 'Examples'. See `ratiocalc`.
`type.eff`	using individual or averaged efficiencies for the replicates of a reference gene. See 'Details'.
`which.cp`	type of threshold cycles to be used for the analysis. Defaults to cpD2. See `ratiocalc`.
`verbose`	logical. If `TRUE`, the steps of analysis are shown in the console window.
`...`	parameters to be supplied to `propagate`.

Details

As in ratiobatch, the samples are to be defined as a character vector in the style of "g1s1", "g1c1", "r1s1" and "r1c1" etc. If refmean is used as a standalone function or switched on in ratiobatch using refmean = TRUE, different reference genes per control/treatment samples are averaged when supplied either as single runs or as replicates.

Examples (omitting genes-of-interest in control/treatment samples):
2 reference genes, 2 replicates each:
c("r1s1", "r1s1, "r2s1", "r2s1", "r1c1", "r1c1, "r2c1", "r2c1", ...).
3 reference genes, no replicates:
c("r1s1", "r2s1, "r3s1", "r1c1", "r2c1, "r3c1", ...)

Averaging of multiple reference genes is accomplished the following way:
Given i reference genes with j replicates in a sample, all replicates r_{ij} are used for calculating mean \mu_{r_i} and standard deviation \sigma_{r_i} of the threshold cycles and efficiencies. The overall (grand) mean \mu_r and propagated error \sigma_r is calculated using propagate with first-order Taylor expansion including covariance: \sigma_r = F_{r_i}C_{r_i}F_{r_i}^T. Finally, a vector of length L = n(r_{ij}) containing equidistant numbers X = (x_1, x_2, x_3, \ldots x_L) with mean \mu_r and standard deviation \sigma_r is generated for a new overall reference gene r_1. This is done using the internal function qpcR:::makeStat which calculates a shifted (\mu_r) and scaled (\sigma_r) Z-transformation on a vector x_1 \ldots x_L:

Z_i = \mu_r + \frac{(x_i - \bar{X})}{\sigma_X} \cdot \sigma_r

The new Z_i threshold cycle and efficiency values replace all values of r_{ij} in data. When using ratiobatch, this modified data is then used for the ratio calculation, again using propagate to calculate errors for ratios using the Z_i values as mentioned above.
By using logarithmic identities (http://en.wikipedia.org/wiki/Logarithmic_identities), it can be shown that the geometric mean can be transformed to the arithmetic mean by logarithmation (assuming constant E):

\left( \prod_{i=1}^n E^{x_i} \right)^{\frac{1}{n}} = \frac{1}{n} \cdot \log_E \left( \prod_{i=1}^n E^{x_i} \right) = \frac{1}{n} \sum_{i=1}^n x_i

Hence, arithmetic averaging of the threshold cycles BEFORE ratio calculation is the same as doing geometric averaging on relative quantities AFTER ratio calculation. This is demonstrated in 'Examples' and also mentioned in the geNorm manual (http://www.gene-quantification.com/geNorm_manual.pdf) in Q8 (page 12).

When setting type.eff = "individual" (default), all efficiencies from replicates of a reference gene in a control/treatment sample E(r_{ij}) are used for calculating mean \mu_{E(r_i)} and standard deviation \sigma_{E(r_i)}, the latter being used for calculating a propagated error for all reference gene efficiencies \sigma_{E(r)}. If type.eff = "mean.single", all E(r_{ij}) values from the replicates are set to the same value \mu_{E(r_i)}, that is, there is no variation assumed between the different E(r_{ij}). In this case, \sigma_{E(r_i)} = 0, so that no error of the replicates is propagated to \sigma_{E(r)}. This results in smaller overall errors of the output, but it can be debated if this is a realistic approach, hence both settings were implemented.

which.eff can be supplied with an efficiency value such as 1.8, which is then used as the efficiency for all reference runs E(r_{ij}).

Value

The same dataset data which was supplied to the function, but with modified threshold cycle/efficiency values in which the values are created per sample in a way, that they have the mean of all averaged reference genes and the same standard deviation as obtained by error propagation. See 'Details' for a more thorough explanation. Furthermore, a modified label vector "NAME_mod" is written to the global environment (if "NAME" was supplied for group) in which the different reference gene labels are aggregated, i.e. c("r1c1", "r2c1", "r3c1") will become c("r1c1", "r1c1", "r1c1"). This new label vector is also attached as an attribute to the output data and can be obtained by attr(RES1, "group").

Note

The function checks stringently if the same number of different reference genes are used for control and treatment samples, although the number of replicates may differ.
Example:
GROUP <- c("r1c1", "r1c1", "r2c1", "r2c1", "r1s1", "r2s1") will work (2 reference genes in control/treatment samples), but GROUP <- c("r1c1", "r2c1", "r3c1", "r1s1", "r1s1", "r1s2", "r1s2", "r2s1", "r2s1") will not work (3 reference genes in controls, only 2 in treatment samples). Also, when no or only one reference genes are detected, the original data is not averaged and returned unchanged.

Author(s)

Andrej-Nikolai Spiess

References

Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes.
Vandesompele J, De Preter K, Pattyn F, Poppe B, Van Roy N, De Paepe A, Speleman F.
Genome Biol (2002), 3: research0034-research0034.11.

Error propagation in relative real-time reverse transcription polymerase chain reaction quantification models: the balance between accuracy and precision.
Nordgard O, Kvaloy JT, Farmen RK, Heikkil? R.
Anal Biochem (2006), 356: 182-193.

Examples

## Not run: 
## Replacing the reference gene values by
## averaged ones in the original data.
## => RES1 is new dataset.
## => GROUP1_mod in global environment is
## new labeling vector.
DAT1 <- pcrbatch(reps, fluo = 2:19, model = l5)
GROUP1 <- c("r1c1", "r1c1", "r2c1", "r2c1", "g1c1", "g1c1",
           "r1s1", "r1s1", "r1s2", "r1s2", "r2s1", "r2s1",
           "r2s2", "r2s2", "g1s1", "g1s1", "g1s2", "g1s2") 
RES1 <- refmean(DAT1, GROUP1, which.eff = "sig", which.cp = "cpD2")

## Using three reference genes without replicates
## and then 'ratiobatch'.
## This can also be called in 'ratiobatch' directly
## with parameter 'refmean = TRUE'. See there.
## In this example, already averaged dataset and 
## new labeling vector are supplied to 'ratiobatch', 
## so one has to set 'refmean = FALSE'.
DAT2 <- pcrbatch(reps, fluo = 2:9, model = l5)
GROUP2 <- c("r1c1", "r2c1", "r3c1", "g1c1", "r1s1", "r2s1", "r3s1", "g1s1" ) 
RES2 <- refmean(DAT2, GROUP2, which.eff = "sig", which.cp = "cpD2")
ratiobatch(RES2, GROUP2_mod, refmean = FALSE)

## Comparison between 'refmean' ct-value arithmetic averaging
## and 'geNorm' relative quantities geometric averaging
## using data from the geNorm manual (2008), page 6.
## We will use HK1-HK3 as in the manual (no replicates).
## First we create a 'pcrbatch' dataset and then 
## override the ct values with those of the manual and all
## efficiencies with E = 2. Sample A is considered as control sample.
DAT3 <- pcrbatch(reps, fluo = 2:17, model = l5)
DAT3[8, -1] <- c(32.10, 27.00, 34.90, 23.00,
                 33.30, 28.40, 36.10, 24.20,
                 31.00, 27.50, 34.00, 26.35,
                 30.50, 28.20, 33.00, 25.45)
DAT3[1, -1] <- 2
GROUP3 <- c("r1c1", "r2c1", "r3c1", "g1c1",
            "r1s1", "r2s1", "r3s1", "g1s1",
            "r1s2", "r2s2", "r3s2", "g1s2",
            "r1s3", "r2s3", "r3s3", "g1s3")
RES3 <- refmean(DAT3, GROUP3, which.eff = "sig", which.cp = "cpD2")
ratiobatch(RES3, GROUP3_mod, which.cp = "cpD2",
           which.eff = "sig", refmean = FALSE)
## Results:
## r1c1:g1c1:r1s1:g1s1  refmean 1.0497
##                      geNorm 1.0472 (2.351/2.245)
## r1c1:g1c1:r1s2:g1s2  refmean 0.0693
##                      geNorm 0.0695 (0.156/2.245)
## r1c1:g1c1:r1s3:g1s3  refmean 0.1081
##                      geNorm 0.1074 (0.241/2.245)
## Slight differences are due to rounding.

## End(Not run)

[Package qpcR version 1.4-1 Index]