R: Search for differentially expressed genes/probes

fcros2 {fcros}

R Documentation

Search for differentially expressed genes/probes

Description

Implementation of a method based on fold change or ratio rank ordering statistics for detecting differentially expressed genes. This function should be used with dataset in two separate tables and from two biological conditions datasets (microarray, RNA-seq, ...). Pairwise combinations of samples from the two biological conditions are used to obtain a matrix of fold changes. For each combination, the FCs obtained are sorted in an increasing order and the corresponding rank values are associated with genes/probes. Then, a statistic is associated with each gene/probe.

Usage

fcros2(xdata1, xdata2, cont, test, log2.opt = 0, trim.opt = 0.25)

Arguments

`xdata1`	A matrix or a table containing two biological conditions dataset to process for detecting differentially expressed genes. The rownames of xdata1 are used for the output idnames.
`xdata2`	A matrix or a table containing two biological conditions dataset to process fordetecting differentially expressed genes: `xdata2`.
`cont`	A vector containing the label names of the control samples: `cont` = c("cont01", "cont02", ...)
`test`	A vector containing the label names of the test samples: `test` = c("test01", "test02", "test03", ...)
`log2.opt`	A scalar equals to 0 or 1. The value 0 (default) means that data in the tables "xdata1" and "xdata2" are expressed in a log2 scale: `log2.opt` = 0
`trim.opt`	A scalar between 0 and 0.5. The value 0.25 (default) means that 25% of the lower and the upper rank values for a gene are not used for computing the statistics "ri", i.e. the interquartile range rank values are averaged: `trim.opt` = 0.25

Details

The label names appearing in the parameters "cont" and "test" should match some label names in the columns of the data matrices "xdata1" and "xdata2". It is not necessary to use all column label names appearing in matrices "xdata1" and "xdata2". However, it is assumed that the same genes (same IDs or symbol) are used in xdata1 and xdata2.

Value

This function returns a data frame containing 9 components

`idnames`	A vector containing the list of the IDs or symbols associated with genes
`ri`	The average of rank values associated with genes in the datasets. These values are rank statistics leading to f-values and p-values.
`FC`	The fold changes for genes. These fold changes are calculated as a ratio of averages from the test and control samples. Non log scale values are used in the calculation.
`FC2`	The robust fold changes for genes. These fold changes are calculated as a trimmed mean of the fold changes or ratios obtained from the dataset samples using the pairwise comparisons. Non log scale values are used in the calculation.
`f.value`	The f-values are probabilities associated with genes. These values are obtained using the "mean" and the "standard deviation" ("sd") of the statistics "ri". The "mean" and "sd" are used as a normal distribution parameters.
`p.value`	The p-values associated with genes. These values are obtained from the f-values.
`bounds`	Two values which are the lower and the upper bound or the minimum and the maximum values of non standardized "ri".
`params`	Three values, which are the estimates for the parameters "delta" (average difference between consecutive ordered average of rank) "mean" (mean value of variable "ri") and the standard deviation ("sd") of variable "ri".
`params_t`	Three values, which are the theoretical levels for parameters "delta", "mean" and "sd".

Author(s)

Doulaye Dembele doulaye@igbmc.fr

References

Dembele D and Kastner P, Fold change rank ordering statistics: a new method for detecting differentially expressed genes, BMC Bioinformatics, 2014, 15:14

Dembele D and Kastner P, Comment on: Fold change rank ordering statistics: a new method for detecting differentially expressed genes, BMC Bioinformatics, 2016, 17:462

Examples

   data(fdata);

   rownames(fdata) <- fdata[,1];

   cont <- c("cont01", "cont02", "cont03", "cont04", "cont05",
             "cont06", "cont07", "cont08", "cont09", "cont10");
   test <- c("test01", "test02", "test03", "test04", "test05",
             "test06", "test07", "test08", "test09", "test10");
   log2.opt <- 0;
   trim.opt <- 0.25;

   # perform fcros2()
   xdata1 <- fdata[,c(2:5, 12:17)];
   xdata2 <- fdata[,c(6:11, 18:21)];
   rownames(xdata1) <- fdata[,1];
   rownames(xdata2) <- fdata[,1];

   af2 <- fcros2(xdata1, xdata2, cont, test, log2.opt, trim.opt);

   # now select top 20 down and/or up regulated genes
   top20 <- fcrosTopN(af2, 20);
   alpha1 <- top20$alpha[1];
   alpha2 <- top20$alpha[2];
   id.down  <- matrix(0,1);
   id.up <- matrix(0,1);
   n <- length(af2$FC);
   f.value <- af2$f.value;

   idown <- 1;
   iup <- 1;
   for (i in 1:n) {
       if (f.value[i] <= alpha1) { id.down[idown] <- i; idown <- idown + 1; }
       if (f.value[i] >= alpha2) { id.up[iup] <- i; iup <- iup + 1; }
   }

   data.down <- fdata[id.down[1:(idown-1)], ];
   ndown <- nrow(data.down);
   data.up <- fdata[id.up[1:(iup-1)], ];
   nup <- nrow(data.up);

   # now plot down regulated genes
   t <- 1:20;
   op = par(mfrow = c(2,1));
   plot(t, data.down[1,2:21], type = "l", col = "blue", xlim = c(1,20),
        ylim = c(0,18), main = "Top down-regulated genes");
   for (i in 2:ndown) {
       lines(t,data.down[i,2:21], type = "l", col = "blue")
   }

   # now plot down and up regulated genes
   plot(t, data.up[1,2:21], type = "l", col = "red", xlim = c(1,20),
       ylim = c(0,18), main = "Top up-regulated genes");
   for (i in 2:nup) {
       lines(t, data.up[i,2:21], type = "l", col = "red")
   }
   par(op)

[Package fcros version 1.6.1 Index]