fichiers {SARP.compo}

R Documentation

Create and read a file of p-values for all pairwise tests of all possible ratios of a compositional vector

Description

These functions perform hypothesis tests on all possible pairwise ratios or differences of a set of variables in a given data frame, and store the results in, or read them from, a file.

Usage

creer.Fp( d, nom.fichier,
          noms, f.p = student.fpc,
          log = FALSE, en.log = !log,
          nom.var = 'R',
          noms.colonnes = c( "Cmp.1", "Cmp.2", "p" ),
          add.col = "delta",
          sep = ";", dec = ".", row.names = FALSE, col.names = TRUE,
          ... )

grf.Fp( nom.fichier, col.noms = c( 1, 2 ), p = 0.05, col.p = 'p',
        reference = NULL, groupes = NULL,
        sep = ";", dec = ".", header = TRUE,
        ... )

Arguments

`d`	The data frame that contains the compositional variables. Other objects will be coerced to data frames using `as.data.frame`.
`nom.fichier`	A length-one character vector giving the name of the file
`noms`	A character vector containing the column names of the compositional variables to be used for ratio computations. Names absent from the data frame will be ignored with a warning. Optionally, an integer vector containing the column numbers can be given instead. They will be converted to column names before further processing.
`f.p`	An R function that performs the hypothesis test on a single ratio (or log-ratio, depending on the `log` and `en.log` values). This function should return a numeric vector whose first element will typically be the p-value from the test; see `creer.Mp` for details. Such functions are provided for several common situations, see links at the end of this manual page.
`log`	If `TRUE`, values in the columns are assumed to be log-transformed, and consequently ratios are computed as differences of the columns. The result is in the log scale. If `FALSE`, values are assumed to be raw data and ratios are computed directly.
`en.log`	If `TRUE`, the ratio will be log-transformed before applying the hypothesis test computed by `f.p`. Don't change the default unless you really know what you are doing.
`nom.var`	A length-one character vector giving the name of the variable containing a single ratio (or log-ratio). No sanity check is performed on it: if you experience strange behaviour, check you gave a valid column name, for instance using `make.names`.
`noms.colonnes`	A length-three character vector giving the names of, respectively, the two columns of the data frame that will contain the component identifiers and the column that will contain the p-value from the test (the first value returned by `f.p`).
`add.col`	A character vector giving the names of additional columns of the data.frame, used for storing additional return values of `f.p` (all but the first one).
`sep`, `dec`, `row.names`, `col.names`, `header`	Options for controlling the file format, used by `write.table` and `read.table`.
`col.noms`	A length-two vector giving the two columns that contain the two components of the ratio. Can be given either as column number or column name.
`col.p`	A length-one vector giving the column that contains the p-value of the ratio. Can be given either as column number or column name.
`p`	The p-value cut-off to be used when creating the graph, see `grf.Mp` for details.
`reference`	A character vector giving the names of nodes that should be displayed with a different color in the created graph. These names should match component names present in the file. A typical use is for reference genes in qRT-PCR experiments. By default, all nodes are displayed in palegreen; reference nodes, if any, will be displayed in orange.
`groupes`	A list of character vectors giving sets of logically related nodes, defining groups of nodes that will share a common color. Currently unimplemented.
`...`	Additional arguments to `f.p`, passed unchanged to it.
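As an illustration of the `f.p` argument, here is a minimal sketch of a user-written test function. It assumes the calling convention of the functions provided with the package, namely that `f.p` receives the data frame as `d` and the name of the (log-)ratio column as `variable`; check `creer.Mp` before relying on this. The function name `wilcoxon.fp.sketch`, the simulated data `d.sim` and the column `Groupe` are hypothetical and are not part of SARP.compo.

```r
# Hypothetical test function (assumed calling convention, see creer.Mp):
#   d        : data frame holding the (log-)ratio and the covariates
#   variable : name of the column of d that holds the (log-)ratio
#   v.X      : name of a two-level grouping column (would come through '...')
wilcoxon.fp.sketch <- function( d, variable, v.X, ... ) {
    x <- d[[ variable ]]
    g <- factor( d[[ v.X ]] )
    stopifnot( nlevels( g ) == 2 )
    grp <- split( x, g )
    # exact = FALSE uses the normal approximation, which tolerates ties
    test <- wilcox.test( grp[[ 1 ]], grp[[ 2 ]], exact = FALSE )
    # First element: the p-value; second: difference of the group medians
    c( p = test$p.value,
       delta = median( grp[[ 2 ]] ) - median( grp[[ 1 ]] ) )
}

# Direct call on simulated data, outside of creer.Fp
set.seed( 1 )
d.sim <- data.frame( R = c( rnorm( 10, 0 ), rnorm( 10, 2 ) ),
                     Groupe = rep( c( 'A', 'B' ), each = 10 ) )
res <- wilcoxon.fp.sketch( d.sim, variable = 'R', v.X = 'Groupe' )
res
```

Under the same assumption, such a function would be given as `f.p`, with `v.X` supplied through `...` and `add.col = "delta"` naming the second returned value.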

Details

These functions are essentially the same as the functions that create data frames (creer.DFp) and use data frames to create a graph (grf.DFp), except that they work on text files. This makes it possible to handle compositional data with thousands of components, such as RNA-Seq or microarray data.

Seeing the results as a matrix, computations are done in rows and the file is updated after each row. Only the upper-triangular part, without the diagonal, is stored in the file.
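The row-by-row scheme described above can be sketched in base R as follows. This is not the code of `creer.Fp`: the data are simulated, the test is a plain Student's t-test on the log-ratio, and the names `d`, `noms`, `fichier`, `ligne` and `Groupe` are chosen for the illustration only.

```r
# Simulated compositional data: 4 components, two groups of 6 samples
set.seed( 42 )
d <- data.frame( Groupe = rep( c( 'A', 'B' ), each = 6 ),
                 C1 = rlnorm( 12 ), C2 = rlnorm( 12 ),
                 C3 = rlnorm( 12 ), C4 = rlnorm( 12 ) )
noms <- c( 'C1', 'C2', 'C3', 'C4' )
fichier <- tempfile( fileext = ".csv" )

# Header line, written once
cat( "Cmp.1;Cmp.2;p\n", file = fichier )

# One matrix row at a time: component i against components j > i only,
#  so the diagonal and the lower triangle are never computed or stored
for ( i in seq_len( length( noms ) - 1 ) ) {
    j.suiv <- ( i + 1 ):length( noms )
    p <- vapply( j.suiv, function( j ) {
        lr <- log( d[[ noms[ i ] ]] / d[[ noms[ j ] ]] )
        t.test( lr[ d$Groupe == 'A' ], lr[ d$Groupe == 'B' ],
                var.equal = TRUE )$p.value
    }, numeric( 1 ) )
    ligne <- data.frame( Cmp.1 = noms[ i ], Cmp.2 = noms[ j.suiv ], p = p )
    # The file is updated as soon as the row is complete
    write.table( ligne, file = fichier, append = TRUE, sep = ";", dec = ".",
                 row.names = FALSE, col.names = FALSE, quote = FALSE )
}

res <- read.table( fichier, header = TRUE, sep = ";", dec = "." )
nrow( res )   # choose( 4, 2 ) = 6 rows
```

Appending after each row keeps memory use proportional to one row of the matrix rather than to the whole matrix, which is the point of working through a file.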

The function that creates the graph from a file is not very efficient and can take a long time for huge matrices. A better alternative is to first filter the file with shell tools such as gawk or perl, or with a dedicated C program, then load the resulting file as a data.frame and convert it into a graph; note that this may lose some isolated nodes.
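When no shell tool is at hand, the same kind of pre-filtering can be approximated in R by reading the file in blocks. The sketch below is only one possible approach; the function `filtrer.fichier.sketch` and its arguments are hypothetical. It also assumes that the rows of interest are those with a p-value at or above the cut-off; see `grf.Mp` for how the cut-off is actually used and reverse the comparison if needed.

```r
# Hypothetical helper: keep only the rows whose p-value is >= seuil,
#  reading the file block by block to limit memory use
filtrer.fichier.sketch <- function( nom.fichier, seuil = 0.05, col.p = 'p',
                                    sep = ";", dec = ".",
                                    taille.bloc = 10000 ) {
    con <- file( nom.fichier, open = "r" )
    on.exit( close( con ) )
    # Column names from the header line (quotes removed if present)
    entete <- strsplit( readLines( con, n = 1 ), sep, fixed = TRUE )[[ 1 ]]
    entete <- gsub( '"', '', entete, fixed = TRUE )
    blocs <- list()
    repeat {
        lignes <- readLines( con, n = taille.bloc )
        if ( length( lignes ) == 0 ) break
        bloc <- read.table( text = lignes, sep = sep, dec = dec,
                            col.names = entete, stringsAsFactors = FALSE )
        # ASSUMPTION: rows at or above the cut-off are the ones to keep
        garder <- bloc[[ col.p ]] >= seuil
        blocs[[ length( blocs ) + 1 ]] <- bloc[ garder, , drop = FALSE ]
    }
    do.call( rbind, blocs )   # NULL if the file has no data row
}

# Small file in the same three-column layout as the default output
fichier <- tempfile( fileext = ".csv" )
write.table( data.frame( Cmp.1 = c( 'A', 'A', 'B' ),
                         Cmp.2 = c( 'B', 'C', 'C' ),
                         p     = c( 0.50, 0.001, 0.20 ) ),
             file = fichier, sep = ";", dec = ".", row.names = FALSE )
DFp.filtre <- filtrer.fichier.sketch( fichier, seuil = 0.05, taille.bloc = 2 )
DFp.filtre   # keeps A-B and B-C; the A-C row is dropped
```

A component whose rows are all dropped no longer appears in the filtered data frame, which is how isolated nodes can be lost.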

Value

creer.Fp does not return anything. grf.Fp returns the resulting graph.

Note

Creating a file and working from it is quite slow, so for compositional data with only a few components, consider instead using creer.DFp, which creates the data.frame directly in memory, and grf.DFp, which creates the graph from a data.frame.

Author(s)

Emmanuel Curis (emmanuel.curis@parisdescartes.fr)

Examples

   # Load the pottery data set
   data( poteries )

   # Create the file name in the R temporary directory
   nom.fichier <- file.path( tempdir(), "fichier_test.csv" )
   nom.fichier

   # Compute one-way ANOVA p-values for all ratios in this data set
   #  and store them in a text file
   creer.Fp( poteries, nom.fichier,
             c( 'Al', 'Na', 'Fe', 'Ca', 'Mg' ),
             f.p = anva1.fpc, v.X = 'Site',
             add.col = c( 'mu0', 'd.C', 'd.CoA', 'd.IT', 'd.L' ) )

   # Make a graph from it and plot it
   plot( grf.Fp( nom.fichier ) )

   # The file is a simple text file that can be read as a data.frame
   #  (dec must match the value used by creer.Fp, "." by default)
   DFp <- read.table( nom.fichier, header = TRUE, sep = ";", dec = "." )
   DFp

[Package SARP.compo version 0.1.8 Index]