puissance {SARP.compo}    R Documentation

Estimate the power and the type I error of the disjoint-subgraphs method

Description

Estimate, by simulation, the power and the type I error of the disjoint-graph method for detecting a change in composition between conditions.

Usage

estimer.puissance( composition, cv.composition,
                   taille.groupes = 10, masque,
                   f.p, v.X = 'Condition',
                   seuil.candidats = ( 5:30 ) / 100,
                   f.correct = groupes.identiques,
                   groupes.attendus = composition$Graphes[[ 1 ]]$Connexe,
                   avec.classique = length( attr( composition, "reference" ) ) > 0,
                   f.correct.classique = genes.trouves,
                   genes.attendus,
                   B = 3000, n.coeurs = 1,
                   ... )

estimer.alpha( composition, cv.composition,
               taille.groupes = 10, masque,
               f.p, v.X = 'Condition',
               seuil.candidats = ( 5:30 ) / 100,
               avec.classique = length( attr( composition, "reference" ) ) > 0,
               B = 3000, n.coeurs = 1,
               ... )

Arguments

composition

A composition model, as obtained by modele_compo. For simulations under the null hypothesis (estimer.alpha), the first condition is copied to all other conditions. cv.composition is not copied when it is given as a matrix, which allows exploring some kinds of pseudo-null hypotheses.
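As an illustration only, the null-hypothesis setting of estimer.alpha (first condition copied to the other conditions) can be mimicked on a plain matrix of expected amounts, such as the matrix me of the Examples section. The helper null_amounts is hypothetical and not part of SARP.compo; the package performs this step internally on the model object, whose exact structure is not described here.

```r
## Hypothetical helper (not part of SARP.compo): copy the first
## condition (row) of an amount matrix to all other conditions,
## keeping the original row and column names.
null_amounts <- function(amounts) {
  res <- amounts[rep(1, nrow(amounts)), , drop = FALSE]
  rownames(res) <- rownames(amounts)
  res
}

me <- rbind('A' = c(1, 1, 1, 1),
            'B' = c(1, 1, 2, 2))
colnames(me) <- paste0("C-", 1:4)
m0 <- null_amounts(me)
m0   # row B now equals row A: no change between conditions
```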

cv.composition

The expected coefficient of variation of the quantified amounts. Should be either a single value, used for all components and all conditions, or a matrix with the same structure as composition$Absolue: one row for each condition, one column for each component, in the same order and with the same names. Coefficients of variation are expected on the amount scale, in raw form (that is, give 0.2 for a 20% coefficient of variation).
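A minimal sketch of a matrix-valued cv.composition for the four-component, two-condition model of the Examples section. It assumes that composition$Absolue uses the row names A and B and the column names C-1 to C-4; with a real model, the names should be copied from composition$Absolue itself. The scenario (C-4 more variable in condition B) is purely illustrative.

```r
## Illustrative CV matrix: one row per condition, one column per
## component, same order and names as the expected amounts.
conditions <- c("A", "B")
components <- paste0("C-", 1:4)
cv.mat <- matrix(0.20, nrow = length(conditions), ncol = length(components),
                 dimnames = list(conditions, components))
## Assumed scenario: component C-4 is more variable in condition B
cv.mat["B", "C-4"] <- 0.50
cv.mat
```

It would then be passed as cv.composition = cv.mat.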

taille.groupes

The sample size for each condition. Unused if masque is given. If a single value, it is used for all conditions. Otherwise, it should have the same length as the number of conditions in the provided model.

masque

A data.frame giving the design of the experiment. It should contain at least one column with the condition names, whose values must be among the condition names used in composition. If not provided, it is generated from taille.groupes as a single column named ‘Condition’.
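A sketch of an unbalanced design (8 units in condition A, 12 in condition B), laid out as the default masque is described above: a single Condition column with one row per experimental unit. The object the package actually builds from taille.groupes may differ in details such as the column type.

```r
## Design with one row per experimental unit and a 'Condition' column
## (assumed layout, for illustration only).
taille.groupes <- c(A = 8, B = 12)
masque <- data.frame(Condition = rep(names(taille.groupes),
                                     times = taille.groupes))
table(masque$Condition)
```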

f.p

The function used to analyse the dataset. See creer.Mp for details.

v.X

The name of the column identifying the different conditions in masque.

seuil.candidats

A vector of p-value cut-offs to be tested. All values should be between 0 and 1.

f.correct

A function to determine if the result of the analysis is the expected one. Defaults to a function that compares the disjoint sub-graphs of a reference graph with those of the obtained graph.

groupes.attendus

The reference graph for the above function. Defaults to the theoretical graph of the model, for the comparison between the first and the second conditions.
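To make the idea of "same disjoint sub-graphs" concrete, here is a sketch that compares two groupings of components regardless of the order of groups and of their members. It assumes each grouping is a list of character vectors of component names; the actual format of composition$Graphes[[ 1 ]]$Connexe is not described here, and same_groups is a hypothetical helper, not the package's groupes.identiques.

```r
## Hypothetical helper (not SARP.compo's groupes.identiques): two
## groupings are identical if they contain the same sets of components,
## whatever the order. Assumes component names do not contain "|".
same_groups <- function(found, expected) {
  canon <- function(g) sort(vapply(g, function(x) paste(sort(x), collapse = "|"),
                                   character(1)))
  length(found) == length(expected) &&
    identical(unname(canon(found)), unname(canon(expected)))
}

expected <- list(c("C-1", "C-2"), c("C-3", "C-4"))
same_groups(list(c("C-4", "C-3"), c("C-2", "C-1")), expected)  # TRUE
same_groups(list(c("C-1", "C-2", "C-3"), "C-4"), expected)     # FALSE
```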

avec.classique

If TRUE, the analysis is also done using an additive log-ratio (alr)-like method, using the geometric mean of the reference components as the “normalisation factor”. This corresponds to the Delta-Delta-Ct method, or similar methods, in qRT-PCR. With this method, each non-reference component is tested in turn after division by the normalisation factor.

If requested, the analysis is done both with and without multiple testing correction (Holm's method). The “cut-off p-value” is used as the nominal type I error level of the individual tests.
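The alr-like analysis described above can be sketched in base R on simulated log-normal data: on the log scale, dividing by the geometric mean of the references amounts to subtracting the mean of their logs, and each non-reference component is then tested, with and without Holm's correction. This is an illustration under stated assumptions (Student t-tests with equal variances, C-1 as the only reference, 10 units per condition, CV of 50%), not the code used by SARP.compo.

```r
## Sketch of an alr-like (Delta-Delta-Ct-like) analysis on simulated
## log-normal data. Assumptions: Student t-tests, C-1 as the only
## reference component, 10 units per condition, CV = 50%.
set.seed(1)
n <- 10
sdlog <- sqrt(log(1 + 0.5^2))        # log-scale sd for a 50% CV
mu <- rbind(A = c(1, 1, 1, 1), B = c(1, 1, 2, 2))
colnames(mu) <- paste0("C-", 1:4)
cond <- rep(rownames(mu), each = n)
## Log amounts: one row per unit, one column per component;
## meanlog is shifted by -sdlog^2/2 so that mu is the expected amount
logx <- log(mu[cond, ]) - sdlog^2 / 2 +
  matrix(rnorm(length(cond) * ncol(mu), sd = sdlog), nrow = length(cond))
ref <- "C-1"
## Normalisation factor: log of the geometric mean of the references
nf <- rowMeans(logx[, ref, drop = FALSE])
others <- setdiff(colnames(mu), ref)
p <- sapply(others, function(k) {
  y <- logx[, k] - nf                # log ratio to the normalisation factor
  t.test(y ~ cond, var.equal = TRUE)$p.value
})
p.holm <- p.adjust(p, method = "holm")
names(p)[p < 0.05]             # significant, uncorrected
names(p.holm)[p.holm < 0.05]   # significant, Holm-corrected
```

Repeating such an analysis B times and counting the simulations with at least one significant test is what the DDCt and DDCt.H columns described under Value report.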

f.correct.classique

A function to determine if the alr-like method finds the correct answer. Defaults to a function that compares the set of significant tests with the set of expected components.
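A sketch of what "the set of significant tests equals the set of expected components" means in practice. The helper correct_alr is hypothetical (it is not the package's genes.trouves), and the p-values are made-up inputs chosen to show how Holm's correction can turn a correct answer into an incorrect one.

```r
## Hypothetical correctness check (not SARP.compo's genes.trouves):
## the analysis is "correct" when exactly the expected components
## are significant at level alpha.
correct_alr <- function(p, expected, alpha = 0.05) {
  setequal(names(p)[p < alpha], expected)
}

p <- c("C-2" = 0.60, "C-3" = 0.01, "C-4" = 0.03)   # made-up p-values
correct_alr(p, expected = c("C-3", "C-4"))              # TRUE
## Holm-adjusted p-values: C-3 = 0.03, C-4 = 0.06, C-2 = 0.60
correct_alr(p.adjust(p, "holm"), c("C-3", "C-4"))       # FALSE: C-4 is lost
```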

genes.attendus

A character vector giving the names of the components expected to behave differently from the reference set.

B

The number of simulations to be done.
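Each estimated probability is a proportion over B simulations, so its precision can be gauged with the usual binomial standard error. The sketch below (hypothetical helper names) shows why B = 50, as used in the Examples for speed, gives very rough estimates, whereas the default B = 3000 gives a standard error of about 0.004 for a type I error near 0.05.

```r
## Monte Carlo standard error of a proportion estimated from B
## simulations, and the B needed for a target standard error
## (binomial approximation; hypothetical helper names).
mc_se <- function(p, B) sqrt(p * (1 - p) / B)
b_needed <- function(p, se) ceiling(p * (1 - p) / se^2)

mc_se(0.05, 3000)      # about 0.004
mc_se(0.05, 50)        # about 0.031
b_needed(0.05, 0.004)  # 2969
```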

n.coeurs

The number of CPU cores to use. Parallelization relies on forking, through the parallel package, and therefore does not work on Windows.
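A generic sketch of fork-based parallel simulations with parallel::mclapply, including the single-core fallback needed on Windows and a parallel-safe random number generator. The function one_simulation (a two-sample t-test under the null hypothesis) is hypothetical and only stands in for one simulated experiment; it does not reproduce the package's internals.

```r
library(parallel)

## Fork-based parallel simulations: forking is unavailable on Windows,
## so fall back to a single core there.
n.coeurs <- 2
if (.Platform$OS.type == "windows") n.coeurs <- 1

RNGkind("L'Ecuyer-CMRG")   # independent random streams per fork (global setting)
set.seed(42)
one_simulation <- function(i) {
  x <- rnorm(10); y <- rnorm(10)   # null hypothesis: no change
  t.test(x, y, var.equal = TRUE)$p.value < 0.05
}
rejections <- unlist(mclapply(seq_len(200), one_simulation, mc.cores = n.coeurs))
mean(rejections)   # estimated type I error, close to 0.05 up to Monte Carlo error
```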

...

Additional parameters for helper functions, including f.p, f.correct and f.correct.classique.

Details

Use these functions to simulate experiments and explore the properties of the disjoint graph method in a given experimental context. Simulations use a log-normal model, so the analysis is always done on the log scale. Coefficients of variation on the original scale therefore determine the standard deviations on the log scale.
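For a log-normal variable, the coefficient of variation on the original scale and the standard deviation on the log scale are linked by sdlog = sqrt(log(1 + CV^2)), which is close to the CV itself for small CVs. The sketch below illustrates this standard relation and checks it by simulation; the exact conversion used inside SARP.compo is not stated on this page.

```r
## Log-normal relation between the coefficient of variation on the
## original scale and the standard deviation on the log scale.
cv_to_sdlog <- function(cv) sqrt(log(1 + cv^2))

cv_to_sdlog(c(0.1, 0.2, 0.5))   # about 0.0998, 0.198, 0.472

## Empirical check by simulation
set.seed(123)
x <- rlnorm(1e5, meanlog = 0, sdlog = cv_to_sdlog(0.5))
sd(x) / mean(x)                 # close to 0.5
sd(log(x))                      # close to cv_to_sdlog(0.5)
```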

For power analysis, note that any rejection of the null hypothesis “nothing is different between conditions” is counted as a success, even if the detected differences do not match the simulated changes. This is why the probability of finding the correct result is estimated as well. However, defining what is a correct, or at least acceptable, result may not be straightforward, especially when comparing with other analysis methods.

Note also that power can be fairly compared only at the same type I error level. Hence, for instance, the power of the corrected alr-like method at p = 0.05 should be compared with the power of the disjoint-graph method at its “optimal” cut-off.
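One simple way to match type I error levels is sketched below: from an estimer.alpha result, keep the largest candidate cut-off whose estimated type I error does not exceed the nominal level. The helper cutoff_at_level is hypothetical, assumes the result behaves as a data.frame with the Seuil and Disjoint columns and the n.simulations attribute described under Value, and is not necessarily how choisir.seuil defines the “optimal” cut-off.

```r
## Hypothetical helper: among the candidate cut-offs, keep the largest
## one whose estimated type I error does not exceed the nominal level.
## 'sim' is assumed to have the columns Seuil and Disjoint and the
## attribute n.simulations.
cutoff_at_level <- function(sim, level = 0.05) {
  alpha.hat <- sim$Disjoint / attr(sim, "n.simulations")
  ok <- alpha.hat <= level
  if (!any(ok)) return(NA_real_)
  max(sim$Seuil[ok])
}
```

The power of the disjoint-graph method at this cut-off, read from an estimer.puissance result, can then be compared with the power of the Holm-corrected alr-like method at p = 0.05.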

Value

An object of class SARPcompo.simulation, with a plot method. It is a data.frame with the following columns:

Seuil

The cut-offs used to build the graphs.

Disjoint

The number of simulations that led to disjoint graphs.

Correct

The number of simulations that led to the correct graph (as defined by the f.correct function).

If avec.classique is TRUE, it has additional columns:

DDCt

The number of simulations that led to at least one significant test using the alr-like method.

DDCt.H

The number of simulations that led to at least one significant test using the alr-like method, after multiple testing correction using Holm's method.

DDCt.correct

The number of simulations that detected the correct components (as defined by the f.correct.classique function) using the alr-like method.

DDCt.H.correct

As above, but after multiple testing correction using Holm's method.

It also stores some information as attributes, including the total number of simulations (attribute n.simulations).
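Since all columns except Seuil are counts, dividing them by the n.simulations attribute gives estimated probabilities (type I error for estimer.alpha, power or correct-finding probability for estimer.puissance). The hypothetical helper below does this for any count column and adds exact (Clopper-Pearson) 95% confidence intervals from binom.test; it assumes the result can be handled as a plain data.frame, as described above.

```r
## Hypothetical helper: turn a count column of a simulation result into
## estimated probabilities with exact 95% confidence intervals.
as_rates <- function(sim, column = "Disjoint") {
  B <- attr(sim, "n.simulations")
  ci <- t(sapply(sim[[column]], function(k) binom.test(k, B)$conf.int))
  data.frame(Seuil = sim$Seuil, rate = sim[[column]] / B,
             lower = ci[, 1], upper = ci[, 2])
}
```

For instance, as_rates(puissance, "Correct") or as_rates(puissance, "DDCt.H") would summarise the corresponding columns of a result with additional columns.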

Author(s)

Emmanuel Curis (emmanuel.curis@parisdescartes.fr)

See Also

modele_compo to create a compositional model for two or more conditions.

creer.Mp, which is used internally, for details about analysis functions.

choisir.seuil for a simpler interface to estimate the optimal cut-off.

Examples

  ## Create a toy example: four components, two conditions
  ##  components 1 and 2 do not change between conditions
  ##  components 3 and 4 are doubled
  ##  component 1 is a reference component
  me <- rbind( 'A' = c( 1, 1, 1, 1 ),
               'B' = c( 1, 1, 2, 2 ) )
  colnames( me ) <- paste0( "C-", 1:4 )

  md <- modele_compo( me, reference = 'C-1' )

  ## How many simulations?
  ##  50 is for speed; increase for useful results...
  B <- 50

  ## What is the optimal cut-off for this situation?
  ## (only a few simulations for speed, should be increased)
  ## (B = 3000 suggests a cut-off between 0.104 and 0.122)
  seuil <- choisir.seuil( 4, B = B )

  ## What is, approximately, the type I error
  ## between conditions A and B using a Student test
  ## with a CV of around 50%?
  ##  (only a few simulations for speed, should be increased)
  alpha <- estimer.alpha( md, cv = 0.50, B = B,
                          f.p = student.fpc )

  # Plot it: darkgreen = the disjoint graph method
  #          orange    = the alr-like method, Holm's corrected
  #          salmon    = the alr-like method, uncorrected
  plot( alpha )

  ## What is, approximately, the power to detect that something changes
  ## between conditions A and B using a Student test
  ## with a CV of around 50%?
  ##  (only a few simulations for speed, should be increased)
  puissance <- estimer.puissance( md, cv = 0.50, B = B,
                                  f.p = student.fpc,
                                  genes.attendus = c( 'C-3', 'C-4' )  )

  # Plot it: darkgreen = the disjoint graph method
  #          orange    = the alr-like method, Holm's corrected
  #          salmon    = the alr-like method, uncorrected
  plot( puissance )

  ## Do we usually detect the correct situation?
  ##  (that is, exactly two sets: one with C-1 and C-2, the second with
  ##   C-3 and C-4; for the alr-like method, that only C-3 and C-4
  ##   are significant)
  #           darkgreen = the disjoint graph method
  #           orange    = the alr-like method, Holm's corrected
  #           salmon    = the alr-like method, uncorrected
  plot( puissance, correct = TRUE )
  

[Package SARP.compo version 0.1.8 Index]