R: Select transcripts/genes with significant p-values.

RN_select {RNentropy}

R Documentation

Select transcripts/genes with significant p-values.

Description

Select transcripts whose global p-value is lower than a user-defined threshold and summarize over- or under-expression in each condition according to the local p-values.

Usage

RN_select(Results, gpv_t = 0.01, lpv_t = 0.01, method = "BH")

Arguments

`Results`	The output of RNentropy or RN_calc.
`gpv_t`	Threshold for global p-value. (Default: 0.01)
`lpv_t`	Threshold for local p-value. (Default: 0.01)
`method`	Multiple testing correction method. Any method accepted by p.adjust can be used; type p.adjust.methods to see the list. (Default: "BH", Benjamini & Hochberg)
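The way gpv_t and method interact can be sketched in base R. This is a minimal illustration that mirrors the first lines of the function body listed under Examples; the gene names and -log10 values are made up, and the variable names gpv_adj and keep are only illustrative.

```r
# Toy global p-values on the -log10 scale, as stored in Results$gpv
# (made-up numbers, not taken from the package data).
gpv <- c(geneA = 6.0, geneB = 3.0, geneC = 1.5, geneD = 0.2)

gpv_t  <- 0.01   # threshold given as a plain p-value
method <- "BH"   # any entry of p.adjust.methods

# Back-transform to plain p-values, correct for multiple testing,
# then return to the -log10 scale.
gpv_adj <- -log10(p.adjust(10^-gpv, method = method))

# The comparison is made on the -log10 scale, so ">=" here
# corresponds to "corrected p-value <= gpv_t".
keep <- gpv_adj >= -log10(gpv_t)

print(keep)
print(p.adjust.methods)
```

With these toy values only geneA and geneB pass the 0.01 threshold after correction.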

Value

The original input containing

`gpv`	-log10 of the global p-values
`lpv`	-log10 of the local p-values
`c_like`	results formatted as in the output of the C++ implementation of RNentropy.
`res`	The results data.frame containing the original expression values together with the -log10 of global and local p-values.
`design`	The experimental design matrix.

and a new data.frame

`selected`	Transcripts/genes with a corrected global p-value at or below gpv_t. For each condition it contains a column whose values can be -1, 0, 1 or NA. 1 means that all the replicates of the condition have an expression value higher than the average and a local p-value <= lpv_t (the gene is over-expressed in this condition). -1 means that all the replicates of the condition have an expression value lower than the average and a local p-value <= lpv_t (the gene is under-expressed in this condition). 0 means that at least one of the replicates has a local p-value > lpv_t. NA means that the local p-values of the replicates are not consistent for this condition, that is, at least one replicate appears over-expressed and at least one appears under-expressed.
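The per-condition coding described above can be sketched as a small function. This is an illustration of the rule as stated in this section, not the package's internal helper: the name summarise_condition, its arguments, and the use of plain (not -log10) local p-values are all assumptions. The text does not say which code wins when a condition has both inconsistent and non-significant replicates, so the sketch assumes the NA check comes first.

```r
# Hypothetical helper, not part of RNentropy.
# expr  : expression values of the replicates of one condition
# avg   : average expression of the gene
# lpv   : local p-values of the same replicates (plain p-values)
# lpv_t : local p-value threshold
summarise_condition <- function(expr, avg, lpv, lpv_t = 0.01) {
  significant <- lpv <= lpv_t
  over  <- significant & expr > avg
  under <- significant & expr < avg
  # Assumption: inconsistency is checked before non-significance.
  if (any(over) && any(under)) return(NA_integer_)
  if (!all(significant)) return(0L)   # at least one replicate above lpv_t
  if (all(over))  return(1L)          # all replicates significantly higher
  if (all(under)) return(-1L)         # all replicates significantly lower
  0L                                  # e.g. expression equal to the average
}

up    <- summarise_condition(c(10, 12, 11), avg = 5, lpv = c(0.001, 0.002, 0.0005))
down  <- summarise_condition(c(1, 2, 1),    avg = 5, lpv = c(0.001, 0.002, 0.0005))
none  <- summarise_condition(c(10, 12, 11), avg = 5, lpv = c(0.001, 0.2,   0.0005))
mixed <- summarise_condition(c(10, 1, 11),  avg = 5, lpv = c(0.001, 0.002, 0.0005))
print(c(up = up, down = down, none = none, mixed = mixed))
```

The four calls cover one case for each possible code: 1, -1, 0 and NA.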

Author(s)

Giulio Pavesi - Dep. of Biosciences, University of Milan

Federico Zambelli - Dep. of Biosciences, University of Milan

Examples

data("RN_Brain_Example_tpm", "RN_Brain_Example_design")
# Compute statistics and p-values (only a subset of genes is used because of
# the CRAN running-time limit for examples)
Results <- RN_calc(RN_Brain_Example_tpm[1:10000,], RN_Brain_Example_design)
Results <- RN_select(Results)

## The function is currently defined as
function (Results, gpv_t = 0.01, lpv_t = 0.01, method = "BH")
{
    # Thresholds are compared on the -log10 scale used by Results$gpv and Results$lpv.
    lpv_t <- -log10(lpv_t)
    gpv_t <- -log10(gpv_t)
    # Correct the global p-values for multiple testing and flag the rows that pass gpv_t.
    Results$gpv_bh <- -log10(p.adjust(10^-Results$gpv, method = method))
    true_rows <- (Results$gpv_bh >= gpv_t)
    # Each row of design_b is treated as one condition.
    design_b <- t(Results$design > 0)
    Results$lpv_sel <- data.frame(row.names = rownames(Results$lpv)[true_rows])
    # Build one summary column per condition from the local p-values.
    for (d in seq_along(design_b[, 1])) {
        col <- apply(Results$lpv[true_rows, ], 1, ".RN_select_lpv_row",
            design_b[d, ], lpv_t)
        Results$lpv_sel <- cbind(Results$lpv_sel, col)
        colnames(Results$lpv_sel)[length(Results$lpv_sel)] <- paste("condition",
            d, sep = "_")
    }
    # Keep the non-numeric (label) columns of the results table.
    lbl <- Results$res[, !sapply(Results$res, is.numeric)]
    Results$selected <- cbind(lbl[true_rows], Results$gpv[true_rows],
        Results$gpv_bh[true_rows], Results$lpv_sel)
    colnames(Results$selected) <- c(names(which(!sapply(Results$res,
        is.numeric))), "GL_LPV", "Corr. GL_LPV", colnames(Results$lpv_sel))
    # Sort on column 3, decreasing; with a single label column this is "Corr. GL_LPV".
    Results$selected <- Results$selected[order(Results$selected[,3], decreasing=TRUE),]
    # Drop the temporary per-condition table.
    Results$lpv_sel <- NULL
    return(Results)
}

[Package RNentropy version 1.2.3 Index]