R: Permutation-based non-parametric analysis to infer...

permutation {permubiome}

R Documentation

Permutation-based non-parametric analysis to infer differential abundance of features between groups.

Description

This function performs multiple simulations for every feature present in your dataset. All observations are randomly distributed between groups and the median's differences are calculated for all simulations. Differences calculated from simulations are fitted to the normal distribution, Z-scores are obtained, and the respective probability to reject the null hypothesis is then calculated. A multiple testing correction based on Benjamini-Hochberg method is done to uncover the biomarkers associated with your dataset classes. FDR threshold for differential abundance can be set at 0.1.

Usage

permutation(nperm = 1000, write.output = TRUE)

Arguments

`nperm`	The number of permutations to be executed during the analysis (1000 as default). The higher the number of permutations, the more precise will be the p-value returned and the function becomes more time-consuming. We recommend to use nperm = 1000 as the minimum.
`write.output`	When TRUE (as default), a sorted output file is generated and stored in the working directory. Control for the number of features to be present in the output is allowed with the "all" or "selected" parameters prompted.

Author(s)

Alfonso Benitez-Paez

References

Benitez-Paez A. 2023. Permubiome: an R package to perform permutation based test for biomarker discovery in microbiome analyses. [https://cran.r-project.org]. Benitez-Paez A, et al. mSystems. 2020;5:e00857-19. doi: 10.1128/mSystems.00857-19.

Examples

## The function is currently defined as
function (nperm = 1000, write.output = TRUE) 
{
    Class <- NULL
    load("permubiome.RData")
    df_norm <- df_norm
    selected_features <- selected_features
    tags.in <- selected_features[, 1]
    tags.out <- setdiff(colnames(df_norm[3:ncol(df_norm)]), tags.in)
    for (a in ncol(df_norm):3) {
        if ((colnames(df_norm[a]) %in% tags.out)) {
            df_norm[, a] <- NULL
        }
    }
    classes <- levels(df_norm$Class)
    if (REFERENCE == "") {
        REFERENCE <- classes[1]
    }
    else if (REFERENCE == classes[2]) {
        classes[2] <- classes[1]
        classes[1] <- REFERENCE
    }
    df_norm$Class <- factor(df$Class, levels = (c(classes[1], 
        classes[2])))
    group1 <- subset(df_norm, Class == classes[1])
    group2 <- subset(df_norm, Class == classes[2])
    categories <- colnames(df_norm)
    size1 <- nrow(group1)
    size2 <- nrow(group2)
    size <- size1 + size2
    pvalue_matrix <- matrix(, nrow = ncol(df_norm) - 2, ncol = 5, 
        byrow = T)
    colnames(pvalue_matrix) <- c("Category", paste("Median ", 
        classes[1], sep = ""), paste("Median ", classes[2], sep = ""), 
        "p.value", "p.adj (fdr)")
    print(paste("Permutation test in progress - This can take some seconds or minutes!"))
    for (i in 3:(ncol(df_norm))) {
        category <- categories[i]
        diff <- median(group1[, i]) - median(group2[, i])
        x <- c(group1[, i], group2[, i])
        y <- array(, nperm)
        for (j in 1:nperm) {
            set <- sample(size, size2, replace = FALSE)
            diff_iter <- median(x[set]) - median(x[-set])
            y[j] <- diff_iter
            ref_score <- (diff - mean(y))/sd(y)
        }
        if (ref_score >= 0) {
            pvalue.i <- pnorm(ref_score, lower.tail = F)
        }
        else {
            pvalue.i <- pnorm(ref_score)
        }
        if (pvalue.i != 0) {
            pvalue_matrix[(i - 2), 1] <- category
        }
        if (pvalue.i != 0) {
            pvalue_matrix[(i - 2), 2] <- round(median(group1[, 
                i]), digits = 0)
        }
        if (pvalue.i != 0) {
            pvalue_matrix[(i - 2), 3] <- round(median(group2[, 
                i]), digits = 0)
        }
        if (pvalue.i != 0) {
            pvalue_matrix[(i - 2), 4] <- format((pvalue.i * 2), 
                digits = 7, scientific = F)
        }
        else {
            pvalue_matrix[(i - 2), 2] <- 1
        }
        pb = txtProgressBar(min = 0, max = (ncol(df_norm) - 2), 
            initial = 0, style = 3)
        setTxtProgressBar(pb, (i - 2))
        invisible()
    }
    pvalue_matrix <- pvalue_matrix[order(pvalue_matrix[, 4]), 
        ]
    pvalue_matrix[, 5] <- format(p.adjust(as.numeric(pvalue_matrix[, 
        4], n = nrow(pvalue_matrix), method = "fdr")), digits = 7, 
        scientific = F)
    cat("\n")
    if (write.output == TRUE) {
        all <- readline("Do you want to include all fetures in the output? (yes/no) : ")
        if (substr(all, 1, 1) == "n") {
            select <- as.numeric(readline("Level of significance to output features (i.e. 0.2) : "))
            significant <- subset(pvalue_matrix, pvalue_matrix[, 
                5] <= select)
        }
        else {
            significant <- pvalue_matrix
        }
        write.table(significant, file = "permutation.output", 
            quote = F, row.names = F, sep = "\t")
        print(paste("Permutation test done and output table printed!"))
    }
    else {
        significant <- pvalue_matrix
        significant
        print(paste("Permutation test done!"))
    }
    save(df, df_norm, REFERENCE, classes, selected_features, 
        nperm, tags.in, tags.out, pvalue_matrix, file = "permubiome.RData")
  }

[Package permubiome version 1.3.2 Index]