R: Format biological data

convert2biodata {tcgaViz}

R Documentation

Format biological data

Description

Merges gene and cell datasets that share the same TCGA sample identifiers, splits the samples into two categories according to the expression level of a selected gene (below or above the chosen statistic, the mean by default), and formats the result into a 3-column data frame: gene expression level category, cell types, and cell type abundance estimates.
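The merge, split, and reshape steps described above can be sketched in base R. This is only an illustration on made-up toy data, not the package's implementation: the `genes` and `cells` data frames, the `sample` column, and the cell type column names are all hypothetical, and the sketch assumes that `high` marks samples above the threshold.

```r
# Toy inputs; every name and value here is invented for illustration.
genes <- data.frame(
  sample = c("S1", "S2", "S3", "S4"),
  ICOS   = c(1.0, 5.0, 2.0, 8.0)
)
cells <- data.frame(
  sample  = c("S1", "S2", "S3", "S4"),
  T_cells = c(0.10, 0.40, 0.20, 0.50),
  B_cells = c(0.30, 0.10, 0.25, 0.05)
)

# 1. Merge the two datasets on the shared sample identifier.
merged <- merge(genes, cells, by = "sample")

# 2. Split samples by expression of the selected gene
#    (assumption: TRUE means above the mean).
merged$high <- merged$ICOS > mean(merged$ICOS)

# 3. Reshape to long format: one row per sample and cell type.
cell_types <- c("T_cells", "B_cells")
biodata <- data.frame(
  high  = rep(merged$high, times = length(cell_types)),
  cells = factor(rep(cell_types, each = nrow(merged))),
  value = unlist(merged[cell_types], use.names = FALSE)
)
```

Swapping `mean()` for `median()` in step 2 would give the analogous split for a median threshold.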

Usage

convert2biodata(algorithm, disease, tissue, gene_x, stat = "mean", path = ".")

Arguments

`algorithm`	character for the algorithm used to estimate the distribution of cell type abundance, one of: 'Cibersort', 'Cibersort_ABS', 'EPIC', 'MCP_counter', 'Quantiseq', 'Timer', 'Xcell', 'Xcell (2)' and 'Xcell64'.
`disease`	character for the type of TCGA cancer (see the list in extdata/disease_names.csv).
`tissue`	character for the type of TCGA tissue, one of: 'Additional - New Primary', 'Additional Metastatic', 'Metastatic', 'Primary Blood Derived Cancer - Peripheral Blood', 'Primary Tumor', 'Recurrent Tumor', 'Solid Tissue Normal'.
`gene_x`	character for the gene selected in the differential analysis (see the list in extdata/gene_names.csv).
`stat`	character for the statistic used to split the samples, one of "mean" (default), "median" or "quantile".
`path`	character for the path name of the `tcga` dataset.

Value

data frame with the following columns:

high (logical): the expression level category of the selected gene, TRUE for above or FALSE for below average.
cells (factor): cell types.
value (numeric): the estimated abundance of each cell type.
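As a rough guide to working with a data frame of this shape, the sketch below builds a small hand-made stand-in with the three documented columns and averages `value` per cell type and expression group. The values and cell type names are invented; a real result from `convert2biodata()` would have many more rows.

```r
# Hand-made stand-in for the documented return value (hypothetical data).
biodata <- data.frame(
  high  = c(TRUE, FALSE, TRUE, FALSE),
  cells = factor(c("T_cells", "T_cells", "B_cells", "B_cells")),
  value = c(0.4, 0.1, 0.2, 0.3)
)

# Inspect the column types.
str(biodata)

# Mean abundance per cell type and expression group.
group_means <- aggregate(value ~ cells + high, data = biodata, FUN = mean)
print(group_means)
```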

Examples

data(tcga)
(convert2biodata(
    algorithm = "Cibersort_ABS",
    disease = "breast invasive carcinoma",
    tissue = "Primary Tumor",
    gene_x = "ICOS"
))

[Package tcgaViz version 1.0.2 Index]