convert_biodata {tcgaViz}

R Documentation

Format biological data

Description

Merges gene and cell datasets that share the same TCGA sample identifiers, splits the samples into two categories according to the expression level of a selected gene (below or above a chosen statistic, the mean by default), and formats the result into a 3-column data frame: gene expression category, cell type, and cell type abundance value.
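
The merge, split, and reshape steps described above can be sketched in base R on toy data. This is an illustration only, not the package's implementation: the column names `sample` and `study`, the expression values, and the cell type columns are invented, the split uses the mean, and the logical column is called `above` here purely for readability.

```r
# Toy inputs: two identifier columns followed by numeric columns.
genes <- data.frame(
  sample = c("S1", "S2", "S3", "S4"),
  study  = "BRCA",
  ICOS   = c(1.0, 5.0, 2.0, 8.0)
)
cells <- data.frame(
  sample  = c("S1", "S2", "S3", "S4"),
  study   = "BRCA",
  T_cells = c(0.10, 0.40, 0.20, 0.50),
  B_cells = c(0.30, 0.10, 0.25, 0.05)
)

# 1. Merge on the shared sample identifiers.
merged <- merge(genes[, c("sample", "ICOS")], cells, by = "sample")

# 2. Split samples by comparing the selected gene with its mean.
threshold <- mean(merged$ICOS)
above <- merged$ICOS > threshold

# 3. Reshape to long format: one row per sample and cell type.
cell_types <- c("T_cells", "B_cells")
long <- data.frame(
  above = rep(above, times = length(cell_types)),
  cells = factor(rep(cell_types, each = nrow(merged))),
  value = unlist(merged[cell_types], use.names = FALSE)
)
print(long)
```

Swapping `mean()` for `median()` in step 2 would give the median-based split; how the package handles the "quantile" option is not described here, so it is left out of the sketch.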

Usage

convert_biodata(
  genes,
  cells,
  select = colnames(genes)[3],
  stat = "mean",
  disease = NULL,
  tissue = NULL
)

Arguments

`genes`	data frame whose first two columns contain identifiers and the others float values.
`cells`	data frame whose first two columns contain identifiers and the others float values.
`select`	character, the name of the column in genes (the gene) used to split the samples.
`stat`	character for the statistic to be chosen among "mean", "median" or "quantile".
`disease`	character for the type of TCGA cancer (see the list in extdata/disease_names.csv).
`tissue`	character for the type of TCGA tissue among: 'Additional - New Primary', 'Additional Metastatic', 'Metastatic', 'Primary Blood Derived Cancer - Peripheral Blood', 'Primary Tumor', 'Recurrent Tumor', 'Solid Tissue Normal'.

Details

The disease and tissue arguments are displayed in the title of plot.biodata(). They need to be supplied only if the genes argument does not already carry them in its attributes.
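
As a reminder of how attributes work on a data frame in R, the sketch below attaches metadata to a toy data frame and falls back to an explicit value when the attribute is absent. The attribute names `disease` and `tissue` and the helper `resolve_label()` are hypothetical; the package may name or store this information differently.

```r
genes <- data.frame(sample = c("S1", "S2"), study = "BRCA", ICOS = c(1, 5))

# Attach metadata to the data frame (the attribute name is an assumption).
attr(genes, "disease") <- "breast invasive carcinoma"

# Hypothetical helper: prefer the stored attribute, else use the fallback.
resolve_label <- function(data, name, fallback = NULL) {
  stored <- attr(data, name, exact = TRUE)
  if (is.null(stored)) fallback else stored
}

resolve_label(genes, "disease", fallback = "unknown")      # attribute is set
resolve_label(genes, "tissue", fallback = "Primary Tumor")  # attribute is missing
```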

Value

data frame with the following columns:

high (logical): the expression level group of the selected gene, TRUE for below or FALSE for above the chosen statistic (the average by default).
cells (factor): cell types.
value (float): the abundance estimation of the cell types.
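
A data frame of this shape lends itself to per-cell-type comparisons between the two expression groups. The sketch below builds a small data frame by hand with the three documented columns (all values are invented) and uses base R's `aggregate()` to compute the median abundance per cell type and group; it does not call the package.

```r
# Hand-made stand-in for the returned data frame.
df <- data.frame(
  high  = rep(c(TRUE, TRUE, FALSE, FALSE), times = 2),
  cells = factor(rep(c("B_cells", "T_cells"), each = 4)),
  value = c(0.30, 0.25, 0.10, 0.05, 0.10, 0.20, 0.40, 0.50)
)

# Median abundance for each cell type within each expression group.
medians <- aggregate(value ~ cells + high, data = df, FUN = median)
print(medians)
```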

Examples

data(tcga)
(df_formatted <- convert_biodata(tcga$genes, tcga$cells$Cibersort, "ICOS"))

[Package tcgaViz version 1.0.2 Index]