R: Prepare data in tidy format

prepare_tidy_data {ulrb}

R Documentation

Prepare data in tidy format

Description

Function to transform common abundance table formats into a "long" format.

Usage

prepare_tidy_data(data, sample_names, samples_in = "cols", ...)

Arguments

`data`	a data.frame in "wide" format, with samples in either columns or rows. This data.frame should not include any data besides abundance values per sample, per taxonomic unit. Additional data (e.g. taxonomy details) should be added afterwards.
`sample_names`	a character vector with the names of all samples.
`samples_in`	a character string specifying the location of the samples. It can be either "cols" (default) if samples are in columns, or "rows" if samples are in rows.
`...`	additional arguments
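
Since `sample_names` has to identify the samples in `data`, it can help to check the two against each other before calling the function. The following is a minimal base R sketch for the samples-in-columns case; the toy table `wide` and the helper variable `missing_samples` are hypothetical and not part of ulrb.

```r
# Toy wide table (hypothetical data): taxonomic units in rows, samples in columns
wide <- data.frame(
  S1 = c(10, 0, 4),
  S2 = c(3, 7, 1)
)
sample_names <- c("S1", "S2")

# Samples listed in sample_names that are absent from the table's columns
missing_samples <- setdiff(sample_names, colnames(wide))

# Stop early with a readable message if any sample is missing
if (length(missing_samples) > 0) {
  stop("Samples not found in data: ", paste(missing_samples, collapse = ", "))
}
```

This check is only an illustration; whether prepare_tidy_data() performs similar validation internally is not stated here.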

Details

This function guarantees that the abundance table includes one column with sample IDs and one column with abundance.

Common species table formats

There are two common formats for abundance tables:

samples as rows and phylogenetic units as columns;
phylogenetic units as rows and samples as columns.

However, neither format is tidy, because the same variable (abundance) is spread across several columns. They are in a "wide" format instead of a "long" format.

This function re-organizes samples and phylogenetic units so that there is a single column with the sample IDs and another with the abundance scores. Extra columns are allowed.
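
To make the wide-to-long idea concrete, here is a minimal base R sketch on a toy table. It illustrates the reshaping only and is not the function's implementation; the data and the column names `OTU`, `Sample` and `Abundance` are assumptions chosen for the example.

```r
# Toy wide table (hypothetical data): one row per phylogenetic unit,
# one column per sample
wide <- data.frame(
  OTU = c("OTU_1", "OTU_2"),
  S1 = c(10, 0),
  S2 = c(3, 7)
)
sample_names <- c("S1", "S2")

# Long format: one row per (phylogenetic unit, sample) pair.
# unlist() stacks the sample columns one after another (all of S1, then S2),
# so the unit IDs are repeated per sample and each sample ID per unit.
long <- data.frame(
  OTU = rep(wide$OTU, times = length(sample_names)),
  Sample = rep(sample_names, each = nrow(wide)),
  Abundance = unlist(wide[sample_names], use.names = FALSE)
)

print(long)
```

The long table has one row for each combination of unit and sample, so a 2 x 2 abundance block becomes 4 rows with a single abundance column.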

Value

An abundance table in long format, compatible with dplyr pipes and ulrb package functions.
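
As an illustration of why a long table works well with dplyr pipes, the sketch below summarises abundance per sample. The table `long` is hypothetical toy data, and its column names `Sample` and `Abundance` are assumed for the example rather than taken from the function's output.

```r
library(dplyr)

# Hypothetical long-format table: one row per (unit, sample) pair
long <- data.frame(
  Sample = c("S1", "S1", "S2", "S2"),
  Abundance = c(10, 0, 3, 7)
)

# With a single abundance column, per-sample summaries are one grouped pipe
totals <- long %>%
  group_by(Sample) %>%
  summarise(Total = sum(Abundance))

print(totals)
```

The same grouped operation on a wide table would require addressing each sample column by name.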

Examples

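library(dplyr)
library(ulrb)  # provides prepare_tidy_data() and the nice dataset
# Sample IDs used as column names in nice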
sample_names <- c("ERR2044662", "ERR2044663", "ERR2044664",
                   "ERR2044665", "ERR2044666", "ERR2044667",
                   "ERR2044668", "ERR2044669", "ERR2044670")

# Example for samples in cols and with additional data available
prepare_tidy_data(nice, sample_names = sample_names, samples_in = "cols")

# Example for samples in rows
# Select columns with samples from nice
nice_rows <- nice %>% select(all_of(sample_names))

# Change columns to rows
nice_rows <- nice_rows %>% t() %>% as.data.frame()

# Turn colnames into phylogenetic units ID
colnames(nice_rows) <- paste0("OTU_", seq_along(colnames(nice_rows)))

prepare_tidy_data(nice_rows, sample_names = sample_names, samples_in = "rows")

[Package ulrb version 0.1.5 Index]