capture_melt_single {nc} | R Documentation |
Capture and melt into a single column
Description
Match a regex to column names of a wide data frame (many
columns/few rows), then melt/reshape the matching columns into a
single result column in a taller/longer data table (fewer columns/more
rows). It is for the common case of melting several columns of
the same type in a "wide" input data table which has several
distinct pieces of information encoded in each column name. For
melting into several result columns of possibly different types,
see capture_melt_multiple
.
Usage
capture_melt_single(...,
value.name = "value",
na.rm = TRUE, verbose = getOption("datatable.verbose"))
Arguments
... |
First argument must be a data frame to melt/reshape; column names
of this data frame will be used as the subjects for regex
matching. Other arguments (regex/conversion/engine) are passed to
|
value.name |
Name of the column in output which has values taken from
melted/reshaped column values of input (passed to
|
na.rm |
remove missing values from melted data? (passed to
|
verbose |
Print |
Value
Data table of reshaped/melted/tall/long data, with a new column for each named argument in the pattern, and additionally variable/value columns.
Author(s)
Toby Hocking <toby.hocking@r-project.org> [aut, cre]
Examples
data.table::setDTthreads(1)
## Example 1: melt iris data and barplot for each numeric variable.
(iris.tall <- nc::capture_melt_single(
iris,
part=".*",
"[.]",
dim=".*",
value.name="cm"))
## Histogram of cm for each variable.
if(require("ggplot2")){
ggplot()+
theme_bw()+
theme(panel.spacing=grid::unit(0, "lines"))+
facet_grid(part ~ dim)+
geom_bar(aes(cm), data=iris.tall)
}
## Example 2: melt who data and use type conversion functions for
## year limits (e.g. for censored regression).
if(requireNamespace("tidyr")){
data(who, package="tidyr", envir=environment())
##2.1 just extract diagnosis and gender to chr columns.
new.diag.gender <- list(#save pattern as list for re-use later.
"new_?",
diagnosis=".*",
"_",
gender=".")
who.tall.chr <- nc::capture_melt_single(who, new.diag.gender, na.rm=TRUE)
print(head(who.tall.chr))
str(who.tall.chr)
##2.2 also extract ages and convert to numeric output columns.
who.tall.num <- nc::capture_melt_single(
who,
new.diag.gender,#previous pattern for matching diagnosis and gender.
ages=list(#new pattern for matching age range.
min.years="0|[0-9]{2}", as.numeric,#in-line type conversion functions.
max.years="[0-9]{0,2}", function(x)ifelse(x=="", Inf, as.numeric(x))),
value.name="count",
na.rm=TRUE)
print(head(who.tall.num))
str(who.tall.num)
##2.3 compute total count for each age range then display the
##subset with max.years lower than a threshold.
who.age.counts <- who.tall.num[, .(
total=sum(count)
), by=.(min.years, max.years)]
print(who.age.counts[max.years < 50])
}
## Example 3: pepseq data.
if(requireNamespace("R.utils")){#for reading gz files with data.table
pepseq.dt <- data.table::fread(
system.file("extdata", "pepseq.txt", package="nc", mustWork=TRUE))
u.pepseq <- pepseq.dt[, unique(names(pepseq.dt)), with=FALSE]
nc::capture_melt_single(
u.pepseq,
"^",
prefix=".*?",
nc::field("D", "", ".*?"),
"[.]",
middle=".*?",
"[.]",
"[0-9]+",
suffix=".*",
"$")
}