unlist2d {collapse} | R Documentation |
Recursive Row-Binding / Unlisting in 2D - to Data Frame
Description
unlist2d efficiently unlists lists of regular R objects (objects built up from atomic elements) and creates a data frame representation of the list through recursive flattening and row-binding operations that match columns by name. It is a full 2-dimensional generalization of unlist, and is best understood as a recursive generalization of do.call(rbind, ...).
It is a powerful tool to create a tidy data frame representation from (nested) lists of vectors, data frames, matrices, arrays or heterogeneous objects. For simple row-wise combination of lists/data frames, use the non-recursive rowbind function.
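The relation to do.call(rbind, ...) can be illustrated with a small base-R sketch. The helper rbind_nested is hypothetical and not part of collapse. It assumes every leaf is a data frame with identical columns, and it adds no id-columns and no NA filling, both of which unlist2d provides on top. A plain do.call(rbind, ...) only binds the top level of the list. The recursion flattens every sub-list first.

```r
# Hypothetical helper, not part of collapse: recursively row-binds a nested
# list whose leaves are data frames with identical columns.
rbind_nested <- function(x) {
  if (is.data.frame(x)) return(x)      # data frames are treated as leaves
  parts <- lapply(x, rbind_nested)     # flatten each sub-list first
  out <- do.call(rbind, parts)         # then bind this level, like do.call(rbind, ...)
  rownames(out) <- NULL
  out
}

df <- data.frame(a = 1:2, b = c("x", "y"), stringsAsFactors = FALSE)
nested <- list(df, list(df, list(df)))

flat <- rbind_nested(nested)
print(dim(flat))  # 6 rows, 2 columns
```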
Usage
unlist2d(l, idcols = ".id", row.names = FALSE, recursive = TRUE,
id.factor = FALSE, DT = FALSE)
Arguments
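l | an unlistable list (with atomic elements in all final nodes).
idcols | a character stub or a vector of names for id-columns automatically added, one for each level of nesting in l. The id-columns are filled with the list names, or with indices if a (sub-)list is unnamed. idcols = FALSE omits them.
row.names | logical or character. If TRUE, the row names of the objects (where available) are stored in a separate column. A character string (e.g. row.names = "Variable") names that column.
recursive | logical. If FALSE, only the lowest (deepest) level of l is processed.
id.factor | if TRUE (and !isFALSE(idcols)), id-columns are created as factors instead of character or integer vectors.
DT | logical. TRUE returns a data.table instead of a data frame.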
Details
The data frame representation created by unlist2d is built as follows:
1. Recurse down to the lowest level of the list-tree. Data frames are exempted and treated as final (atomic) elements.
2. Identify the objects. If they are vectors, matrices or arrays, convert them to data frames (in the case of atomic vectors, each element becomes a column).
3. Row-bind these data frames using data.table's rbindlist function. Columns are matched by name. If the number of columns differs, empty spaces are filled with NAs. If !isFALSE(idcols), create id-columns on the left, filled with the object names or indices (if the (sub-)list is unnamed). If !isFALSE(row.names), store the row names of the objects (if available) in a separate column.
4. Move up to the next higher level of the list-tree and repeat: convert atomic objects to data frames and row-bind, matching all columns and filling unmatched ones with NAs. Create another id-column for each level of nesting passed through. If the list-tree is asymmetric, fill empty spaces in lower-level id-columns with NAs.
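The four steps can be traced in a minimal base-R sketch. All helpers are hypothetical illustrations and not the collapse implementation. This applies to to_df, rbind_fill and unlist2d_sketch. They use do.call(rbind, ...) instead of rbindlist, ignore row names, and handle only vectors, matrices and data frames (not higher-dimensional arrays). They also assume the "<stub>.<level>" naming for id-columns.

```r
# Minimal base-R sketch of the procedure above. Hypothetical helpers, NOT the
# collapse implementation: do.call(rbind, ...) stands in for rbindlist, row
# names are ignored, and only vectors, matrices and data frames are handled.

# Step 2: convert a leaf to a data frame.
to_df <- function(x) {
  if (is.data.frame(x)) return(x)                  # data frames are final
  if (is.matrix(x)) return(as.data.frame(x))
  if (is.null(names(x))) names(x) <- paste0("V", seq_along(x))  # assumed naming
  as.data.frame(as.list(x))                        # each element -> one column
}

# Steps 3/4: row-bind, matching columns by name and filling gaps with NA.
rbind_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  dfs <- lapply(dfs, function(d) {
    for (m in setdiff(all_cols, names(d))) d[[m]] <- NA
    d[all_cols]
  })
  out <- do.call(rbind, dfs)
  rownames(out) <- NULL
  out
}

# Steps 1 and 4: recurse to the leaves, adding one id-column per level
# (assumed naming: <stub>.<level>).
unlist2d_sketch <- function(x, stub = ".id", depth = 1L) {
  if (is.data.frame(x) || is.atomic(x)) return(to_df(x))
  ids <- if (is.null(names(x))) seq_along(x) else names(x)
  parts <- Map(function(child, id) {
    d <- unlist2d_sketch(child, stub, depth + 1L)
    id_col <- data.frame(rep(id, nrow(d)), stringsAsFactors = FALSE)
    names(id_col) <- paste0(stub, ".", depth)
    cbind(id_col, d)
  }, x, ids)
  out <- rbind_fill(unname(parts))
  # Move id-columns to the left, ordered from higher to lower level.
  is_id <- startsWith(names(out), paste0(stub, "."))
  id_cols <- names(out)[is_id]
  id_cols <- id_cols[order(as.integer(sub(".*\\.", "", id_cols)))]
  out[c(id_cols, names(out)[!is_id])]
}

l <- list(a = c(x = 1, y = 2),
          b = list(c = c(x = 3), d = c(y = 4, z = 5)))
res <- unlist2d_sketch(l)
print(res)
```

In this asymmetric list, element a sits one level higher than c and d. Its .id.2 entry is therefore NA. Columns missing from a leaf (e.g. z for a) are also filled with NA, as described in step 4.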
The result of this iterative procedure is a single data frame. On the left it contains one id-column for each level of nesting (from higher to lower level). These are followed by a column containing the row names of the objects (if !isFALSE(row.names)), and then by the data columns, matched at each level of recursion. Optimal results are obtained with symmetric lists of arrays, matrices or data frames. unlist2d efficiently binds these into a tidy data frame ready for plotting or further analysis. See the examples below.
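The column layout described above can be sketched for a single level of nesting. That layout is id-column, then row-names column, then data columns. bind_with_rownames and its argument names are hypothetical. The sketch assumes all matrices share the same column names, so no NA filling is needed. It falls back to row indices when a matrix has no row names.

```r
# Hypothetical helper (not part of collapse): binds a flat list of matrices
# into id-column | row-names column | data columns.
bind_with_rownames <- function(l, idcol = ".id", rncol = "row.names") {
  ids <- if (is.null(names(l))) seq_along(l) else names(l)
  parts <- Map(function(m, id) {
    d <- as.data.frame(m)
    rn <- rownames(m)
    # Assumption: matrices without row names get row indices instead.
    if (is.null(rn)) rn <- as.character(seq_len(nrow(m)))
    front <- data.frame(rep(id, nrow(m)), rn, stringsAsFactors = FALSE)
    names(front) <- c(idcol, rncol)
    cbind(front, d)                      # id-column, row names, then data
  }, l, ids)
  out <- do.call(rbind, unname(parts))   # assumes identical column names
  rownames(out) <- NULL
  out
}

m1 <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("A", "B")))
m2 <- matrix(5:6, nrow = 1, dimnames = list("r3", c("A", "B")))
res <- bind_with_rownames(list(a = m1, b = m2))
print(res)
```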
Value
A data frame or (if DT = TRUE) a data.table.
Note
For lists of data frames, unlist2d works just like data.table::rbindlist(l, use.names = TRUE, fill = TRUE, idcol = ".id"). For lists of lists, however, unlist2d does not produce the same output as data.table::rbindlist, because unlist2d is a recursive function. You can use rowbind as a faithful alternative to data.table::rbindlist.
The function rrapply::rrapply(l, how = "melt") (or how = "bind") is a fast alternative (written fully in C) for nested lists of atomic elements.
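A base-R sketch can show what the three rbindlist options mean for a flat list of data frames. bind_flat is a hypothetical helper, not data.table code. It binds only one level (no recursion), matches columns by name (use.names = TRUE), fills missing columns with NA (fill = TRUE) and prepends an id-column (idcol = ".id").

```r
# Hypothetical helper (not data.table code): a one-level bind mimicking
# rbindlist(l, use.names = TRUE, fill = TRUE, idcol = ".id").
bind_flat <- function(l, idcol = ".id") {
  ids <- if (is.null(names(l))) seq_along(l) else names(l)
  all_cols <- unique(unlist(lapply(l, names)))           # use.names = TRUE
  parts <- Map(function(d, id) {
    for (m in setdiff(all_cols, names(d))) d[[m]] <- NA  # fill = TRUE
    d[[idcol]] <- rep(id, nrow(d))                       # idcol = ".id"
    d[c(idcol, all_cols)]                                # id first, then data
  }, l, ids)
  out <- do.call(rbind, unname(parts))
  rownames(out) <- NULL
  out
}

l <- list(a = data.frame(x = 1, y = 2), b = data.frame(y = 3, z = 4))
res <- bind_flat(l)
print(res)
```

bind_flat does not recurse, so an element that is itself a list of data frames is not handled. That is the case where unlist2d and rbindlist diverge.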
See Also
rowbind, rsplit, rapply2d, List Processing, Collapse Overview
Examples
## Basic Examples:
l <- list(mtcars, list(mtcars, mtcars))
tail(unlist2d(l))
unlist2d(rapply2d(l, fmean))
l = list(a = qM(mtcars[1:8]),
b = list(c = mtcars[4:11], d = list(e = mtcars[2:10], f = mtcars)))
tail(unlist2d(l, row.names = TRUE))
unlist2d(rapply2d(l, fmean))
unlist2d(rapply2d(l, fmean), recursive = FALSE)
## Groningen Growth and Development Center 10-Sector Database
head(GGDC10S) # See ?GGDC10S
namlab(GGDC10S, class = TRUE)
# Panel-Summarize this data by Variable (Employment and Value Added)
l <- qsu(GGDC10S, by = ~ Variable, # Output as list (instead of 4D array)
pid = ~ Variable + Country,
cols = 6:16, array = FALSE)
str(l, give.attr = FALSE) # A list of 2-levels with matrices of statistics
head(unlist2d(l)) # Default output, missing the variables (row-names)
head(unlist2d(l, row.names = TRUE)) # Here we go, but this is still not very nice
head(unlist2d(l, idcols = c("Sector","Trans"), # Now this is looking pretty good
row.names = "Variable"))
dat <- unlist2d(l, c("Sector","Trans"), # Id-columns can also be generated as factors
"Variable", id.factor = TRUE)
str(dat)
# Split this sectoral data, first by Variable (Employment and Value Added), then by Country
sdat <- rsplit(GGDC10S, ~ Variable + Country, cols = 6:16)
# Compute pairwise correlations between sectors and recombine:
dat <- unlist2d(rapply2d(sdat, pwcor),
idcols = c("Variable","Country"),
row.names = "Sector")
head(dat)
plot(hclust(as.dist(1-pwcor(dat[-(1:3)])))) # Using corrs. as distance metric to cluster sectors
# List of panel-series matrices
psml <- psmat(fsubset(GGDC10S, Variable == "VA"), ~Country, ~Year, cols = 6:16, array = FALSE)
# Recombining with unlist2d() (effectively reshaping the data)
head(unlist2d(psml, idcols = "Sector", row.names = "Country"))
rm(l, dat, sdat, psml)