rbind_listdf {ecospace} | R Documentation |
Combine Large Lists of Data Frames.
Description
Quickly combine large lists of dataframes into a single data frame by combining them first into smaller data frames. This is only time- and memory-efficient when dealing with large (>100) lists of data frames.
Usage
rbind_listdf(lists = NULL, seq.by = 100)
Arguments
lists |
List in which each component is a data frame. Each data frame must have the same number and types of columns, although the data frames can have different numbers of rows. |
seq.by |
Length of small data frames to work with at a time. Default is 100, which timing tests confirm is generally most efficient. |
Details
Rather than combine all list data frames into a single data frame, this function builds smaller subsets of data frames, and then combines them together at the end. This process is more time- and memory-efficient. Timing tests confirm that seq.by = 100 is the optimal break-point size. See examples for confirmation. Function can break large lists into up to 456,976 data frame subparts, giving a warning if requires more subparts. Only time- and memory-efficient when dealing with large (>100) lists of data frames.
Value
Single data frame with number of rows equal to the sum of all data frames in the list, and columns the same as those of individual list data frames.
Note
When called within lapply
, the simulation functions
neutral
, redundancy
, partitioning
, expansion
,
and calc_metrics
, which calculates common ecological disparity
(functional diversity) statistics on simulated biotas, produce lists of
data frames. This function is useful for combining these separate lists
into a single data frame for subsequent analyses. This is especially useful
when using the functions within a high-performance computing environment
when submitted as 'embarrassingly parallel' implementations.
rbindlist
is a more efficient version of this
function.
See Also
neutral
, redundancy
,
partitioning
, expansion
,
calc_metrics
, and rbindlist
Examples
nl <- 500 # List length
lists <- vector("list", length = nl)
for(i in 1:nl) lists[[i]] <- list(x = rnorm(100), y = rnorm(100))
str(lists)
object.size(lists)
all <- rbind_listdf(lists)
str(all)
object.size(all) # Also smaller object size
# Build blank ecospace framework to use in simulations
ecospace <- create_ecospace(nchar = 15, char.state = rep(3, 15),
char.type = rep("numeric", 15))
# Build 5 samples for neutral model:
nreps <- 1:5
Smax <- 10
n.samples <- lapply(X = nreps, FUN = neutral, Sseed = 3, Smax = Smax, ecospace)
# Calculate functional diversity metrics for simulated samples
n.metrics <- lapply(X = nreps, FUN = calc_metrics, samples = n.samples,
Model = "neutral", Param = "NA")
alarm()
str(n.metrics)
# rbind lists together into a single dataframe
all <- rbind_listdf(n.metrics)
# Calculate mean dynamics
means <- n.metrics[[1]]
for(n in 1:Smax) means[n,4:11] <- apply(all[which(all$S == means$S[n]),4:11],
2, mean, na.rm = TRUE)
means
# Plot statistics as function of species richness, overlaying mean dynamics
op <- par()
par(mfrow = c(2,4), mar = c(4, 4, 1, .3))
attach(all)
plot(S, H, type = "p", cex = .75, col = "gray")
lines(means$S, means$H, type = "l", lwd = 2)
plot(S, D, type = "p", cex = .75, col = "gray")
lines(means$S, means$D, type = "l", lwd = 2)
plot(S, M, type = "p", cex = .75, col = "gray")
lines(means$S, means$M, type = "l", lwd = 2)
plot(S, V, type = "p", cex = .75, col = "gray")
lines(means$S, means$V, type = "l", lwd = 2)
plot(S, FRic, type = "p", cex = .75, col = "gray")
lines(means$S, means$FRic, type = "l", lwd = 2)
plot(S, FEve, type = "p", cex = .75, col = "gray")
lines(means$S, means$FEve, type = "l", lwd = 2)
plot(S, FDiv, type = "p", cex = .75, col = "gray")
lines(means$S, means$FDiv, type = "l", lwd = 2)
plot(S, FDis, type = "p", cex = .75, col = "gray")
lines(means$S, means$FDis, type = "l", lwd = 2)
par(op)
## Not run:
# Note each of following can take a few seconds to run
# Compare timings:
t0 <- Sys.time()
all <- rbind_listdf(lists)
(Sys.time() - t0)
t0 <- Sys.time()
all <- rbind_listdf(lists, seq.by = 20)
(Sys.time() - t0)
t0 <- Sys.time()
all <- rbind_listdf(lists, seq.by = 500)
(Sys.time() - t0)
# Compare to non-function version
all2 <- data.frame()
t0 <- Sys.time()
for(i in 1:nl) all2 <- rbind(all2, lists[[i]])
(Sys.time() - t0)
## End(Not run)
# Compare to data.table's 'rbindlist' version
library(data.table)
t0 <- Sys.time()
all <- rbindlist(lists)
(Sys.time() - t0)