consolidate {manydata} | R Documentation |
Consolidate datacube into a single dataset
Description
This function consolidates the datasets in a 'many* package' datacube into a single dataset that combines their rows, columns, and observations. Separate arguments control which rows and which columns are retained, and how conflicts between observations across datasets are resolved, which gives users considerable flexibility in how they combine data. For example, users may wish to keep only units that appear in every dataset but include variables coded in any dataset, or keep units that appear in any dataset but only those variables that appear in every dataset. Even then conflicts may remain, because the observed value for a given unit and variable may differ from dataset to dataset. The resolve argument offers a number of methods for choosing how such conflicts are resolved.
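The difference between "any" and "every" described above can be pictured as a union versus an intersection. The following is only a minimal base R sketch on two made-up data frames (a and b are hypothetical and not part of manydata); it illustrates the idea and is not how consolidate() is implemented.

```r
# Two toy datasets sharing an "ID" key (hypothetical data, not from manydata)
a <- data.frame(ID = c("x", "y", "z"), Begin = c(1, 2, 3))
b <- data.frame(ID = c("y", "z", "w"), Begin = c(2, 4, 5), End = c(7, 8, 9))

# rows = "any" corresponds roughly to the union of units,
# rows = "every" to the units present in all datasets
rows_any   <- union(a$ID, b$ID)
rows_every <- intersect(a$ID, b$ID)

# cols = "any" corresponds roughly to the union of variables,
# cols = "every" to the variables present in all datasets
cols_any   <- union(names(a), names(b))
cols_every <- intersect(names(a), names(b))

print(rows_any)
print(rows_every)
print(cols_any)
print(cols_every)
```

In this toy case units "y" and "z" appear in both datasets with differing Begin values for "z", which is the kind of conflict the resolve argument is meant to settle.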
Usage
consolidate(
  datacube,
  rows = "any",
  cols = "any",
  resolve = "coalesce",
  key = "manyID"
)
Arguments
datacube |
A datacube from one of the many packages |
rows |
Which rows or units to retain. By default "any" (or all) units are retained, but another option is "every", which retains only those units that appear in all parent datasets. |
cols |
Which columns or variables to retain. By default "any" (or all) variables are retained, but another option is "every", which retains only those variables that appear in all parent datasets. |
resolve |
How should conflicts between observations be resolved?
By default "coalesce",
but other options include: "min", "max", "mean", "median", and "random".
"coalesce" takes the first non-NA value.
"max" takes the largest value.
"min" takes the smallest value.
"mean" takes the average value.
"median" takes the median value.
"random" takes a random value.
For different variables to be resolved differently,
you can specify the variables' names alongside
how each is to be resolved in a list
(e.g. |
key |
An ID column to collapse by.
By default "manyID".
Users can also specify multiple key variables in a list.
For multiple key variables, the key variables must be present in
all the datasets in the datacube (e.g. |
Details
Text variables are dropped for more efficient consolidation.
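To make the resolve options listed under Arguments more concrete, here is a minimal base R sketch of what each method does to the values reported for a single unit and variable. The helper resolve_value() and the vector vals are hypothetical names introduced for illustration. Dropping NA values before applying every method is an assumption of this sketch, and the package's own handling may differ.

```r
# Hypothetical values reported for one unit-variable cell by three datasets
vals <- c(NA, 12, 10)

# Illustrative helper (not part of manydata): resolve one cell's conflicts
resolve_value <- function(x, method = "coalesce") {
  x_obs <- x[!is.na(x)]            # assumption: missing values are ignored
  if (length(x_obs) == 0) return(NA)
  switch(method,
    coalesce = x_obs[1],           # first non-NA value
    min      = min(x_obs),         # smallest value
    max      = max(x_obs),         # largest value
    mean     = mean(x_obs),        # average value
    median   = stats::median(x_obs),
    random   = x_obs[sample.int(length(x_obs), 1)],  # one value at random
    stop("Unknown resolve method: ", method)
  )
}

resolve_value(vals, "coalesce")
resolve_value(vals, "min")
resolve_value(vals, "mean")
```

Note that "coalesce" depends on the order of the datasets, whereas "min", "max", "mean", and "median" do not; "random" will generally vary from call to call unless a seed is set.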
Value
A single tibble/data frame.
Examples
consolidate(datacube = emperors, key = "ID")
consolidate(datacube = favour(emperors, "UNRV"), rows = "every",
            cols = "every", resolve = "coalesce", key = "ID")
consolidate(datacube = emperors, rows = "any", cols = "every",
            resolve = "min", key = "ID")
consolidate(datacube = emperors, rows = "every", cols = "any",
            resolve = "max", key = "ID")
consolidate(datacube = emperors, rows = "every", cols = "every",
            resolve = "median", key = "ID")
consolidate(datacube = emperors, rows = "every", cols = "every",
            resolve = "mean", key = "ID")
consolidate(datacube = emperors, rows = "every", cols = "every",
            resolve = "random", key = "ID")
consolidate(datacube = emperors, rows = "every", cols = "every",
            resolve = c(Begin = "min", End = "max"), key = "ID")
consolidate(datacube = emperors, rows = "any", cols = "any",
            resolve = c(Death = "max", Cause = "coalesce"),
            key = c("ID", "Begin"))