aggregate_by_key {dataPreparation} | R Documentation |
Automatic data_set aggregation by key
Description
Automatic aggregation of a data set according to a key.
Usage
aggregate_by_key(data_set, key, verbose = TRUE, thresh = 53, ...)
Arguments
data_set |
Matrix, data.frame or data.table (with only numeric, integer, factor, logical, character columns) |
key |
Name of a column of data_set according to which the set should be aggregated (character) |
verbose |
Should the function print progress messages? (logical, defaults to TRUE) |
thresh |
Maximum number of distinct values for which a frequency count is performed (numeric, defaults to 53) |
... |
Optional argument: functions, the aggregation functions applied to numeric columns (see Details) |
Details
Aggregation is performed depending on the column type:
If a column is numeric, functions are applied to the column, so one numeric column gives length(functions) new columns.
If a column is character or factor and has fewer than thresh distinct values, a frequency count of its values is performed.
If a column is character or factor and has more than thresh distinct values, the number of distinct values for each key is computed.
If a column is logical, the number of TRUE values is computed.
In all cases, if the set has more rows than unique key values, the number of lines per key is also computed.
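The rules above can be sketched in base R on a toy data set. This is only an illustration, not the package's implementation: the helper summarise_column, the toy data, and the output names (n_true, n_distinct) are hypothetical, and treating exactly thresh distinct values like the "more than thresh" case is an assumption.

```r
# Toy data: 'id' plays the role of the key column.
toy <- data.frame(
  id     = c("a", "a", "b"),
  amount = c(1, 3, 10),
  colour = c("red", "blue", "red"),
  flag   = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarise one column of one key group by its type.
# levels_seen holds the distinct values of the column in the whole data set.
summarise_column <- function(x, functions = c("mean", "max"), thresh = 53,
                             levels_seen = unique(x)) {
  if (is.logical(x)) {
    # Logical column: number of TRUE values.
    return(c(n_true = sum(x, na.rm = TRUE)))
  }
  if (is.numeric(x)) {
    # Numeric column: one result per aggregation function.
    return(vapply(functions, function(f) match.fun(f)(x), numeric(1)))
  }
  # Character or factor column.
  x <- as.character(x)
  if (length(levels_seen) < thresh) {
    # Few distinct values: frequency count of each value.
    counts <- table(factor(x, levels = levels_seen))
    return(setNames(as.numeric(counts), names(counts)))
  }
  # Many distinct values: number of distinct values in the group.
  c(n_distinct = length(unique(x)))
}

groups   <- split(toy, toy$id)
group_a  <- groups[["a"]]
amount_a <- summarise_column(group_a$amount)
colour_a <- summarise_column(group_a$colour, levels_seen = unique(toy$colour))
flag_a   <- summarise_column(group_a$flag)

# Number of lines per key, computed because there are more rows than keys.
nb_lines <- vapply(groups, nrow, integer(1))
```

Here the numeric column amount yields two values for key "a" because two functions were supplied, which matches the length(functions) rule.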
Be careful when using the functions argument: each given function should be an aggregation function, meaning that for multiple input values it should return exactly one value.
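One way to check a candidate function before passing it is to apply it to a small probe vector and verify that it returns a single value. The sketch below is an assumption about how such a check could look, not the check the package performs. The helper is_aggregation_function is hypothetical, and for simplicity it takes function objects rather than function names.

```r
# Hypothetical helper: TRUE if f collapses a numeric vector to one value.
is_aggregation_function <- function(f, probe = c(1, 2, 3, 4)) {
  result <- tryCatch(f(probe), error = function(e) NULL)
  !is.null(result) && length(result) == 1
}

# Sum of squares: returns a single value, so it aggregates.
power <- function(x) sum(x^2)

candidates <- list(mean = mean, power = power, sqrt = sqrt)
checks <- vapply(candidates, is_aggregation_function, logical(1))

# Names of the functions that return exactly one value.
usable <- names(candidates)[checks]
```

sqrt returns one value per input element, so it fails the check, while mean and power pass.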
Value
A data.table with one line per key element and multiple new columns.
Examples
## Not run:
# Load the package, then the adult example data set
library(dataPreparation)
data("adult")
# Aggregate it using aggregate_by_key, in order to extract characteristics for each country
adult_aggregated <- aggregate_by_key(adult, key = 'country')
# Example with other functions
power <- function(x) {sum(x^2)}
adult_aggregated <- aggregate_by_key(adult, key = 'country', functions = c("power", "sqrt"))
# sqrt is not an aggregation function, so it wasn't used.
## End(Not run)
# "## Not run:" means that this example is not run on CRAN because it takes too long, but you can run it yourself.