aggregate_by_key {dataPreparation}R Documentation

Automatic data set aggregation by key

Description

Automatic aggregation of a data set according to a key.

Usage

aggregate_by_key(data_set, key, verbose = TRUE, thresh = 53, ...)

Arguments

data_set

Matrix, data.frame or data.table (with only numeric, integer, factor, logical, character columns)

key

Name of a column of data_set according to which the set should be aggregated (character)

verbose

Should the algorithm print progress messages? (logical, defaults to TRUE)

thresh

Maximum number of distinct values for which a frequency count is computed (numeric, defaults to 53)

...

Optional argument: functions, the aggregation functions applied to numeric columns (character vector of function names; if not set, c("mean", "min", "max", "sd") is used)
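The thresh argument is only described as a maximum number of values for frequency counts. As a hedged illustration, assuming it means that a character or factor column gets per-key frequency counts only when it has at most thresh distinct values, the idea can be sketched in base R. The function sketch_frequency_count and the "column.value" naming of the output are hypothetical and not part of dataPreparation.

```r
# Hypothetical sketch (not the dataPreparation implementation), assuming thresh
# caps the number of distinct values for which frequencies are counted.
sketch_frequency_count <- function(data_set, key, col, thresh = 53) {
  values <- as.character(data_set[[col]])
  if (length(unique(values)) > thresh) {
    return(NULL)  # too many distinct values: skip the frequency count
  }
  # Contingency table: rows = key values, columns = values of `col`
  counts <- table(data_set[[key]], values)
  result <- data.frame(key_value = rownames(counts), stringsAsFactors = FALSE)
  for (v in colnames(counts)) {
    # e.g. column "sex.F" = number of rows with sex == "F" for each key value
    result[[paste(col, v, sep = ".")]] <- as.integer(counts[, v])
  }
  names(result)[1] <- key
  result
}

adult_like <- data.frame(country = c("A", "A", "B"), sex = c("M", "F", "M"),
                         stringsAsFactors = FALSE)
sketch_frequency_count(adult_like, key = "country", col = "sex")
```

With thresh = 1 the same call returns NULL, since the sex column has two distinct values.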

Details

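Aggregation is performed according to column type. Numeric columns are summarized with the aggregation functions given in functions, and thresh bounds the number of distinct values for which frequencies are counted.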

In all cases, if the set has more rows than unique keys, the number of lines per key is also computed.
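For numeric columns and the line count, a minimal base-R sketch of the behaviour described above might look as follows. It assumes a plain data.frame input; sketch_aggregate_numeric, the nbr_lines column and the "function.column" naming are hypothetical and do not claim to match the columns aggregate_by_key actually produces.

```r
# Hypothetical sketch (not the dataPreparation implementation): aggregate the
# numeric columns per key and, when some key repeats, count lines per key.
sketch_aggregate_numeric <- function(data_set, key,
                                     functions = c("mean", "min", "max", "sd")) {
  # One data.frame per distinct key value (split() sorts the key values)
  groups <- split(data_set, data_set[[key]])
  is_num <- vapply(data_set, is.numeric, logical(1))
  numeric_cols <- setdiff(names(data_set)[is_num], key)
  # Count lines per key only if the set has more rows than unique keys
  add_line_count <- nrow(data_set) > length(groups)
  rows <- lapply(names(groups), function(k) {
    g <- groups[[k]]
    out <- list()
    if (add_line_count) out$nbr_lines <- nrow(g)
    for (col in numeric_cols) {
      for (fn in functions) {
        # e.g. column "mean.age" holds mean(age) for this key value
        out[[paste(fn, col, sep = ".")]] <- match.fun(fn)(g[[col]])
      }
    }
    data.frame(key_value = k, out, stringsAsFactors = FALSE)
  })
  result <- do.call(rbind, rows)
  names(result)[1] <- key
  rownames(result) <- NULL
  result
}

df <- data.frame(country = c("A", "A", "B"), age = c(20, 30, 40),
                 stringsAsFactors = FALSE)
sketch_aggregate_numeric(df, key = "country")
```

Note that sd of a single value is NA, so a key that appears only once gets NA in its sd column.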

Be careful when using the functions argument: each given function must be an aggregation function, meaning that for multiple values it returns only one value.
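The rule above (several values in, one value out) can be checked mechanically. The sketch below uses a hypothetical helper, keep_aggregation_functions, that probes each named function on a small numeric vector and keeps only those returning a single value. This mirrors the example in which sqrt is not used, but it is an assumption, not the check the package actually performs.

```r
# Hypothetical helper (not part of dataPreparation): keep only the names of
# functions that return exactly one value when applied to several values.
keep_aggregation_functions <- function(function_names, envir = parent.frame()) {
  force(envir)  # look functions up where the caller defined them
  probe <- c(1, 2, 3, 4)  # several numeric test values
  is_aggregation <- vapply(function_names, function(name) {
    f <- get(name, envir = envir, mode = "function")
    result <- tryCatch(f(probe), error = function(e) NULL)
    length(result) == 1
  }, logical(1))
  function_names[is_aggregation]
}

power <- function(x) {sum(x^2)}
keep_aggregation_functions(c("power", "sqrt", "mean"))
# returns c("power", "mean")
```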

Value

A data.table with one row per key value and multiple new columns.

Examples

## Not run: 
# Load the package and its example dataset
library(dataPreparation)
data("adult")

# Aggregate it using aggregate_by_key, in order to extract characteristics for each country
adult_aggregated <- aggregate_by_key(adult, key = 'country')

# Example with other functions
power <- function(x) {sum(x^2)}
adult_aggregated <- aggregate_by_key(adult, key = 'country', functions = c("power", "sqrt"))

# sqrt is not an aggregation function, so it wasn't used.

## End(Not run)
# "## Not run:" means that this example is not run on CRAN because it takes a long time. But you can run it!

[Package dataPreparation version 1.0.4 Index]