cat2cat_agg {cat2cat} | R Documentation |
Manual mapping for an aggregated panel dataset
Description
Manual mapping of an inconsistently coded categorical variable according to the user provided mappings (equations).
Usage
cat2cat_agg(
  data = list(old = NULL, new = NULL, cat_var_old = NULL, cat_var_new = NULL,
    time_var = NULL, freq_var = NULL),
...
)
Arguments
data |
list with 6 named fields 'old', 'new', 'cat_var_old', 'cat_var_new', 'time_var', 'freq_var'. |
... |
mapping equations where the direction is set with any of '>', '<', '%>%', '%<%'. |
Details
data argument - list with fields:
- "old": data.frame, the older time point in the panel
- "new": data.frame, the more recent time point in the panel
- "cat_var": character, deprecated, name of the categorical variable
- "cat_var_old": character, name of the categorical variable in the old period
- "cat_var_new": character, name of the categorical variable in the new period
- "time_var": character, name of the time variable
- "freq_var": character, name of the frequency variable
Value
A named list with 2 fields, old and new, each a data.frame. Additional columns are added to each data.frame. Columns are used rather than separate metadata because the returned datasets are new ones in which observations may be replicated. For transparency, each observation in the data.frame carries its probability and its number of replications.
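The role of the probability column can be illustrated with a minimal base R sketch. It does not call cat2cat_agg; the toy data are hypothetical, and the equal weights are an assumption made only for illustration (the weights the package actually assigns may differ). The column name prop_c2c is taken from the Examples section.

```r
# Hypothetical toy data: one aggregated row whose category maps to
# two candidate categories in the other period.
original <- data.frame(vertical = "Home", counts = 100, stringsAsFactors = FALSE)

# Replicate the row once per candidate category and attach a weight.
# Equal weights are an illustrative assumption, not the package's rule.
candidates <- c("Home", "Supermarket")
replicated <- data.frame(
  vertical = candidates,
  counts = rep(original$counts, length(candidates)),
  prop_c2c = rep(1 / length(candidates), length(candidates)),
  stringsAsFactors = FALSE
)

# Weighting each replicated row by prop_c2c recovers the original total.
weighted_total <- sum(replicated$counts * replicated$prop_c2c)
```

Because the weights of the replicated rows sum to one, weighted sums such as sum(counts * prop_c2c) keep the totals of the unreplicated data, which is why the Examples multiply by prop_c2c before aggregating.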
Note
All mapping equations have to be valid ones.
Examples
data("verticals", package = "cat2cat")
agg_old <- verticals[verticals$v_date == "2020-04-01", ]
agg_new <- verticals[verticals$v_date == "2020-05-01", ]
# cat2cat_agg - can map in both directions at once
# although usually we want to have the old or the new representation
agg <- cat2cat_agg(
  data = list(
    old = agg_old,
    new = agg_new,
    cat_var_old = "vertical",
    cat_var_new = "vertical",
    time_var = "v_date",
    freq_var = "counts"
  ),
  Automotive %<% c(Automotive1, Automotive2),
  c(Kids1, Kids2) %>% c(Kids),
  Home %>% c(Home, Supermarket)
)
## possible processing
library("dplyr")
agg %>%
  bind_rows() %>%
  group_by(v_date, vertical) %>%
  summarise(
    sales = sum(sales * prop_c2c),
    counts = sum(counts * prop_c2c),
    v_date = first(v_date)
  )