cat2cat {cat2cat}    R Documentation

Automatic mapping of a categorical variable in a panel dataset according to a new encoding

Description

This function works on two time points at once, so handling more periods requires applying it recursively. The prune_c2c function might be needed when there are many interactions, to limit the growing number of replications. The function might seem complex at first glance, but it is built to offer a wide range of applications for complex tasks.

Usage

cat2cat(
  data = list(old = NULL, new = NULL, cat_var = NULL, id_var = NULL, time_var = NULL,
    multiplier_var = NULL, freqs_df = NULL),
  mappings = list(trans = NULL, direction = NULL),
  ml = list(method = NULL, features = NULL, args = NULL)
)

Arguments

data

list with 4, 5, 6 or 7 named fields: 'old', 'new', 'cat_var', 'time_var' and the optional 'id_var', 'multiplier_var', 'freqs_df'

mappings

list with 2 named fields: 'trans', 'direction'

ml

list with 3 named fields: 'method', 'features', 'args'

Details

data args

mappings args

ml args

Without the ml section, only simple frequencies are assessed. When the ml model fails, the weights from simple frequencies are used instead. The knn method is recommended for smaller datasets.
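The idea of weights from simple frequencies can be illustrated with a minimal base R sketch. It is not the package's implementation: the mapping table trans_toy, the codes, the counts, and the column names freq and wei_freq are all hypothetical, and the sketch only assumes that each candidate category is weighted by its share among the candidates of one original category.

```r
# Toy one-to-many mapping table (hypothetical codes, not the package data)
trans_toy <- data.frame(
  old = c("1111", "1111", "2222"),
  new = c("111101", "111102", "222201"),
  stringsAsFactors = FALSE
)

# Hypothetical observations from the period that already uses the new encoding
new_codes <- c(rep("111101", 30), rep("111102", 10), rep("222201", 5))
freqs <- table(new_codes)

# Count of each candidate category in the target period
trans_toy$freq <- as.numeric(freqs[trans_toy$new])

# Weight = share of a candidate among all candidates of the same old code
trans_toy$wei_freq <- trans_toy$freq / ave(trans_toy$freq, trans_toy$old, FUN = sum)

print(trans_toy)
```

In this toy case the old code "1111" has two candidates observed 30 and 10 times, so they receive weights 0.75 and 0.25, while the one-to-one mapping of "2222" receives weight 1.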

Value

named list with 2 fields, old and new, each a data.frame. Additional columns are added, such as index_c2c, g_new_c2c, wei_freq_c2c, rep_c2c and wei_(ml method name)_c2c. The additional columns are informative for only one of the data.frames, as the changes are always made in one direction.
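A short base R sketch of how such columns might be used follows. The data.frame mapped is hypothetical toy data, not output produced by the package, and the sketch assumes that index_c2c identifies the original observation, g_new_c2c holds the candidate category, rep_c2c counts the replications, and the weights of one observation sum to one.

```r
# Hypothetical data shaped like the data.frame that received the mapping
mapped <- data.frame(
  index_c2c = c(1, 1, 2, 3, 3, 3),
  g_new_c2c = c("A", "B", "A", "B", "C", "D"),
  rep_c2c = c(2, 2, 1, 3, 3, 3),
  wei_freq_c2c = c(0.75, 0.25, 1, 0.5, 0.3, 0.2),
  stringsAsFactors = FALSE
)

# Assumption: weights of the replicated rows of one observation sum to one
sums <- tapply(mapped$wei_freq_c2c, mapped$index_c2c, sum)
stopifnot(isTRUE(all.equal(as.numeric(sums), rep(1, length(sums)))))

# Weighted counts per candidate category: each replicated row contributes
# its weight instead of a full observation
weighted_counts <- tapply(mapped$wei_freq_c2c, mapped$g_new_c2c, sum)
print(weighted_counts)
```

Under that assumption the weighted counts add up to the number of original observations (3 here), so the replication does not inflate the sample.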

Examples

data(occup_small)
data(occup)
data(trans)

occup_old <- occup_small[occup_small$year == 2008, ]
occup_new <- occup_small[occup_small$year == 2010, ]

# by default, only simple frequencies are used

occup_2 <- cat2cat(
  data = list(old = occup_old, new = occup_new, cat_var = "code", time_var = "year"),
  mappings = list(trans = trans, direction = "forward")
)

# additionally add probabilities from knn

occup_3 <- cat2cat(
  data = list(old = occup_old, new = occup_new, cat_var = "code", time_var = "year"),
  mappings = list(trans = trans, direction = "forward"),
  ml = list(
    method = "knn",
    features = c("age", "sex", "edu", "exp", "parttime", "salary"),
    args = list(k = 10)
  )
)


[Package cat2cat version 0.2.1 Index]