cross_c2c {cat2cat}    R Documentation

Combine weights from different methods, row by row

Description

Adds a column, wei_cross_c2c, that is a row-wise weighted mix of the selected weight columns.

Usage

cross_c2c(
  df,
  cols = colnames(df)[grepl("^wei_.*_c2c$", colnames(df))],
  weis = rep(1/length(cols), length(cols)),
  na.rm = TRUE
)

Arguments

df

data.frame

cols

character vector of weight column names. Defaults to all columns whose names match the regular expression "^wei_.*_c2c$".

weis

numeric vector of weights, one per column in cols. Defaults to equal weights summing to 1, i.e. rep(1/length(cols), length(cols)).

na.rm

logical, whether NA values should be skipped. Default TRUE.
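
The default expressions for cols and weis shown in Usage can be evaluated directly in base R to see what they select. The sketch below does this on a made-up data frame; the name toy and all its values are illustrative assumptions, not package data.

```r
# Toy data frame; column names follow the wei_*_c2c pattern, values are made up.
toy <- data.frame(
  code = c("1111", "1112"),
  rep_c2c = c(2, 2),
  wei_freq_c2c = c(0.6, 0.4),
  wei_knn_c2c = c(0.8, 0.2),
  wei_naive_c2c = c(0.5, 0.5)
)

# Default for `cols`: every column whose name matches "^wei_.*_c2c$".
cols <- colnames(toy)[grepl("^wei_.*_c2c$", colnames(toy))]
print(cols)

# Default for `weis`: one equal weight per selected column, summing to 1.
weis <- rep(1 / length(cols), length(cols))
print(weis)
```

Here rep_c2c is not selected because it lacks the wei_ prefix. The name wei_cross_c2c itself matches the pattern, so when a data frame may already contain that column it is probably safer to pass cols explicitly.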

Value

data.frame with an additional column wei_cross_c2c
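
A minimal base-R sketch of what a row-wise weighted mix of weight columns can look like. The helper cross_sketch is hypothetical and is not the package's implementation. In particular, the NA handling shown (an NA contributes nothing and the remaining weights are not rescaled) is an assumption.

```r
# Hypothetical helper illustrating a row-wise weighted mix; not the package code.
cross_sketch <- function(df, cols,
                         weis = rep(1 / length(cols), length(cols)),
                         na.rm = TRUE) {
  stopifnot(
    is.data.frame(df),
    all(cols %in% colnames(df)),
    length(weis) == length(cols)
  )
  m <- as.matrix(df[, cols, drop = FALSE])
  # Multiply each column by its weight, then sum across each row.
  # Assumption: with na.rm = TRUE an NA simply drops out of the sum.
  df$wei_cross_c2c <- rowSums(sweep(m, 2, weis, `*`), na.rm = na.rm)
  df
}

# Made-up weights for two candidate categories of one subject.
toy <- data.frame(
  wei_freq_c2c = c(0.6, 0.4),
  wei_knn_c2c = c(0.8, 0.2)
)
res <- cross_sketch(toy,
  cols = c("wei_freq_c2c", "wei_knn_c2c"),
  weis = c(0.25, 0.75)
)
print(res$wei_cross_c2c) # approximately 0.75 and 0.25
```

Because each input column sums to 1 across the two rows and the weights sum to 1, the mixed column also sums to 1 in this toy case.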

Examples

data(occup_small)
data(occup)
data(trans)

occup_old <- occup_small[occup_small$year == 2008, ]
occup_new <- occup_small[occup_small$year == 2010, ]

# mix of methods - forward direction, try out backward too
occup_mix <- cat2cat(
  data = list(old = occup_old, new = occup_new, cat_var = "code", time_var = "year"),
  mappings = list(trans = trans, direction = "forward"),
  ml = list(
    method = c("knn", "rf"),
    features = c("age", "sex", "edu", "exp", "parttime", "salary"),
    args = list(k = 10, ntree = 20)
  )
)
# correlation between the weights from the ML models and the frequency-based weights
occup_mix_old <- occup_mix$old
cor(occup_mix_old[occup_mix_old$rep_c2c != 1, c("wei_knn_c2c", "wei_rf_c2c", "wei_freq_c2c")])
# cross all methods and subset one highest probability category for each subject
occup_old_highest1_mix <- prune_c2c(cross_c2c(occup_mix$old),
  column = "wei_cross_c2c", method = "highest1"
)

[Package cat2cat version 0.2.1 Index]