factorize_par {Colossus}    R Documentation

Splits a parameter into factors in parallel

Description

factorize_par uses a user-provided list of columns to define a new parameter for each unique value in each column, and updates the data.table accordingly. It is not intended for interaction terms.

Usage

factorize_par(
  df,
  col_list,
  verbose = FALSE,
  nthreads = as.numeric(detectCores())
)

Arguments

df

a data.table containing the columns of interest

col_list

a character vector of column names for which factor terms should be defined

verbose

boolean controlling whether additional information is printed to the console; an integer 0/1 is also accepted

nthreads

number of threads to use; do not request more threads than are available on your machine
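
One way to respect the thread limit is to cap the requested count at what the machine reports. The following is a hedged sketch only: pick_threads is a hypothetical helper that is not part of Colossus, and it relies on parallel::detectCores(), which can return NA on some platforms.

```r
library(parallel)

# Hypothetical helper: cap a requested thread count at the detected core count.
pick_threads <- function(requested) {
  available <- detectCores()
  # detectCores() may return NA; fall back to a single thread in that case
  if (is.na(available)) {
    return(1L)
  }
  as.integer(max(1, min(requested, available)))
}

nthreads <- pick_threads(2)
```

The resulting value can then be passed as the nthreads argument.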

Value

Returns a list with two named fields: df, the updated data.table, and cols, the names of the new columns.

See Also

Other Data Cleaning Functions: Check_Dupe_Columns(), Check_Trunc(), Correct_Formula_Order(), Date_Shift(), Def_Control(), Def_Control_Guess(), Def_model_control(), Def_modelform_fix(), Joint_Multiple_Events(), Replace_Missing(), Time_Since(), factorize(), gen_time_dep(), interact_them()

Examples

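library(data.table)
library(Colossus)

# Example data: "c" is the categorical column to factorize
a <- c(0, 1, 2, 3, 4, 5, 6)
b <- c(1, 2, 3, 4, 5, 6, 7)
c <- c(0, 1, 2, 1, 0, 1, 0)
df <- data.table::data.table("a" = a, "b" = b, "c" = c)
col_list <- c("c")

# Factorize "c" without verbose output, using 2 threads
val <- factorize_par(df, col_list, FALSE, 2)
df <- val$df        # updated data.table
new_col <- val$cols # names of the new factor columns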
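
To illustrate what defining a factor term for each unique value means, here is a minimal base R sketch. It is not the package's implementation: factorize_sketch is a hypothetical helper, it runs serially on a plain data.frame, and the "<column>_<value>" naming of the new columns is an assumption made for illustration. Inspect val$cols to see the names that factorize_par actually produces.

```r
# Hypothetical helper: builds one 0/1 indicator column per unique value.
# The "<column>_<value>" naming is an assumption for this sketch.
factorize_sketch <- function(df, col_list) {
  new_cols <- character(0)
  for (col in col_list) {
    for (v in sort(unique(df[[col]]))) {
      name <- paste0(col, "_", v)
      # 1 where the row matches this value, 0 otherwise
      df[[name]] <- as.integer(df[[col]] == v)
      new_cols <- c(new_cols, name)
    }
  }
  list(df = df, cols = new_cols)
}

df <- data.frame(a = c(0, 1, 2, 3, 4, 5, 6),
                 c = c(0, 1, 2, 1, 0, 1, 0))
val <- factorize_sketch(df, c("c"))
print(val$cols)
print(val$df)
```

Each row has exactly one indicator set to 1 among the new columns for a given source column, which is why this approach does not cover interaction terms between columns.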


[Package Colossus version 1.1.1 Index]