treatment_corr {alookr} | R Documentation |
treatment_corr() diagnoses pairs of highly correlated variables or removes one of them.
treatment_corr(.data, corr_thres = 0.8, treat = TRUE, verbose = TRUE)
.data |
a data.frame or a |
corr_thres |
numeric. Threshold used to detect variables: a pair is flagged when its correlation is greater than this value. |
treat |
logical. Set whether to remove variables.
verbose |
logical. Set whether to echo information to the console at runtime. |
Pearson's correlation coefficient is computed for continuous variables, and Spearman's correlation coefficient is computed for categorical variables.
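The rule above can be illustrated with a minimal base R sketch. It is not the package's implementation: the data frame `df`, the helper `high_pairs()`, the use of integer level codes for factors, and the use of absolute correlation are all assumptions made for illustration.

```r
# Hypothetical data set, not taken from the package
set.seed(1)
df <- data.frame(
  a = 1:50,
  b = 2 * (1:50) + rnorm(50),
  c = factor(rep(letters[1:10], times = 5)),
  d = factor(rep(LETTERS[1:10], times = 5))
)
corr_thres <- 0.8

# Pearson correlation among the numeric columns
num <- df[vapply(df, is.numeric, logical(1))]
r_num <- cor(num, method = "pearson")

# Spearman correlation among the factor columns, computed on integer
# level codes (assumption: the package may encode factors differently)
cat_codes <- data.frame(lapply(df[vapply(df, is.factor, logical(1))], as.integer))
r_cat <- cor(cat_codes, method = "spearman")

# Hypothetical helper: list each pair once (upper triangle) whose
# absolute correlation exceeds the threshold (absolute value is an assumption)
high_pairs <- function(m, thres) {
  idx <- which(abs(m) > thres & upper.tri(m), arr.ind = TRUE)
  data.frame(
    var1 = rownames(m)[idx[, 1]],
    var2 = colnames(m)[idx[, 2]],
    corr = m[idx],
    stringsAsFactors = FALSE
  )
}

high_pairs(r_num, corr_thres)
high_pairs(r_cat, corr_thres)
```

Each returned row names one highly correlated pair; a function such as treatment_corr() would then keep one variable of the pair and drop the other when treat = TRUE.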
An object of class data.frame or train_df; the return value has the same type as the .data argument. Some variables may be excluded because of their correlation with other variables.
# numerical variable
x1 <- 1:100
set.seed(12L)
x2 <- sample(1:3, size = 100, replace = TRUE) * x1 + rnorm(1)
set.seed(1234L)
x3 <- sample(1:2, size = 100, replace = TRUE) * x1 + rnorm(1)
# categorical variable
x4 <- factor(rep(letters[1:20], times = 5))
set.seed(100L)
x5 <- factor(rep(letters[1:20 + sample(1:6, size = 20, replace = TRUE)], times = 5))
set.seed(200L)
x6 <- factor(rep(letters[1:20 + sample(1:3, size = 20, replace = TRUE)], times = 5))
set.seed(300L)
x7 <- factor(sample(letters[1:5], size = 100, replace = TRUE))
exam <- data.frame(x1, x2, x3, x4, x5, x6, x7)
str(exam)
head(exam)
# default case
treatment_corr(exam)
# not removing variables
treatment_corr(exam, treat = FALSE)
# Set the threshold to detect variables whose correlation is greater than 0.9
treatment_corr(exam, corr_thres = 0.9, treat = FALSE)
# not verbose mode
treatment_corr(exam, verbose = FALSE)