correlation_threshold {cytominer} | R Documentation |
Remove redundant variables.
Description
correlation_threshold returns the variables to exclude so that no two of the remaining variables have a correlation greater than a specified threshold.
Usage
correlation_threshold(variables, sample, cutoff = 0.9, method = "pearson")
Arguments
variables |
character vector specifying observation variables. |
sample |
tbl containing sample used to estimate parameters. |
cutoff |
threshold in [0, 1]; a variable becomes a candidate for exclusion when its pairwise correlation with another variable exceeds this value. |
method |
optional character string specifying method for calculating correlation. This must be one of the strings "pearson" (default), "kendall", or "spearman". |
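The choice of method can change which pairs cross the cutoff. The sketch below uses only base R and assumes that method is passed on to stats::cor, which the default "pearson" suggests but this page does not state. The vectors x and y are made up for illustration.

```r
# Illustration only: assumes `method` is forwarded to stats::cor().
x <- 1:10
y <- exp(x)  # increases with x, but not linearly

pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")

# Compare each estimate against the default cutoff of 0.9.
cutoff <- 0.9
c(pearson = pearson > cutoff, spearman = spearman > cutoff)
```

Because the ranks of x and y agree exactly, the rank-based estimate sits at the top of the range, while the linear estimate stays below the default cutoff.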
Details
correlation_threshold is a wrapper for caret::findCorrelation.
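A wrapper of this kind can be sketched roughly as follows. This is not the cytominer source: the helper name correlation_threshold_sketch is hypothetical, and the sketch assumes the correlation matrix is computed with stats::cor before being handed to caret::findCorrelation, which returns the column indices it suggests removing. It requires the caret package.

```r
# Hypothetical helper, NOT the cytominer implementation.
correlation_threshold_sketch <- function(variables, sample,
                                         cutoff = 0.9,
                                         method = "pearson") {
  # Correlation matrix over the requested columns only.
  cor_matrix <- stats::cor(
    as.matrix(sample[, variables, drop = FALSE]),
    method = method
  )
  # Indices of columns that caret suggests removing.
  excluded <- caret::findCorrelation(cor_matrix, cutoff = cutoff)
  # Map the indices back to variable names.
  variables[excluded]
}

set.seed(42)
x <- rnorm(30)
toy <- data.frame(x = x, y = rnorm(30) / 1000, z = x + rnorm(30) / 10)
removed <- correlation_threshold_sketch(c("x", "y", "z"), toy)
removed
```

Which of x and z is reported depends on the data, so the sketch makes no claim about it; only one of the two should appear.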
Value
character vector specifying observation variables to be excluded.
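Since the result names the variables to drop rather than the ones to keep, a typical next step is to subtract it from the full set. The sketch below uses base R only; excluded is hard-coded as a stand-in for the function's return value so that the snippet runs without cytominer installed, and the data frame is made up.

```r
variables <- c("x", "y", "z")

# Stand-in for the result of correlation_threshold(variables, sample).
excluded <- c("z")

# Variables that survive the filter.
kept <- setdiff(variables, excluded)

# Toy data frame (illustrative values only), reduced to the kept columns.
sample <- data.frame(x = 1:3, y = 4:6, z = 7:9)
reduced <- sample[, kept, drop = FALSE]
names(reduced)
```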
Examples
suppressMessages(suppressWarnings(library(magrittr)))
sample <- tibble::tibble(
  x = rnorm(30),
  y = rnorm(30) / 1000
)
sample %<>% dplyr::mutate(z = x + rnorm(30) / 10)
variables <- c("x", "y", "z")
head(sample)
cor(sample)
# `x` and `z` are highly correlated; one of them will be removed
correlation_threshold(variables, sample)
[Package cytominer version 0.2.2 Index]