calculate_cors {corrgrapher} | R Documentation |
Calculate correlation coefficients
Description
Calculate correlation coefficients between variables in a data.frame, matrix or table, using 3 different functions for the 3 possible pairs of variables:
numeric - numeric
numeric - categorical
categorical - categorical
Usage
calculate_cors(
x,
num_num_f = NULL,
num_cat_f = NULL,
cat_cat_f = NULL,
max_cor = NULL
)
## S3 method for class 'explainer'
calculate_cors(
x,
num_num_f = NULL,
num_cat_f = NULL,
cat_cat_f = NULL,
max_cor = NULL
)
## S3 method for class 'matrix'
calculate_cors(
x,
num_num_f = NULL,
num_cat_f = NULL,
cat_cat_f = NULL,
max_cor = NULL
)
## S3 method for class 'table'
calculate_cors(
x,
num_num_f = NULL,
num_cat_f = NULL,
cat_cat_f = NULL,
max_cor = NULL
)
## Default S3 method:
calculate_cors(
x,
num_num_f = NULL,
num_cat_f = NULL,
cat_cat_f = NULL,
max_cor = NULL
)
Arguments
x |
object used to select method. See more below. |
num_num_f |
A |
num_cat_f |
A |
cat_cat_f |
A |
max_cor |
A number used to indicate absolute correlation (like 1 in |
Value
A symmetric matrix A of size n x n, where n is the number of columns in x (or the number of dimensions for a table).
The value at A(i,j) is the correlation coefficient between the ith and jth variables.
The diagonal values are set to max_cor.
X argument
When x is a data.frame, all columns of numeric type are treated as numeric variables and all columns of factor type are treated as categorical variables. Columns of other types are ignored.
When x is a matrix, it is converted to a data.frame using as.data.frame.matrix.
When x is an explainer, the tests are performed on its data element.
When x is a table, it is treated as a contingency table. Its dimensions must be named, but none of them may be named Frequency.
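The rules above can be checked before calling calculate_cors. The following is a minimal sketch in base R, not the package's own validation code; the data frame df and the variable names are hypothetical and used only for illustration.

```r
# Hypothetical data: one numeric column and one character column.
df <- data.frame(
  height = c(1.7, 1.8, 1.6),
  group = c("a", "b", "a"),
  stringsAsFactors = FALSE
)

# Classify columns the way the text describes: numeric or factor.
is_num <- vapply(df, is.numeric, logical(1))
is_cat <- vapply(df, is.factor, logical(1))

# Columns that are neither numeric nor factor would be ignored.
ignored <- names(df)[!(is_num | is_cat)]

# Converting a character column to a factor makes it a categorical variable.
df$group <- factor(df$group)

# For a table: every dimension needs a name, and none may be "Frequency".
dim_names <- names(dimnames(HairEyeColor))
table_ok <- !is.null(dim_names) &&
  all(nzchar(dim_names)) &&
  !("Frequency" %in% dim_names)
```

Here ignored lists the columns that would be skipped before the conversion, and table_ok reports whether the table meets the naming requirement.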
Default functions
By default, the function calculates the p-values of statistical tests (cor.test for 2 numeric variables, chisq.test for 2 factor variables and kruskal.test for mixed pairs).
The correlation coefficients are then calculated as -log10(p_value). Any result above 100 is treated as absolute correlation and cut to 100.
The results are then divided by 100 to fit inside [0,1].
If only numeric data was supplied, the function used is cor.test.
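The transformation from p-value to coefficient described above can be sketched in a few lines of base R. The helper name p_value_to_coef is hypothetical, and the exact test calls used inside the package may differ; this only illustrates the arithmetic (take -log10, cap at 100, divide by 100).

```r
# Hypothetical helper: maps a p-value to a coefficient in [0, 1].
p_value_to_coef <- function(p_value) {
  score <- -log10(p_value)   # smaller p-value -> larger score
  pmin(score, 100) / 100     # cap at 100, then rescale to [0, 1]
}

# One p-value per kind of variable pair, using the tests named in the text.
p_num_num <- cor.test(mtcars$mpg, mtcars$wt)$p.value
p_num_cat <- kruskal.test(mtcars$mpg, factor(mtcars$am))$p.value
p_cat_cat <- chisq.test(table(mtcars$vs, mtcars$am))$p.value

coefs <- p_value_to_coef(c(p_num_num, p_num_cat, p_cat_cat))
```

A p-value of 1 maps to 0, and any p-value of 1e-100 or smaller maps to 1.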
Custom functions
Creating consistent measures of correlation that are comparable across different kinds of variables is a non-trivial task. Therefore, a user who wishes to use a custom function for calculating correlation coefficients must provide all necessary functions. Using a custom function for one case and a default for another is deliberately not supported. Users may, of course, supply copies of the default functions at their own responsibility.
The function calculate_cors determines which *_f parameters are required based on the data supplied.
For example, for a matrix with numeric data, only num_num_f is required.
On the other hand, for a table, only cat_cat_f is required.
All *_f parameters must be functions that accept 2 parameters (numeric or factor vectors respectively) and return a single number from [0,max_cor]. The num_cat_f function must accept a numeric argument first and a factor argument second.
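As an illustration of the contract above, the sketch below defines one function per pair type, each returning a single number in [0,1]. The measures chosen here (absolute Spearman correlation, correlation ratio, Cramer's V) are assumptions made for the example and are not the package defaults. The call to calculate_cors is guarded so the block also runs where corrgrapher is not installed.

```r
# numeric - numeric: absolute Spearman rank correlation.
num_num_f <- function(x, y) abs(cor(x, y, method = "spearman"))

# numeric - categorical: correlation ratio (eta); numeric first, factor second.
num_cat_f <- function(x, y) {
  group_means <- ave(x, y)                      # mean of x within each level of y
  ss_between <- sum((group_means - mean(x))^2)
  ss_total <- sum((x - mean(x))^2)
  sqrt(ss_between / ss_total)
}

# categorical - categorical: Cramer's V from the chi-squared statistic.
cat_cat_f <- function(x, y) {
  tab <- table(x, y)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE))$statistic
  as.numeric(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

# Mixed data needs all three functions; max_cor = 1 matches their range.
cars_df <- mtcars
cars_df$vs <- factor(cars_df$vs)
cars_df$am <- factor(cars_df$am)
if (requireNamespace("corrgrapher", quietly = TRUE)) {
  cors <- corrgrapher::calculate_cors(
    cars_df,
    num_num_f = num_num_f,
    num_cat_f = num_cat_f,
    cat_cat_f = cat_cat_f,
    max_cor = 1
  )
}
```

Because all three measures share the [0,1] range, a single max_cor value applies to every pair, which is the kind of consistency the section asks custom functions to provide.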
See Also
cor.test, chisq.test, kruskal.test
Examples
data(mtcars)
# Make sure that categorical variables are factors
mtcars$vs <- factor(mtcars$vs, labels = c('V-shaped', 'straight'))
mtcars$am <- factor(mtcars$am, labels = c('automatic', 'manual'))
calculate_cors(mtcars)
# For a table:
data(HairEyeColor)
calculate_cors(HairEyeColor)
# Custom functions:
num_mtcars <- mtcars[,-which(colnames(mtcars) %in% c('vs', 'am'))]
my_f <- function(x,y) cor.test(x, y, method = 'spearman', exact=FALSE)$estimate
calculate_cors(num_mtcars, num_num_f = my_f, max_cor = 1)