correlate_df {popstudy} | R Documentation |
correlate_df
Description
Compute correlations in a data frame.
Usage
correlate_df(data, keep_class = NULL)
Arguments
data |
data.frame. A dataset with the variables to correlate. |
keep_class |
list. A list that contains the desired classes for specific variables. |
Details
correlate_df takes data.frame objects and works only with variables of class numeric, factor, or ordered, so the data should be cleaned beforehand for optimal results. A variable is considered nominal when it is a factor with more than two levels and is not ordered. A numeric variable with only two distinct values, or a factor with only two levels, is considered binary.
The correlation computed depends on the classes of the paired variables: Pearson correlation when both variables are numeric; Kendall correlation for a numeric and an ordinal variable; point-biserial for a numeric and a binary variable; polychoric correlation for two ordinal variables; tetrachoric correlation when both are binary; rank-biserial when one is ordinal and the other is binary; and Kruskal's Lambda for one binary and one nominal variable, or for two nominal variables. When one variable is nominal and the other is numeric or ordered, a Gaussian linear model is fitted to estimate the multiple correlation coefficient, so the user should interpret that result with caution.
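The classification and pairing rules described above can be sketched in base R. This is only an illustration of the rules as stated in this section, not the package's internal code: the helpers classify_var and choose_method, and the toy data frame, are hypothetical names introduced here, and treating a two-level ordered factor as binary is an assumption drawn from the statement that any two-level factor is binary.

```r
# Hypothetical helper: label one column using the rules in Details.
classify_var <- function(x) {
  if (is.factor(x)) {
    if (nlevels(x) == 2) {
      "binary"                       # any two-level factor (assumed to include ordered)
    } else if (nlevels(x) > 2) {
      if (is.ordered(x)) "ordinal" else "nominal"
    } else {
      NA_character_                  # fewer than two levels: not covered by the text
    }
  } else if (is.numeric(x)) {
    # a numeric variable with exactly two distinct values is binary
    if (length(unique(x[!is.na(x)])) == 2) "binary" else "numeric"
  } else {
    NA_character_                    # other classes are not supported
  }
}

# Hypothetical helper: map a pair of labels to the measure named in Details.
choose_method <- function(a, b) {
  pair <- paste(sort(c(a, b)), collapse = "+")  # order-independent key
  switch(pair,
    "numeric+numeric" = "Pearson",
    "numeric+ordinal" = "Kendall",
    "binary+numeric"  = "point-biserial",
    "ordinal+ordinal" = "polychoric",
    "binary+binary"   = "tetrachoric",
    "binary+ordinal"  = "rank-biserial",
    "binary+nominal"  = ,
    "nominal+nominal" = "Kruskal's lambda",
    "nominal+numeric" = ,
    "nominal+ordinal" = "multiple correlation (Gaussian linear model)",
    NA_character_
  )
}

# Small fixed data set, for illustration only.
toy <- data.frame(
  cont = c(1.2, 3.4, 2.2, 5.1, 4.8, 0.7),
  bin  = c(0, 1, 1, 0, 1, 0),
  ordi = factor(c(1, 2, 3, 1, 2, 3), ordered = TRUE),
  nomi = factor(c("a", "b", "c", "a", "b", "c"))
)

classes <- vapply(toy, classify_var, character(1))
methods <- outer(classes, classes, Vectorize(choose_method))
print(classes)
print(methods)
```

The resulting matrix only names the measure that the text associates with each pair of columns; it does not compute any coefficient.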
Value
correlate_df returns a list with three objects: a data frame with the correlation matrix and two correlation plots.
Author(s)
Cesar Gamboa-Sanabria
References
Khamis H (2008). “Measures of Association: How to Choose?” Journal of Diagnostic Medical Sonography, 24(3), 155-162. doi:10.1177/8756479308317006.
Examples
df <- data.frame(cont1 = rnorm(100),
                 cont2 = rnorm(100),
                 ordi1 = factor(sample(1:5, 100, replace = TRUE), ordered = TRUE),
                 ordi2 = factor(sample(1:7, 100, replace = TRUE), ordered = TRUE),
                 bin1 = rbinom(100, 1, .4),
                 bin2 = rbinom(100, 1, .6),
                 nomi1 = factor(sample(letters[1:8], 100, replace = TRUE)),
                 nomi2 = factor(sample(LETTERS[1:8], 100, replace = TRUE)))
correlate_df(df)