IC_clean_data {HelpersMG} | R Documentation |
Clean the dataframe before using it with IC_threshold_matrix
Description
This function must be used if missing values are present in the dataset.
It ensures that all correlations and partial correlations can be calculated.
Columns of the dataframe are removed one by one until all correlations can be calculated without error.
One or more columns can be marked as retained (see variable.retain) because they are of particular importance in the analysis.
The use and method parameters are passed to the cor() function. By default, the function uses parallel computing on Unix and macOS systems.
If progress is TRUE and the package pbmcapply is installed, a progress bar is displayed. If debug is TRUE, some information is shown during the process.
https://fr.wikipedia.org/wiki/Iconographie_des_corrélations
Usage
IC_clean_data(
data = stop("A dataframe object is required"),
use = c("pairwise.complete.obs", "everything", "all.obs", "complete.obs",
"na.or.complete"),
method = c("pearson", "kendall", "spearman"),
variable.retain = NULL,
test.partial.correlation = TRUE,
progress = TRUE,
debug = FALSE
)
Arguments
data |
The data.frame to be cleaned |
use |
an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs". |
method |
a character string indicating which correlation coefficient (or covariance) is to be computed. One of "pearson" (default), "kendall", or "spearman": can be abbreviated. |
variable.retain |
a vector with the names of the columns to keep |
test.partial.correlation |
Should the partial correlations be tested? |
progress |
If TRUE, show a progress bar |
debug |
If TRUE, information about the progress of the cleaning is shown |
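Because use and method are handed to cor(), their effect can be checked directly with base R. The sketch below uses a small made-up data frame (toy is a hypothetical name, not part of the package) to show how a single missing value affects the result under two of the use settings.

```r
# Hypothetical toy data with one missing value in column y
toy <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2, 4, NA, 8, 10),
  z = c(5, 3, 4, 1, 2)
)

# use = "everything": a pair of columns that includes a missing value gives NA
c_all <- cor(toy, use = "everything", method = "pearson")

# use = "pairwise.complete.obs": each pair uses the rows where both are observed
c_pair <- cor(toy, use = "pairwise.complete.obs", method = "pearson")

# Same handling of missing values, but with a rank-based coefficient
c_spear <- cor(toy, use = "pairwise.complete.obs", method = "spearman")

is.na(c_all["x", "y"])   # TRUE: the NA in y propagates
c_pair["x", "y"]         # computed from the 4 complete rows
```

With the default use = "pairwise.complete.obs", a correlation is only unavailable when a pair of columns shares too few complete rows (or a column has no variance), which is the situation the cleaning step is meant to remove.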
Details
IC_clean_data checks and corrects the dataframe to be used with IC_threshold_matrix.
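The idea of removing columns until every correlation can be computed can be sketched in base R. This is only an illustration under simplifying assumptions, not the HelpersMG implementation: drop_until_computable and its retain argument are hypothetical names, only plain correlations are checked (partial correlations are ignored), and the column involved in the most failures is dropped first.

```r
# Hypothetical helper, NOT the HelpersMG code: drop columns one at a time
# until cor() returns no NA, never dropping the columns named in `retain`.
drop_until_computable <- function(data, retain = NULL,
                                  use = "pairwise.complete.obs",
                                  method = "pearson") {
  repeat {
    m <- suppressWarnings(cor(data, use = use, method = method))
    diag(m) <- 0                      # ignore the diagonal when counting failures
    n_bad <- colSums(is.na(m))        # failed correlations per column
    if (all(n_bad == 0)) break
    # Columns involved in at least one failure and not protected by `retain`
    candidates <- setdiff(names(n_bad)[n_bad > 0], retain)
    if (length(candidates) == 0) {
      stop("Only retained columns are left to remove")
    }
    worst <- candidates[which.max(n_bad[candidates])]
    data <- data[, setdiff(colnames(data), worst), drop = FALSE]
  }
  data
}

# Hypothetical toy data: v3 has a single observed value, so no correlation
# involving v3 can be computed.
toy <- data.frame(
  v1 = c(1, 2, 3, 4, 5, 6),
  v2 = c(2, 1, 4, 3, 6, 5),
  v3 = c(NA, NA, NA, NA, NA, 3),
  v4 = c(6, 5, 4, 3, 2, 1)
)
cleaned <- drop_until_computable(toy, retain = "v1")
colnames(cleaned)
```

In this toy case only v3 is removed; v1 would be kept even if it were involved in failures, which mirrors the role described for variable.retain.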
Value
A dataframe from which the columns that prevented the correlations from being calculated have been removed
Author(s)
Marc Girondot marc.girondot@gmail.com
References
Lesty, M., 1999. Une nouvelle approche dans le choix des régresseurs de la régression multiple en présence d’interactions et de colinéarités. Revue de Modulad 22, 41-77.
See Also
Other Iconography of correlations:
IC_correlation_simplify(), IC_threshold_matrix(), plot.IconoCorel()
Examples
## Not run:
library("HelpersMG")
# based on https://fr.wikipedia.org/wiki/Iconographie_des_corrélations
es <- structure(list(Student = c("e1", "e2", "e3", "e4", "e5", "e6", "e7", "e8"),
                     Mass = c(52, 59, 55, 58, 66, 62, 63, 69),
                     Age = c(12, 12.5, 13, 14.5, 15.5, 16, 17, 18),
                     Assiduity = c(12, 9, 15, 5, 11, 15, 12, 9),
                     Note = c(5, 5, 9, 5, 13.5, 18, 18, 18)),
                row.names = c(NA, -8L), class = "data.frame")
es
df_clean <- IC_clean_data(es, debug = TRUE)
cor_matrix <- IC_threshold_matrix(data=df_clean, threshold = NULL, progress=FALSE)
cor_threshold <- IC_threshold_matrix(data=df_clean, threshold = 0.3)
plot(cor_threshold, show.legend.strength=FALSE, show.legend.direction = FALSE)
cor_threshold_Note <- IC_correlation_simplify(matrix=cor_threshold, variable="Note")
plot(cor_threshold_Note, show.legend.strength=FALSE, show.legend.direction = FALSE)
cor_threshold <- IC_threshold_matrix(data=df_clean, threshold = 0.6)
plot(cor_threshold,
     layout = matrix(data = c(53, 53, 55, 55,
                              55, 53, 55, 53), ncol = 2, byrow = FALSE),
     show.legend.direction = FALSE,
     show.legend.strength = FALSE, xlim = c(-2, 2), ylim = c(-2, 2))
## End(Not run)