listTopCorrelatedVariables {FRESA.CAD} | R Documentation |
List the variables that are highly correlated with each other
Description
This function computes the Pearson, Spearman, or Kendall correlation for each specified variable in the data set and returns, for each of them, the variables that are correlated with it. It also provides a short list of variables from which the highly correlated ones have been removed.
Usage
listTopCorrelatedVariables(variableList,
                           data,
                           pvalue = 0.001,
                           corthreshold = 0.9,
                           method = c("pearson", "kendall", "spearman"))
Arguments
variableList |
A data frame with two columns. The first must contain the names of the candidate variables and the second the descriptions of those variables |
data |
A data frame where all variables are stored in different columns |
pvalue |
The maximum p-value, associated with the chosen method, allowed for a pair of variables to be considered correlated |
corthreshold |
The minimum correlation score, associated with the chosen method, required for a pair of variables to be considered correlated |
method |
Correlation method: Pearson product-moment ("pearson"), Spearman's rank ("spearman"), or Kendall rank ("kendall") |
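As a rough illustration of how pvalue, corthreshold, and method could interact for a single pair of variables, the following sketch uses cor.test from base R. The helper isCorrelatedPair is hypothetical and is not part of FRESA.CAD. Applying the threshold to the absolute value of the coefficient is an assumption, since this page does not say how negative correlations are treated.

```r
# Hypothetical helper, not part of FRESA.CAD: decide whether two numeric
# vectors count as "correlated" under a given method and thresholds.
isCorrelatedPair <- function(x, y, pvalue = 0.001, corthreshold = 0.9,
                             method = c("pearson", "kendall", "spearman")) {
  method <- match.arg(method)
  # exact = FALSE requests the asymptotic p-value for the rank-based methods,
  # which avoids warnings when the data contain ties
  test <- suppressWarnings(cor.test(x, y, method = method, exact = FALSE))
  # Assumption: the threshold is applied to the absolute coefficient
  unname(abs(test$estimate) >= corthreshold && test$p.value <= pvalue)
}

x <- 1:50
y <- x^2                        # increases monotonically with x
z <- rep(c(1, -1), times = 25)  # alternating values, nearly unrelated to x

isCorrelatedPair(x, y, method = "pearson")   # TRUE
isCorrelatedPair(x, z, method = "spearman")  # FALSE
```

Both conditions must hold in this sketch, so a pair with a large coefficient but a p-value above the limit would not be reported.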
Value
correlated.variables |
A data frame with two columns:
|
short.list |
A vector with the variables that are not correlated with each other. For every correlated pair, only the variable that entered the correlation analysis first is kept |
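The rule described for short.list (keep the variable that entered first, drop the later member of each correlated pair) can be sketched with base R as follows. This is an illustrative reading of that rule, not the FRESA.CAD source: the function shortListSketch and the toy data frame are hypothetical, the p-value filter is left out for brevity, and the use of the absolute correlation is an assumption.

```r
# Hypothetical sketch, not the FRESA.CAD implementation: a variable is kept
# only if it is not highly correlated with any variable already kept.
shortListSketch <- function(data, varNames, corthreshold = 0.9,
                            method = "pearson") {
  # Assumption: absolute correlations are compared against the threshold
  corMatrix <- abs(cor(data[, varNames, drop = FALSE], method = method))
  kept <- character(0)
  for (name in varNames) {
    # The first variable of each correlated pair enters; later ones are dropped
    if (!any(corMatrix[name, kept] >= corthreshold)) {
      kept <- c(kept, name)
    }
  }
  kept
}

toy <- data.frame(a = 1:20,
                  b = 2 * (1:20) + 1,                # exact linear function of a
                  noise = rep(c(0, 1), times = 10))  # nearly unrelated to a

shortListSketch(toy, c("a", "b", "noise"))
# [1] "a"     "noise"
```

Under this reading the result depends on the order of the variable names: passing c("b", "a", "noise") keeps b and drops a instead.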
Author(s)
Jose G. Tamez-Pena and Antonio Martinez-Torteya
Examples
## Not run:
# Start the graphics device driver to save all plots in a pdf format
pdf(file = "Example.pdf")
# Get the stage C prostate cancer data from the rpart package
library(rpart)
data(stagec)
# Split the stages into several columns
dataCancer <- cbind(stagec[,c(1:3,5:6)],
                    gleason4 = 1*(stagec[,7] == 4),
                    gleason5 = 1*(stagec[,7] == 5),
                    gleason6 = 1*(stagec[,7] == 6),
                    gleason7 = 1*(stagec[,7] == 7),
                    gleason8 = 1*(stagec[,7] == 8),
                    gleason910 = 1*(stagec[,7] >= 9),
                    eet = 1*(stagec[,4] == 2),
                    diploid = 1*(stagec[,8] == "diploid"),
                    tetraploid = 1*(stagec[,8] == "tetraploid"),
                    notAneuploid = 1-1*(stagec[,8] == "aneuploid"))
# Remove the incomplete cases
dataCancer <- dataCancer[complete.cases(dataCancer),]
# Load a pre-established data frame with the names and descriptions of all variables
data(cancerVarNames)
# Get the variables that have a correlation coefficient larger
# than 0.65 at a p-value of 0.05
cor <- listTopCorrelatedVariables(variableList = cancerVarNames,
                                  data = dataCancer,
                                  pvalue = 0.05,
                                  corthreshold = 0.65,
                                  method = "pearson")
# Shut down the graphics device driver
dev.off()
## End(Not run)