univariateRankVariables {FRESA.CAD} | R Documentation |
Univariate analysis of features
Description
This function reports the mean and standard deviation of each feature in a model and ranks the features according to a user-specified score. It also performs a Kolmogorov-Smirnov (KS) test on the raw and z-standardized data, and reports the raw and z-standardized t-test scores, the p-value of the Wilcoxon rank-sum test, the integrated discrimination improvement (IDI), the net reclassification improvement (NRI), the net residual improvement (NeRI), and the area under the ROC curve (AUC). Finally, it reports the z-value of each variable's significance in the fitted model.
Usage
univariateRankVariables(variableList,
formula,
Outcome,
data,
categorizationType = c("Raw",
"Categorical",
"ZCategorical",
"RawZCategorical",
"RawTail",
"RawZTail",
"Tail",
"RawRaw"),
type = c("LOGIT", "LM", "COX"),
rankingTest = c("zIDI",
"zNRI",
"IDI",
"NRI",
"NeRI",
"Ztest",
"AUC",
"CStat",
"Kendall"),
cateGroups = c(0.1, 0.9),
raw.dataFrame = NULL,
description = ".",
uniType = c("Binary","Regression"),
FullAnalysis=TRUE,
acovariates = NULL,
timeOutcome = NULL
)
Arguments
variableList |
A data frame with the candidate variables to be ranked |
formula |
An object of class |
Outcome |
The name of the column in |
data |
A data frame where all variables are stored in different columns |
categorizationType |
How variables will be analyzed: As given in |
type |
Fit type: Logistic ("LOGIT"), linear ("LM"), or Cox proportional hazards ("COX") |
rankingTest |
Variables will be ranked based on: The z-score of the IDI ("zIDI"), the z-score of the NRI ("zNRI"), the IDI ("IDI"), the NRI ("NRI"), the NeRI ("NeRI"), the z-score of the model fit ("Ztest"), the AUC ("AUC"), the Somers' rank correlation ("CStat"), or the Kendall rank correlation ("Kendall") |
cateGroups |
A vector of percentiles to be used for the categorization procedure |
raw.dataFrame |
A data frame similar to |
description |
The name of the column in |
uniType |
Type of univariate analysis: Binary classification ("Binary") or regression ("Regression") |
FullAnalysis |
If FALSE, the function only orders the features according to their z-statistics from the linear model |
acovariates |
The list of covariates |
timeOutcome |
The name of the time-to-event feature |
Details
This function creates valid dummy categorical variables if, and only if, data has been z-standardized. The p-values provided in cateGroups are converted to their corresponding z-scores, which are then used to create the categories. If data that have not been z-standardized are used, the categorization analysis will return wrong results.
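The conversion described above can be illustrated with a minimal sketch in base R. This is not the package's internal code: the variable names (x, z, zCuts, lowTail, highTail) are hypothetical, and the way the tails are encoded here is an assumption made only for illustration.

```r
# Illustrative sketch only: base R, not FRESA.CAD internals.
set.seed(42)
x <- rnorm(200, mean = 50, sd = 10)   # hypothetical raw feature

# z-standardize so that the cut points below are meaningful
z <- (x - mean(x)) / sd(x)

# Convert the percentiles in cateGroups to z-score cut points
cateGroups <- c(0.1, 0.9)
zCuts <- qnorm(cateGroups)            # approximately -1.28 and 1.28

# Dummy (0/1) variables marking the low and high tails (assumed encoding)
lowTail  <- as.integer(z < zCuts[1])
highTail <- as.integer(z > zCuts[2])

table(lowTail, highTail)
```

Applying the same cut points to the unstandardized x would place nearly every observation in the high tail, which shows why categorizing data that have not been z-standardized gives wrong results.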
Value
A sorted data frame. In the case of a binary classification analysis, the data frame will have the following columns:
Name |
Name of the raw variable or of the dummy variable if the data has been categorized |
parent |
Name of the raw variable from which the dummy variable was created |
descrip |
Description of the parent variable, as defined in |
cohortMean |
Mean value of the variable |
cohortStd |
Standard deviation of the variable |
cohortKSD |
D statistic of the KS test when comparing a normal distribution and the distribution of the variable |
cohortKSP |
Associated p-value to the |
caseMean |
Mean value of cases (subjects with |
caseStd |
Standard deviation of cases |
caseKSD |
D statistic of the KS test when comparing a normal distribution and the distribution of the variable only for cases |
caseKSP |
Associated p-value to the |
caseZKSD |
D statistic of the KS test when comparing a normal distribution and the distribution of the z-standardized variable only for cases |
caseZKSP |
Associated p-value to the |
controlMean |
Mean value of controls (subjects with |
controlStd |
Standard deviation of controls |
controlKSD |
D statistic of the KS test when comparing a normal distribution and the distribution of the variable only for controls |
controlKSP |
Associated p-value to the |
controlZKSD |
D statistic of the KS test when comparing a normal distribution and the distribution of the z-standardized variable only for controls |
controlZKSP |
Associated p-value to the |
t.Rawvalue |
Normal inverse p-value (z-value) of the t-test performed on |
t.Zvalue |
z-value of the t-test performed on |
wilcox.Zvalue |
z-value of the Wilcoxon rank-sum test performed on |
ZGLM |
z-value returned by the |
zNRI |
z-value returned by the |
zIDI |
z-value returned by the |
zNeRI |
z-value returned by the |
ROCAUC |
Area under the ROC curve returned by the |
cStatCorr |
c index of Somers' rank correlation returned by the |
NRI |
NRI returned by the |
IDI |
IDI returned by the |
NeRI |
NeRI returned by the |
kendall.r |
Kendall |
kendall.p |
Associated p-value to the |
TstudentRes.p |
p-value of the improvement in residuals, as evaluated by the paired t-test |
WilcoxRes.p |
p-value of the improvement in residuals, as evaluated by the paired Wilcoxon rank-sum test |
FRes.p |
p-value of the improvement in residual variance, as evaluated by the F-test |
caseN_Z_Low_Tail |
Number of cases in the low tail |
caseN_Z_Hi_Tail |
Number of cases in the high tail |
controlN_Z_Low_Tail |
Number of controls in the low tail |
controlN_Z_Hi_Tail |
Number of controls in the high tail |
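To give a feel for the kind of quantities in the columns listed above, the following base R sketch computes a KS D statistic and p-value, a t-test z-value, and an ROC AUC on simulated data. It is an assumption-laden illustration rather than the function's actual implementation: the data and variable names are hypothetical, and the two-sided convention used to turn the t-test p-value into a z-value is one possible choice.

```r
# Illustrative sketch only: base R, not FRESA.CAD internals.
set.seed(1)
outcome <- rep(c(0, 1), each = 100)          # 0 = control, 1 = case (hypothetical)
x <- rnorm(200, mean = ifelse(outcome == 1, 1, 0))

cases    <- x[outcome == 1]
controls <- x[outcome == 0]

# KS test of the cases against a normal with the sample mean and SD
ksCases <- ks.test(cases, "pnorm", mean(cases), sd(cases))
ksD <- unname(ksCases$statistic)             # D statistic
ksP <- ksCases$p.value                       # associated p-value

# t-test p-value expressed as a z-value through the inverse normal
# (two-sided convention; an assumption for this sketch)
tP <- t.test(cases, controls)$p.value
tZ <- qnorm(tP / 2, lower.tail = FALSE)

# Wilcoxon rank-sum test; W / (n1 * n2) equals the ROC AUC
w   <- wilcox.test(cases, controls)
auc <- unname(w$statistic) / (length(cases) * length(controls))
```

Because the normal parameters in the KS test are estimated from the same sample, the p-value in this sketch is only approximate.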
In the case of regression analysis, the data frame will have the following columns:
Name |
Name of the raw variable or of the dummy variable if the data has been categorized |
parent |
Name of the raw variable from which the dummy variable was created |
descrip |
Description of the parent variable, as defined in |
cohortMean |
Mean value of the variable |
cohortStd |
Standard deviation of the variable |
cohortKSD |
D statistic of the KS test when comparing a normal distribution and the distribution of the variable |
cohortKSP |
Associated p-value to the |
cohortZKSD |
D statistic of the KS test when comparing a normal distribution and the distribution of the z-standardized variable |
cohortZKSP |
Associated p-value to the |
ZGLM |
z-value returned by the glm or Cox procedure for the z-standardized variable |
zNRI |
z-value returned by the |
NeRI |
NeRI returned by the |
cStatCorr |
c index of Somers' rank correlation returned by the |
spearman.r |
Spearman |
pearson.r |
Pearson r product-moment correlation coefficient between the variable and the outcome |
kendall.r |
Kendall |
kendall.p |
Associated p-value to the |
TstudentRes.p |
p-value of the improvement in residuals, as evaluated by the paired t-test |
WilcoxRes.p |
p-value of the improvement in residuals, as evaluated by the paired Wilcoxon rank-sum test |
FRes.p |
p-value of the improvement in residual variance, as evaluated by the F-test |
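For the regression case, the correlation columns can be approximated with standard base R calls. The sketch below uses simulated data and hypothetical variable names; it shows how such coefficients are commonly computed, not how the function computes them internally.

```r
# Illustrative sketch only: base R, not FRESA.CAD internals.
set.seed(7)
x <- rnorm(150)                  # hypothetical feature
y <- 2 * x + rnorm(150)          # hypothetical continuous outcome

# Pearson and Spearman correlation between the variable and the outcome
pearsonR  <- cor(x, y, method = "pearson")
spearmanR <- cor(x, y, method = "spearman")

# Kendall rank correlation and its associated p-value
kend       <- cor.test(x, y, method = "kendall")
kendallTau <- unname(kend$estimate)
kendallP   <- kend$p.value
```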
Author(s)
Jose G. Tamez-Pena
References
Pencina, M. J., D'Agostino, R. B., & Vasan, R. S. (2008). Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond. Statistics in Medicine, 27(2), 157-172.