uniRankVar {FRESA.CAD} | R Documentation |
Univariate analysis of features (additional values returned)
Description
This function reports the mean and standard deviation for each feature in a model, and ranks them according to a user-specified score.
Additionally, it does a Kolmogorov-Smirnov (KS) test on the raw and z-standardized data.
It also reports the raw and z-standardized t-test score, the p-value of the Wilcoxon rank-sum test, the integrated discrimination improvement (IDI), the net reclassification improvement (NRI), the net residual improvement (NeRI), and the area under the ROC curve (AUC).
Furthermore, it reports the z-value of the variable significance on the fitted model.
Besides reporting an ordered data frame, this function returns all arguments as values, so that the results can be updated with update.uniRankVar if needed.
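As a rough illustration of the kind of per-feature statistics listed above, the following base R sketch computes group means and standard deviations, a t-test, a Wilcoxon rank-sum test, and the AUC for one simulated feature. It is not FRESA.CAD's internal code: the simulated data, the variable names, and the column names of `stats` are assumptions made for this example, and the AUC is obtained from the Mann-Whitney identity AUC = W / (n1 * n0).

```r
# Minimal sketch in base R; NOT the FRESA.CAD implementation.
# All names and the simulated data below are hypothetical.
set.seed(42)
n <- 200
outcome <- rep(c(0, 1), each = n / 2)                      # binary outcome
feature <- rnorm(n, mean = ifelse(outcome == 1, 0.8, 0))   # cases shifted upward

cases    <- feature[outcome == 1]
controls <- feature[outcome == 0]

tt <- t.test(cases, controls)        # Welch two-sample t-test
wt <- wilcox.test(cases, controls)   # Wilcoxon rank-sum test

# AUC from the Mann-Whitney statistic: W / (n_cases * n_controls)
auc <- unname(wt$statistic) / (length(cases) * length(controls))

stats <- data.frame(
  caseMean    = mean(cases),
  caseStd     = sd(cases),
  controlMean = mean(controls),
  controlStd  = sd(controls),
  tStatistic  = unname(tt$statistic),
  wilcoxP     = wt$p.value,
  AUC         = auc
)
print(stats)
```

Repeating this for every candidate feature and sorting the rows by one of the columns gives a ranked table in the spirit of the ordered data frame described here.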
Usage
uniRankVar(variableList,
           formula,
           Outcome,
           data,
           categorizationType = c("Raw",
                                  "Categorical",
                                  "ZCategorical",
                                  "RawZCategorical",
                                  "RawTail",
                                  "RawZTail",
                                  "Tail",
                                  "RawRaw"),
           type = c("LOGIT", "LM", "COX"),
           rankingTest = c("zIDI",
                           "zNRI",
                           "IDI",
                           "NRI",
                           "NeRI",
                           "Ztest",
                           "AUC",
                           "CStat",
                           "Kendall"),
           cateGroups = c(0.1, 0.9),
           raw.dataFrame = NULL,
           testData = NULL,
           description = ".",
           uniType = c("Binary", "Regression"),
           FullAnalysis = TRUE,
           acovariates = NULL,
           timeOutcome = NULL)
Arguments
variableList |
A data frame with two columns. The first one must have the names of the candidate variables and the other one the description of such variables |
formula |
An object of class |
Outcome |
The name of the column in |
data |
A data frame where all variables are stored in different columns |
categorizationType |
How variables will be analyzed: As given in |
type |
Fit type: Logistic ("LOGIT"), linear ("LM"), or Cox proportional hazards ("COX") |
rankingTest |
Variables will be ranked based on: The z-score of the IDI ("zIDI"), the z-score of the NRI ("zNRI"), the IDI ("IDI"), the NRI ("NRI"), the NeRI ("NeRI"), the z-score of the model fit ("Ztest"), the AUC ("AUC"), the Somers' rank correlation ("CStat"), or the Kendall rank correlation ("Kendall") |
cateGroups |
A vector of percentiles to be used for the categorization procedure |
raw.dataFrame |
A data frame similar to |
testData |
A data frame for model testing |
description |
The name of the column in |
uniType |
Type of univariate analysis: Binary classification ("Binary") or regression ("Regression") |
FullAnalysis |
If FALSE, it will only order the features according to their z-statistics in the linear model |
acovariates |
The list of covariates |
timeOutcome |
The name of the time-to-event feature |
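The two-column layout required for variableList can be sketched in base R as follows. The column names `Name` and `Description`, the simulated `data`, and the feature names are assumptions made for this example; only the two-column structure (variable names first, descriptions second) comes from the argument description above.

```r
# Minimal sketch of preparing inputs; all names and values are hypothetical.
set.seed(1)

# Data frame with the outcome and the candidate features in separate columns
data <- data.frame(
  Outcome = rep(c(0, 1), each = 50),
  age     = rnorm(100, mean = 60, sd = 10),
  bmi     = rnorm(100, mean = 27, sd = 4)
)

# First column: candidate variable names; second column: their descriptions
variableList <- data.frame(
  Name        = c("age", "bmi"),
  Description = c("Age in years", "Body mass index"),
  stringsAsFactors = FALSE
)

# Sanity check: every candidate variable must be a column of `data`
stopifnot(all(variableList$Name %in% colnames(data)))
print(variableList)
```

Checking the names against the columns of the data before calling the function catches misspelled candidates early.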
Details
This function will create valid dummy categorical variables if, and only if, data has been z-standardized.
The p-values provided in cateGroups will be converted to their corresponding z-scores, which will then be used to create the categories.
If data that are not z-standardized are used, the categorization analysis will return wrong results.
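The conversion described above can be illustrated with base R. The sketch below turns the default cateGroups values into z-score thresholds with `qnorm` and applies them to a simulated feature, once after z-standardization and once on the raw scale. The three-level labels and the simulated feature are assumptions for this example; how FRESA.CAD actually builds its dummy variables is not shown in this page.

```r
# Minimal sketch in base R; NOT the FRESA.CAD implementation.
cateGroups  <- c(0.1, 0.9)
zThresholds <- qnorm(cateGroups)   # standard normal quantiles of the percentiles
print(zThresholds)

set.seed(7)
raw <- rnorm(500, mean = 120, sd = 15)   # hypothetical raw feature
z   <- as.numeric(scale(raw))            # z-standardized: mean 0, sd 1

# Thresholds applied to z-standardized values: three populated categories
category <- cut(z, breaks = c(-Inf, zThresholds, Inf),
                labels = c("low", "mid", "high"))
print(table(category))

# Same thresholds applied to raw values: every observation lands in "high",
# which is why non-standardized data give wrong categories
wrong <- cut(raw, breaks = c(-Inf, zThresholds, Inf),
             labels = c("low", "mid", "high"))
print(table(wrong))
```

Because the thresholds live on the z-score scale, they are only meaningful for features with mean 0 and standard deviation 1.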
Value
orderframe |
A sorted list of model variables stored in a data frame |
variableList |
The argument |
formula |
The argument |
Outcome |
The argument |
data |
The argument |
categorizationType |
The argument |
type |
The argument |
rankingTest |
The argument |
cateGroups |
The argument |
raw.dataFrame |
The argument |
description |
The argument |
uniType |
The argument |
Author(s)
Jose G. Tamez-Pena and Antonio Martinez-Torteya
References
Pencina, M. J., D'Agostino, R. B., & Vasan, R. S. (2008). Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond. Statistics in Medicine, 27(2), 157-172.
See Also
update.uniRankVar,
univariateRankVariables