pps {dlookr} | R Documentation |
Compute Predictive Power Score
Description
pps() computes the Predictive Power Score (PPS) for exploratory data analysis.
Usage
pps(.data, ...)
## S3 method for class 'data.frame'
pps(.data, ..., cv_folds = 5, do_parallel = FALSE, n_cores = -1)
## S3 method for class 'target_df'
pps(.data, ..., cv_folds = 5, do_parallel = FALSE, n_cores = -1)
Arguments
.data |
a target_df or data.frame. |
... |
one or more unquoted expressions separated by commas. You can treat variable names as if they were positions. Positive values select variables; negative values drop variables. If the first expression is negative, pps() will automatically start with all variables. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing. |
cv_folds |
integer. number of cross-validation folds. |
do_parallel |
logical. whether to perform score calls in parallel. |
n_cores |
integer. number of cores to use. The default, -1, uses the maximum number of cores minus 1. |
Details
The PPS is an asymmetric, data-type-agnostic score that can detect linear or non-linear relationships between two variables. The score ranges from 0 (no predictive power) to 1 (perfect predictive power).
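To make the idea of a normalised, asymmetric score concrete, here is a minimal sketch in base R for a numeric target. It assumes the regression case is normalised as 1 - MAE_model / MAE_naive, with the naive model always predicting the median of the target, as described in the referenced article. The helper name pps_sketch is hypothetical, and a linear model stands in for whatever model the package actually fits, so the numbers will not match the output of pps().

```r
# Illustrative PPS-style score for a numeric target.
# pps_sketch() is a hypothetical helper, not part of dlookr or ppsr.
pps_sketch <- function(data, x, y, cv_folds = 5, seed = 1) {
  set.seed(seed)
  # Randomly assign each row to one of cv_folds folds.
  fold <- sample(rep(seq_len(cv_folds), length.out = nrow(data)))
  model_mae <- numeric(cv_folds)
  naive_mae <- numeric(cv_folds)
  for (k in seq_len(cv_folds)) {
    train <- data[fold != k, , drop = FALSE]
    test  <- data[fold == k, , drop = FALSE]
    # Assumption: a linear model stands in for the predictive model.
    fit <- lm(reformulate(x, response = y), data = train)
    model_mae[k] <- mean(abs(test[[y]] - predict(fit, newdata = test)))
    # Naive baseline: always predict the training median of the target.
    naive_mae[k] <- mean(abs(test[[y]] - median(train[[y]])))
  }
  # Normalise against the baseline and clamp at 0.
  max(0, 1 - mean(model_mae) / mean(naive_mae))
}

# The score is directional: swapping predictor and target
# generally gives a different value.
score_xy <- pps_sketch(iris, x = "Petal.Width", y = "Petal.Length")
score_yx <- pps_sketch(iris, x = "Petal.Length", y = "Petal.Width")
```

A score near 0 means the model does no better than the naive baseline, and a score near 1 means its error is close to zero relative to that baseline.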
Value
An object of class pps. The attributes of the pps class are as follows.
type : type of pps
target : name of target variable
predictor : name of predictor
Information of Predictive Power Score
The PPS result contains the following information.
x : the name of the predictor variable
y : the name of the target variable
result_type : text showing how to interpret the resulting score
pps : the predictive power score
metric : the evaluation metric used to compute the PPS
baseline_score : the score of a naive model on the evaluation metric
model_score : the score of the predictive model on the evaluation metric
cv_folds : how many cross-validation folds were used
seed : the seed that was set
algorithm : text showing what algorithm was used
model_type : text showing whether classification or regression was used
References
RIP correlation. Introducing the Predictive Power Score - by Florian Wetschoreck
https://towardsdatascience.com/rip-correlation-introducing-the-predictive-power-score-3d90808b9598
Examples
library(dplyr)
# If you want to use this feature, you need to install the 'ppsr' package.
if (!requireNamespace("ppsr", quietly = TRUE)) {
  cat("If you want to use this feature, you need to install the 'ppsr' package.\n")
}
# pps type is generic =======================================
pps_generic <- pps(iris)
pps_generic
# pps type is target_by =====================================
##-----------------------------------------------------------
# If the target variable is a categorical variable
categ <- target_by(iris, Species)
# compute all variables
pps_cat <- pps(categ)
pps_cat
# compute Petal.Length and Petal.Width variable
pps_cat <- pps(categ, Petal.Length, Petal.Width)
pps_cat
# Using dplyr
pps_cat <- iris %>%
  target_by(Species) %>%
  pps()
pps_cat
##-----------------------------------------------------------
# If the target variable is a numerical variable
num <- target_by(iris, Petal.Length)
pps_num <- pps(num)
pps_num
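The examples above all use the default number of folds. The following sketch shows how the cv_folds argument from the Usage section might be combined with variable selection. It assumes that both the 'dlookr' and 'ppsr' packages are installed and is skipped otherwise; the object names pps_cv10 and pps_sel are illustrative only.

```r
# Sketch only: runs when both 'dlookr' and 'ppsr' are installed.
pps_cv10 <- NULL
pps_sel <- NULL

if (requireNamespace("dlookr", quietly = TRUE) &&
    requireNamespace("ppsr", quietly = TRUE)) {
  library(dlookr)

  categ <- target_by(iris, Species)

  # Use 10 cross-validation folds instead of the default 5
  pps_cv10 <- pps(categ, cv_folds = 10)
  print(pps_cv10)

  # Combine variable selection with a non-default number of folds
  pps_sel <- pps(categ, Petal.Length, Petal.Width, cv_folds = 10)
  print(pps_sel)
}
```

More folds mean more model fits per variable pair, so run time increases with cv_folds.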