ExpCatStat {SmartEDA} | R Documentation |
Function provides summary statistics for all character or categorical columns in the dataframe
Description
This function combines results from weight of evidence, information value and summary statistics.
Usage
ExpCatStat(
data,
Target = NULL,
result = "Stat",
clim = 10,
nlim = 10,
bins = 10,
Pclass = NULL,
plot = FALSE,
top = 20,
Round = 2
)
Arguments
data |
dataframe or matrix |
Target |
target variable |
result |
"Stat" - summary statistics, "IV" - information value |
clim |
maximum number of unique levels for a categorical variable. Factor/character variables with more unique levels than clim are dropped |
nlim |
maximum unique values for numeric variable. |
bins |
number of bins (default is 10) |
Pclass |
reference category of target variable |
plot |
Information value barplot (default FALSE) |
top |
for plotting top information values (default value is 20) |
Round |
number of decimal places used for rounding |
Details
The criteria used to classify the predictive power of a categorical variable are:
- If information value is < 0.03, then predictive power = "Not Predictive"
- If information value is 0.03 to 0.1, then predictive power = "Somewhat Predictive"
- If information value is 0.1 to 0.3, then predictive power = "Medium Predictive"
- If information value is > 0.3, then predictive power = "Highly Predictive"
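The banding above can be sketched in base R with `cut()`. This is only an illustration: `iv_to_power` is a hypothetical helper, not a SmartEDA function. It assumes the second band runs from 0.03 to 0.1, and how values exactly on a boundary are assigned is an assumption, since the criteria do not say.

```r
# Hypothetical helper (not part of SmartEDA): map information values to the
# predictive power labels described in the criteria.
iv_to_power <- function(iv) {
  cut(
    iv,
    breaks = c(-Inf, 0.03, 0.1, 0.3, Inf),
    labels = c("Not Predictive", "Somewhat Predictive",
               "Medium Predictive", "Highly Predictive"),
    right = FALSE  # intervals are [lower, upper); boundary handling is an assumption
  )
}

# One made-up information value per band
iv_labels <- as.character(iv_to_power(c(0.01, 0.05, 0.2, 0.45)))
print(iv_labels)
```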
Value
This function provides summary statistics for categorical variables, depending on result:
- Stat - Summary statistics, including Chi square test scores, p value, Information values, Cramer's V and Degree of association
- IV - Weight of evidence and Information values
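For orientation, the chi-square score, p value and Cramer's V reported in the "Stat" output can be approximated for a single variable with base R. This is a minimal sketch using the textbook formula for Cramer's V. Whether ExpCatStat computes it in exactly this way (for example, with or without a continuity correction) is an assumption.

```r
# Cross-tabulate a categorical predictor (cyl) against the binary target (am)
tab <- table(mtcars$cyl, mtcars$am)

# Pearson chi-square test; small cell counts trigger an approximation warning
chi <- suppressWarnings(chisq.test(tab, correct = FALSE))

# Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
n <- sum(tab)
cramers_v <- sqrt(unname(chi$statistic) / (n * (min(dim(tab)) - 1)))

print(c(chi_sq = unname(chi$statistic), p_value = chi$p.value, cramers_v = cramers_v))
```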
Columns description:
- Variable - variable name
- Target - Target variable
- class - name of bin (variable value otherwise)
- out0 - number of good observations
- out1 - number of bad observations
- Total - Total values for each category
- pct1 - good observations / total good observations
- pct0 - bad observations / total bad observations
- odds - Odds ratio [(a/b)/(c/d)]
- woe - Weight of Evidence, calculated as ln(odds)
- iv - Information Value, ln(odds) * (pct0 - pct1)
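The weight of evidence and information value arithmetic can be worked through by hand on a small table. The sketch below uses made-up counts and generic column names (`n_ref`, `n_other`, `p_ref`, `p_other`), all of which are hypothetical. It assumes the odds are the ratio of the two class shares within a category, and it does not assert which of the two corresponds to pct0 or pct1 in the function's output.

```r
# Hypothetical counts per category: observations in the reference class
# and in the other class of the target.
counts <- data.frame(
  class   = c("A", "B", "C"),
  n_ref   = c(10, 30, 60),
  n_other = c(40, 30, 30)
)

# Share of each target class that falls in the category
counts$p_ref   <- counts$n_ref / sum(counts$n_ref)
counts$p_other <- counts$n_other / sum(counts$n_other)

# Assumption: odds is the ratio of the two shares, and woe = ln(odds)
counts$odds <- counts$p_ref / counts$p_other
counts$woe  <- log(counts$odds)

# Per-category information value; each term is non-negative by construction
counts$iv <- (counts$p_ref - counts$p_other) * counts$woe

# Information value of the variable is the sum over its categories
total_iv <- sum(counts$iv)
print(counts)
print(total_iv)
```

Because the difference in shares and the log of their ratio always have the same sign, every category contributes a non-negative amount, which is why the total can be compared directly against the predictive power bands.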
Author(s)
dubrangala
Examples
# Example 1
## Read mtcars data
# Target variable "am" - Transmission (0 = automatic, 1 = manual)
# Summary statistics
ExpCatStat(mtcars,Target="am",result = "Stat",clim=10,nlim=10,bins=10,
Pclass=1,plot=FALSE,top=20,Round=2)
# Information value plot
ExpCatStat(mtcars,Target="am",result = "Stat",clim=10,nlim=10,bins=10,
Pclass=1,plot=TRUE,top=20,Round=2)
# Information value for categorical Independent variables
ExpCatStat(mtcars,Target="am",result = "IV",clim=10,nlim=10,bins=10,
Pclass=1,plot=FALSE,top=20,Round=2)