bivariate {PDtoolkit}R Documentation

Bivariate analysis

Description

bivariate returns the bivariate statistics for risk factors supplied in data frame db.
Implemented procedure expects all risk factors to be categorical, thus numeric risk factors should be first categorized. Additionally, maximum number of groups per risk factor is set to 10, so risk factors with more than 10 categories will not be processed automatically, but manual inspection can be still done using woe.tbl and auc.model functions in order to produce the same statistics. Results of both checks (risk factor class and number of categories), if identified, will be reported in second element of function output - info data frame.
Bivariate report (first element of function output - results data frame) includes:

Additional info report (second element of function output - info data frame), if produced, includes:

Usage

bivariate(db, target)

Arguments

db

Data frame of risk factors and target variable supplied for bivariate analysis.

target

Name of target variable within db argument.

Value

The command bivariate returns the list of two data frames. The first one contains bivariate metrics while the second data frame reports results of above explained validations (class of the risk factors and number of categories).

See Also

woe.tbl and auc.model for manual bivariate analysis.

Examples

suppressMessages(library(PDtoolkit))
data(gcd)
#categorize numeric risk factors
gcd$age.bin <- ndr.bin(x = gcd$age, y = gcd$qual)[[2]]
gcd$age.bin.1 <- cut(x = gcd$age, breaks = 20)
gcd$maturity.bin <- ndr.bin(x = gcd$maturity, y = gcd$qual, y.type = "bina")[[2]]
gcd$amount.bin <- ndr.bin(x = gcd$amount, y = gcd$qual)[[2]]
str(gcd)
#select target variable and categorized risk factors
gcd.bin <- gcd[, c("qual", "age.bin", "maturity.bin", "amount.bin")]
#run bivariate analysis on data frame with only categorical risk factors
bivariate(db = gcd.bin, target = "qual")
#run bivariate analysis on data frame with mixed risk factors (categorical and numeric). 
#for this example info table is produced
bivariate(db = gcd, target = "qual")
#run woe table for risk factor with more than 10 modalities
woe.tbl(tbl = gcd, x = "age.bin.1", y = "qual")
#calculate auc for risk factor with more than 10 modalities
lr <- glm(qual ~ age.bin.1, family = "binomial", data = gcd)
auc.model(predictions = predict(lr, type = "response", newdata = gcd),
    observed = gcd$qual)

[Package PDtoolkit version 1.2.0 Index]