create.partitions {PDtoolkit}    R Documentation

Create partitions (aka nested dummy variables)

Description

create.partitions creates partitions (also known as nested dummy variables). Used directly in logistic regression, partitions give insight into the difference in log-odds between adjacent risk factor bins (groups). Adjacent bins are determined by the alphabetic order of the risk factor's modalities, so it is important to define modality labels in line with the expected monotonicity, or with whatever other criterion is applied when engineering the risk factors.
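
The idea described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helper name nested.dummies, the column names p2, p3, ... and the toy data are all assumptions made for illustration. Each dummy is 1 when an observation falls in a given bin or any later bin (in alphabetic order), so in a logistic regression on these dummies each coefficient matches the difference in log-odds between a bin and the bin before it.

```r
# Hypothetical helper (not part of PDtoolkit): nested dummies for one risk factor.
nested.dummies <- function(x) {
  x <- as.character(x)
  lv <- sort(unique(x))                 # modalities in sorted order; sort() drops NA
  stopifnot(length(lv) >= 2)            # at least two non-missing groups are needed
  idx <- match(x, lv)                   # position of each observation's bin
  # column j is 1 when the observation falls in bin j or any later bin
  out <- outer(idx, seq_along(lv)[-1], ">=") * 1L
  colnames(out) <- paste0("p", seq_along(lv)[-1])
  as.data.frame(out)
}

# Toy data (made up): three bins with default rates 0.2, 0.4 and 0.7
bin <- rep(c("01 low", "02 mid", "03 high"), each = 10)
y   <- c(rep(1:0, c(2, 8)), rep(1:0, c(4, 6)), rep(1:0, c(7, 3)))

d   <- data.frame(y = y, nested.dummies(bin))
fit <- glm(y ~ ., data = d, family = binomial)

# Differences of log-odds between adjacent bins, computed directly
adj.diff <- diff(qlogis(as.numeric(tapply(y, bin, mean))))

# The two columns should agree up to numerical tolerance
cbind(coefficient = coef(fit)[-1], adj.diff = adj.diff)
```

Because the labels are prefixed with "01", "02", "03", their alphabetic order coincides with the intended bin order; with unprefixed labels such as "low", "mid", "high" the bins would be ordered differently.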

Usage

create.partitions(db)

Arguments

db

Data set of risk factors to be converted into partitions.

Value

The command create.partitions returns a list of two objects (data frames).
The first object (partitions) is the data set with the newly created nested dummy variables.
The second object (info) is a data frame with information on the partitioning process. A set of quality checks is performed, and any that are triggered are reported. Two of them are terminal, i.e. if triggered, the risk factor is not processed further (fewer than two non-missing groups, or more than 10 modalities), while the third is only a warning, since it usually signals a deviation from the main principles of risk factor processing (less than 5% of observations in a bin).
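
The three checks listed above could be approximated as follows. This is only a sketch under stated assumptions, not the code used by create.partitions: the helper name check.rf, its arguments and the output column names are hypothetical, and it assumes that modalities are counted over non-missing values and that bin shares are computed relative to non-missing observations.

```r
# Hypothetical helper (not part of PDtoolkit): quality checks for one risk factor.
check.rf <- function(x, max.mod = 10, min.share = 0.05) {
  tab   <- table(as.character(x))        # counts per modality; NA is excluded
  n.mod <- length(tab)
  # assumption: share is taken relative to non-missing observations
  share <- as.numeric(tab) / sum(tab)
  too.few  <- n.mod < 2                  # terminal: fewer than two non-missing groups
  too.many <- n.mod > max.mod            # terminal: more than 10 modalities
  data.frame(n.mod     = n.mod,
             too.few   = too.few,
             too.many  = too.many,
             small.bin = any(share < min.share),  # warning only
             processed = !(too.few | too.many))
}

# A bin holding 3% of the observations triggers the warning but not a terminal check
check.rf(c(rep("01", 97), rep("02", 3)))
```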

References

Scallan, G. (2011). Class(ic) Scorecards: Selecting Characteristics and Attributes in Logistic Regression, Edinburgh Credit Scoring Conference.

Examples

suppressMessages(library(PDtoolkit))
data(loans)
#identify numeric risk factors
num.rf <- sapply(loans, is.numeric)
num.rf <- names(num.rf)[!names(num.rf)%in%"Creditability" & num.rf]
#discretize numeric risk factors using cum.bin from the monobin package
loans[, num.rf] <- sapply(num.rf, function(x) 
cum.bin(x = loans[, x], y = loans[, "Creditability"])[[2]])
str(loans)
loans.p <- create.partitions(db = loans[, num.rf])
head(loans.p[["partitions"]])
loans.p[["info"]]
#bring target to partitions
db.p <- cbind.data.frame(Creditability = loans$Creditability, loans.p[[1]])
#prepare risk factors for stepMIV 
db.p[, -1] <- sapply(db.p[, -1], as.character)
#run stepMIV
res <- stepMIV(start.model = Creditability ~ 1, 
   miv.threshold = 0.02, 
   m.ch.p.val = 0.05,
   coding = "dummy",
   db = db.p)
#check output elements
names(res)
#extract the final model
final.model <- res$model
#print coefficients
summary(final.model)$coefficients

[Package PDtoolkit version 1.2.0 Index]