create.partitions {PDtoolkit}    R Documentation
Create partitions (aka nested dummy variables)
Description
create.partitions creates partitions (aka nested dummy variables).
Used directly in a logistic regression, partitions provide insight into the difference in log-odds between adjacent risk factor bins (groups).
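The link between nested dummies and adjacent log-odds differences can be illustrated with a minimal base-R sketch. The helper nested.dummies below is hypothetical and not part of PDtoolkit; it assumes a threshold-style coding (dummy j equals 1 when the observation falls in bin j + 1 or higher), which may differ in detail from the encoding create.partitions produces. The data are constructed by hand purely for illustration.

```r
# Hypothetical helper (NOT a PDtoolkit function): threshold-style nested dummies.
# Bins are ordered alphabetically by label; dummy j is 1 for bins j + 1 and above.
nested.dummies <- function(x) {
  lv  <- sort(unique(x[!is.na(x)]), method = "radix")
  idx <- match(x, lv)
  out <- vapply(seq_along(lv)[-1],
                function(j) as.integer(idx >= j),
                integer(length(x)))
  out <- as.data.frame(out)
  names(out) <- paste0("p_", lv[-1])
  out
}

# Hand-made data: three bins with default rates of 10%, 20% and 40%
bin <- rep(c("A", "B", "C"), times = c(100, 100, 100))
y   <- c(rep(1:0, c(10, 90)), rep(1:0, c(20, 80)), rep(1:0, c(40, 60)))

d   <- nested.dummies(bin)
fit <- glm(y ~ ., data = data.frame(y = y, d), family = binomial)

# Observed log-odds per bin and the differences between adjacent bins
lo <- qlogis(tapply(y, bin, mean))
cbind(coefficient = unname(coef(fit)[-1]), adjacent.diff = unname(diff(lo)))
```

Under this coding, each slope coefficient matches the difference in observed log-odds between a bin and the one before it, which is why an insignificant coefficient suggests that the two adjacent bins could be merged.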
Adjacent bins are determined by the alphabetic order of the risk factor's modalities, so it is important
to ensure that modality labels are defined in line with the expected monotonicity or any other criterion
considered while engineering the risk factors.
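A short sketch of why label design matters under alphabetic ordering: numeric-looking labels do not sort in numeric order, so zero-padding (or a similar prefix) is one way to keep the intended sequence. The labels are made up for illustration, and method = "radix" is used here only to make the sort independent of the session locale; how create.partitions sorts internally is not shown in this page.

```r
# Made-up bin labels that look numeric
labels <- c("1", "2", "10")

# Alphabetic (character) ordering puts "10" before "2"
sort(labels, method = "radix")
# "1" "10" "2"

# Zero-padding keeps the alphabetic order aligned with the numeric order
padded <- sprintf("%02d", as.integer(labels))
sort(padded, method = "radix")
# "01" "02" "10"
```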
Usage
create.partitions(db)
Arguments
db    Data set of risk factors to be converted into partitions.
Value
The command create.partitions returns a list of two objects (data frames).
The first object (partitions) is the data set with the newly created nested dummy variables.
The second object (info) is a data frame that reports on the partitioning process.
A set of quality checks is performed, and a check is reported only if it is triggered. Two of them are terminal,
i.e. if triggered, the risk factor is not processed further (fewer than two non-missing groups, or more than 10 modalities),
while the third is only informational (a warning), as it usually indicates a deviation from the main principles of risk factor processing
(less than 5% of observations per bin).
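The three checks can be sketched in base R as follows. The function check.rf and its arguments are hypothetical and do not reproduce the package's implementation; in particular, computing bin shares over non-missing observations and excluding missing values from the modality count are assumptions.

```r
# Hypothetical sketch (NOT the PDtoolkit implementation) of the three checks.
# Assumption: shares and modality counts are based on non-missing values only.
check.rf <- function(x, max.mod = 10, min.share = 0.05) {
  tab   <- table(x, useNA = "no")
  n.mod <- length(tab)
  share <- tab / sum(tab)
  c(too.few.groups      = n.mod < 2,               # terminal
    too.many.modalities = n.mod > max.mod,         # terminal
    small.bin           = any(share < min.share))  # warning only
}

# One bin holds only 4% of the observations: warning, but still processable
check.rf(c(rep("A", 96), rep("B", 4)))
```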
References
Scallan, G. (2011). Class(ic) Scorecards: Selecting Characteristics and Attributes in Logistic Regression, Edinburgh Credit Scoring Conference.
Examples
suppressMessages(library(PDtoolkit))
data(loans)
#identify numeric risk factors
num.rf <- sapply(loans, is.numeric)
num.rf <- names(num.rf)[!names(num.rf)%in%"Creditability" & num.rf]
#discretize numeric risk factors using cum.bin from the monobin package
loans[, num.rf] <- sapply(num.rf, function(x)
	cum.bin(x = loans[, x], y = loans[, "Creditability"])[[2]])
str(loans)
loans.p <- create.partitions(db = loans[, num.rf])
head(loans.p[["partitions"]])
loans.p[["info"]]
#bring target to partitions
db.p <- cbind.data.frame(Creditability = loans$Creditability, loans.p[[1]])
#prepare risk factors for stepMIV
db.p[, -1] <- sapply(db.p[, -1], as.character)
#run stepMIV
res <- stepMIV(start.model = Creditability ~ 1,
               miv.threshold = 0.02,
               m.ch.p.val = 0.05,
               coding = "dummy",
               db = db.p)
#check output elements
names(res)
#extract the final model
final.model <- res$model
#print coefficients
summary(final.model)$coefficients