stepMIV {PDtoolkit} | R Documentation |
Stepwise logistic regression based on marginal information value (MIV)
Description
stepMIV
performs stepwise logistic regression based on MIV.
Usage
stepMIV(
start.model,
miv.threshold,
m.ch.p.val,
coding,
coding.start.model = FALSE,
db,
offset.vals = NULL
)
Arguments
start.model |
Formula class that represents the starting model. It can include some risk factors, but it can also be
defined only with intercept ( |
miv.threshold |
MIV entrance threshold. Only the risk factors with MIV higher than the threshold are candidates for the new model. An additional criterion is that the MIV value should significantly separate good from bad cases, as measured by the marginal chi-square test. |
m.ch.p.val |
Significance level for the p-value of the marginal chi-square test. This test supplements the MIV value of a candidate risk factor in the final decision. |
coding |
Type of risk factor coding within the model. Available options are: |
coding.start.model |
Logical ( |
db |
Modeling data with risk factors and target variable. All risk factors should be categorized and of character type. |
offset.vals |
This can be used to specify an a priori known component to be included in the linear predictor during fitting.
This should be |
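The entrance rule described for miv.threshold and m.ch.p.val can be illustrated with a minimal base R sketch. It follows the usual statement of marginal information value from Scallan (2011): observed and expected good/bad counts are compared per bin of the candidate risk factor. The helper miv_sketch is hypothetical and not part of PDtoolkit; the target coding (1 = bad) and the degrees of freedom of the chi-square test (number of bins minus one) are assumptions, so the package's internal calculation may differ.

```r
# Hypothetical helper for illustration; not a PDtoolkit function.
# x: categorical candidate risk factor
# y: 0/1 target (assumption: 1 = bad case)
# p: probabilities of bad predicted by the current model
miv_sketch <- function(x, y, p) {
  obs.bad  <- tapply(y, x, sum)       # observed bads per bin
  obs.good <- tapply(1 - y, x, sum)   # observed goods per bin
  exp.bad  <- tapply(p, x, sum)       # expected bads per bin
  exp.good <- tapply(1 - p, x, sum)   # expected goods per bin
  dist.good <- obs.good / sum(obs.good)
  dist.bad  <- obs.bad / sum(obs.bad)
  woe.obs <- log(dist.good / dist.bad)
  woe.exp <- log((exp.good / sum(exp.good)) / (exp.bad / sum(exp.bad)))
  # MIV: information value of the gap between observed and expected WoE
  miv <- sum((dist.good - dist.bad) * (woe.obs - woe.exp))
  # marginal chi-square: observed vs expected counts of goods and bads
  chisq <- sum((obs.good - exp.good)^2 / exp.good +
               (obs.bad - exp.bad)^2 / exp.bad)
  # assumption: degrees of freedom = number of bins - 1
  p.val <- pchisq(chisq, df = length(obs.bad) - 1, lower.tail = FALSE)
  list(miv = miv, chisq = chisq, p.val = p.val)
}

# toy data: three bins with bad rates of 10%, 30% and about 57%
x <- rep(c("a", "b", "c"), times = c(40, 30, 30))
y <- c(rep(c(1, 0), c(4, 36)), rep(c(1, 0), c(9, 21)), rep(c(1, 0), c(17, 13)))
# intercept-only start model: every case gets the overall bad rate
p <- rep(mean(y), length(y))

cand <- miv_sketch(x, y, p)
# entrance rule with miv.threshold = 0.02 and m.ch.p.val = 0.05
enters <- cand$miv > 0.02 && cand$p.val < 0.05
```

With an intercept-only start model the expected WoE is zero in every bin, so under this definition the MIV of a candidate equals its ordinary information value. Once risk factors enter the model, the expected WoE absorbs what they already explain and the MIV of correlated candidates shrinks.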
Value
The command stepMIV
returns a list of five objects.
The first object (model
), is the final model, an object of class inheriting from "glm"
.
The second object (steps
), is the data frame with risk factors selected at each iteration.
The third object (miv.iter
), is the data frame with iteration details.
The fourth object (warnings
), is the data frame with warnings, if any were observed.
The warnings refer to the following checks: whether a risk factor has more than 10 modalities,
whether any of the bins (groups) has less than 5% of observations, and
whether there are problems with WoE calculations.
The fifth and final object (dev.db
), is the model development database.
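The first two warning checks listed above can be reproduced on a single risk factor before running the selection. The following is a minimal base R sketch; check_rf is a hypothetical helper, not a PDtoolkit function, and the limits of 10 modalities and 5% of observations are taken from the description above. The WoE check is not covered here.

```r
# Hypothetical helper for illustration; not a PDtoolkit function.
# x: categorical risk factor (character or factor)
check_rf <- function(x) {
  share <- prop.table(table(x))   # share of observations per bin
  c(many.modalities = length(share) > 10,
    small.bin = any(share < 0.05))
}

# toy risk factor: bin "c" holds only 3% of the observations
x <- rep(c("a", "b", "c"), times = c(60, 37, 3))
chk <- check_rf(x)
chk
```

A risk factor that is flagged this way is usually a candidate for regrouping its bins before modeling.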
References
Scallan, G. (2011). Class(ic) Scorecards: Selecting Characteristics and Attributes in Logistic Regression, Edinburgh Credit Scoring Conference.
Examples
suppressMessages(library(PDtoolkit))
data(loans)
##identify numeric risk factors
#num.rf <- sapply(loans, is.numeric)
#num.rf <- names(num.rf)[!names(num.rf)%in%"Creditability" & num.rf]
##discretize numeric risk factors using ndr.bin from the monobin package
#loans[, num.rf] <- sapply(num.rf, function(x)
#  ndr.bin(x = loans[, x], y = loans[, "Creditability"])[[2]])
#str(loans)
#run stepMIV
rf <- c("Account Balance", "Payment Status of Previous Credit", "Purpose",
        "Value Savings/Stocks", "Most valuable available asset", "Foreign Worker")
res <- stepMIV(start.model = Creditability ~ 1,
               miv.threshold = 0.02,
               m.ch.p.val = 0.05,
               coding = "WoE",
               coding.start.model = FALSE,
               db = loans[, c("Creditability", rf)])
#check output elements
names(res)
#print model warnings
res$warnings
#extract the final model
final.model <- res$model
#print coefficients
summary(final.model)$coefficients
#print steps of stepwise
res$steps
#print head of all iteration details
head(res$miv.iter)
#print head of coded development data
head(res$dev.db)
#calculate AUC
auc.model(predictions = predict(final.model, type = "response", newdata = res$dev.db),
          observed = res$dev.db$Creditability)