stepMIV {PDtoolkit}    R Documentation

Stepwise logistic regression based on marginal information value (MIV)

Description

stepMIV performs stepwise logistic regression in which risk factors are selected based on their marginal information value (MIV).

Usage

stepMIV(
  start.model,
  miv.threshold,
  m.ch.p.val,
  coding,
  coding.start.model = FALSE,
  db,
  offset.vals = NULL
)

Arguments

start.model

Object of class formula that represents the starting model. It can include some risk factors, or it can be defined with the intercept only (y ~ 1, where y is the target variable).
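A stepwise procedure that starts from an intercept-only formula adds each selected risk factor to the right-hand side. The sketch below shows this growth in base R with stats::update(). The selection order is hypothetical, and it is an assumption that stepMIV builds its formulas this way internally.

```r
# Illustrative only: grow an intercept-only starting formula by adding
# risk factors in a hypothetical selection order.
start.model <- Creditability ~ 1

# Hypothetical selection order; backticks are required because the
# risk factor names contain spaces.
selected <- c("Account Balance", "Purpose")

current <- start.model
for (rf in selected) {
  # ". ~ . + term" keeps the response and the existing terms, then adds one term
  current <- update(current, as.formula(paste0(". ~ . + `", rf, "`")))
}
print(current)
```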

miv.threshold

MIV entrance threshold. Only risk factors with an MIV higher than the threshold are candidates for the new model. An additional criterion is that the MIV value should significantly separate good from bad cases, as measured by the marginal chi-square test.
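The sketch below computes MIV for one candidate risk factor, following the general idea in Scallan (2011). Observed WoE per bin comes from the actual goods and bads. Expected WoE per bin comes from the current model's predicted probabilities summed within the bin. MIV weights the WoE difference by the difference between the good and bad distributions. The 0/1 target with 1 = bad, the WoE sign convention, and the helper name miv_sketch are all assumptions, and PDtoolkit's internal formula may differ in detail.

```r
# Minimal MIV sketch (assumptions: y is 0/1 with 1 = bad,
# WoE = log(%good / %bad); not PDtoolkit's implementation).
miv_sketch <- function(x, y, pd) {
  # Observed bads and goods per bin
  ob <- tapply(y, x, sum)
  og <- tapply(1 - y, x, sum)
  # Expected bads and goods per bin from the current model's predicted PDs
  eb <- tapply(pd, x, sum)
  eg <- tapply(1 - pd, x, sum)
  woe.obs <- log((og / sum(og)) / (ob / sum(ob)))
  woe.exp <- log((eg / sum(eg)) / (eb / sum(eb)))
  delta <- woe.obs - woe.exp
  sum((og / sum(og) - ob / sum(ob)) * delta)
}

set.seed(1)
n <- 1000
x <- sample(c("A", "B", "C"), n, replace = TRUE)
# Simulated target whose bad rate depends on the bin of x
y <- rbinom(n, 1, c(A = 0.1, B = 0.3, C = 0.5)[x])

# Current model: intercept only, so every case gets the same PD
m0 <- glm(y ~ 1, family = binomial)
pd <- fitted(m0)

miv <- miv_sketch(x, y, pd)
miv.threshold <- 0.02
cat("MIV:", round(miv, 4), " candidate:", miv > miv.threshold, "\n")
```

With an intercept-only model, the expected WoE is zero in every bin, so this MIV reduces to the ordinary information value of the candidate.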

m.ch.p.val

Significance level for the p-value of the marginal chi-square test. This test supplements the MIV value of a candidate risk factor in the final decision on its inclusion.
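A marginal chi-square test compares observed goods and bads per bin with those expected under the current model. The sketch below assumes three things: expected counts are sums of predicted probabilities, degrees of freedom equal the number of bins minus 1, and the helper name marginal_chisq is hypothetical. stepMIV's exact computation may differ.

```r
# Sketch of a marginal chi-square test for one candidate risk factor
# (assumptions: y is 0/1 with 1 = bad; df = number of bins - 1).
marginal_chisq <- function(x, y, pd) {
  ob <- tapply(y, x, sum);  og <- tapply(1 - y, x, sum)   # observed
  eb <- tapply(pd, x, sum); eg <- tapply(1 - pd, x, sum)  # expected under model
  stat <- sum((ob - eb)^2 / eb + (og - eg)^2 / eg)
  df <- length(ob) - 1
  c(stat = stat, df = df, p.value = pchisq(stat, df, lower.tail = FALSE))
}

set.seed(2)
n <- 1000
x <- sample(c("A", "B", "C"), n, replace = TRUE)
y <- rbinom(n, 1, c(A = 0.1, B = 0.3, C = 0.5)[x])
pd <- fitted(glm(y ~ 1, family = binomial))   # intercept-only current model

res <- marginal_chisq(x, y, pd)
m.ch.p.val <- 0.05
print(res)
# The candidate passes this criterion when its p-value is below m.ch.p.val
cat("passes chi-square criterion:", res[["p.value"]] < m.ch.p.val, "\n")
```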

coding

Type of risk factor coding within the model. Available options are "WoE" and "dummy". If "WoE" is selected, the modalities of each risk factor are replaced by their WoE values. If "dummy" is selected, dummy (0/1) variables are created for n-1 modalities, where n is the total number of modalities of the analyzed risk factor.
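The following base-R sketch contrasts the two coding options for a single binned risk factor. The data are made up. The target convention (1 = bad) and the WoE definition log(%good / %bad) are assumptions, because sign conventions differ between tools. The dummy coding uses model.matrix(), which keeps n-1 indicator columns and treats the first level as the baseline.

```r
# Toy binned risk factor and 0/1 target (assumption: 1 = bad)
x <- c(rep("low", 3), rep("mid", 4), rep("high", 5))
y <- c(1, 1, 0,  1, 0, 0, 0,  1, 0, 0, 0, 0)

# WoE coding: each modality is replaced by its WoE value
g <- tapply(1 - y, x, sum)                   # goods per modality
b <- tapply(y, x, sum)                       # bads per modality
woe <- log((g / sum(g)) / (b / sum(b)))      # assumed convention: log(%good / %bad)
x.woe <- unname(woe[x])
print(data.frame(x, x.woe = round(x.woe, 3)))

# Dummy coding: n - 1 indicator (0/1) columns; level "high" is the baseline
dummies <- model.matrix(~ x)[, -1, drop = FALSE]
print(dummies)
```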

coding.start.model

Logical (TRUE or FALSE) indicating whether risk factors from the starting model should be WoE coded. It has an effect only when coding = "WoE". Default value is FALSE.

db

Modeling data with risk factors and the target variable. All risk factors should be categorical (already binned) and stored as character type.
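A numeric variable has to be binned and converted to character before it can be used in db. The sketch below uses base-R cut() with fixed, purely illustrative cut points on made-up data. The commented-out Examples code uses ndr.bin from the monobin package for this step instead.

```r
# Made-up modeling data with one numeric variable
db <- data.frame(
  Creditability = c(1, 0, 0, 1, 0, 0),
  Age = c(22, 35, 47, 19, 63, 51)
)
# Bin with illustrative cut points, then store the bins as character
db$Age <- as.character(cut(db$Age, breaks = c(-Inf, 25, 45, Inf)))
str(db)
```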

offset.vals

This can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. Default is NULL.
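For a logistic regression, an offset is added to the linear predictor with a fixed coefficient of 1, so it is not estimated. The base-R sketch below shows this with stats::glm(). The simulated data and the constant offset of log(0.2 / 0.8) are illustrative assumptions, and it is assumed that stepMIV passes offset.vals to glm() in an equivalent way.

```r
set.seed(3)
n <- 500
x <- rnorm(n)
off <- rep(log(0.2 / 0.8), n)              # a priori known log-odds component
y <- rbinom(n, 1, plogis(off + 0.8 * x))   # simulated target

fit <- glm(y ~ x, family = binomial, offset = off)
# Linear predictor = offset + estimated intercept + estimated slope * x
lp <- off + coef(fit)[1] + coef(fit)[2] * x
all.equal(unname(lp), unname(predict(fit, type = "link")))
```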

Value

The command stepMIV returns a list of five objects.
The first object (model) is the final model, an object of class inheriting from "glm".
The second object (steps) is a data frame with the risk factors selected at each iteration.
The third object (miv.iter) is a data frame with iteration details.
The fourth object (warnings) is a data frame with warnings, if any were observed. The warnings refer to the following checks: whether a risk factor has more than 10 modalities, whether any of the bins (groups) has less than 5% of observations, and whether there are problems with WoE calculations.
The fifth object (dev.db) is the model development database.
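The sketch below applies the three kinds of checks listed for the warnings element to a single risk factor. check_rf is a hypothetical helper, not PDtoolkit's implementation. Treating a bin with zero goods or zero bads as a "WoE problem" (it makes the WoE infinite) is an assumption about what those problems are.

```r
# Hypothetical helper: flags for one risk factor x and 0/1 target y
check_rf <- function(x, y) {
  pct <- prop.table(table(x))          # share of observations per bin
  g <- tapply(1 - y, x, sum)           # goods per bin
  b <- tapply(y, x, sum)               # bads per bin
  c(more.than.10.modalities = length(pct) > 10,
    small.bins = any(pct < 0.05),
    woe.problem = any(g == 0 | b == 0))
}

# Bin "C" holds 3% of cases and contains no bads
x <- c(rep("A", 50), rep("B", 47), rep("C", 3))
y <- c(rep(0:1, 25), rep(0:1, length.out = 47), 0, 0, 0)
check_rf(x, y)
```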

References

Scallan, G. (2011). Class(ic) Scorecards: Selecting Characteristics and Attributes in Logistic Regression, Edinburgh Credit Scoring Conference.

Examples

suppressMessages(library(PDtoolkit))
data(loans)
##identify numeric risk factors
#num.rf <- sapply(loans, is.numeric)
#num.rf <- names(num.rf)[!names(num.rf)%in%"Creditability" & num.rf]
##discretize numeric risk factors using ndr.bin from the monobin package
#loans[, num.rf] <- sapply(num.rf, function(x) 
#  ndr.bin(x = loans[, x], y = loans[, "Creditability"])[[2]])
#str(loans)
#run stepMIV
rf <- c("Account Balance", "Payment Status of Previous Credit", "Purpose",
       "Value Savings/Stocks", "Most valuable available asset", "Foreign Worker")
res <- stepMIV(start.model = Creditability ~ 1, 
               miv.threshold = 0.02, 
               m.ch.p.val = 0.05,
               coding = "WoE",
               coding.start.model = FALSE,
               db = loans[, c("Creditability", rf)])
#check output elements
names(res)
#print model warnings
res$warnings
#extract the final model
final.model <- res$model
#print coefficients
summary(final.model)$coefficients
#print steps of stepwise
res$steps
#print head of all iteration details
head(res$miv.iter)
#print head of coded development data
head(res$dev.db)
#calculate AUC
auc.model(predictions = predict(final.model, type = "response", newdata = res$dev.db),
    observed = res$dev.db$Creditability)
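The AUC reported by auc.model() can be cross-checked with the rank-based (Mann-Whitney) formula. This formula gives the probability that a randomly chosen case with observed = 1 receives a higher prediction than a case with observed = 0, with ties counted as one half. The sketch below is in base R. auc_rank is a hypothetical helper and not PDtoolkit's implementation.

```r
# Rank-based (Mann-Whitney) AUC; average ranks handle ties
auc_rank <- function(predictions, observed) {
  r <- rank(predictions)
  n1 <- sum(observed == 1)
  n0 <- sum(observed == 0)
  (sum(r[observed == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc_rank(predictions = c(0.1, 0.4, 0.35, 0.8), observed = c(0, 0, 1, 1))
# 0.75
```

After running the example above, auc_rank(predict(final.model, type = "response", newdata = res$dev.db), res$dev.db$Creditability) should agree with auc.model() only if both use the same orientation of the target.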

[Package PDtoolkit version 1.2.0 Index]