embedded.blocks {PDtoolkit} R Documentation

Embedded blocks regression

Description

embedded.blocks performs blockwise regression in which the predictions of each block's model are used as a risk factor in the model of the following block.
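The idea can be illustrated with a minimal base R sketch. It is not the package's implementation: the simulated data, the variable names (x1, x2, x3, block1.pred), and the choice to carry the prediction forward on the link scale are all assumptions made for illustration.

```r
# Minimal sketch of the embedded-blocks idea using base R only.
# Data and variable names are hypothetical.
set.seed(1)
n  <- 500
db <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
db$y <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * db$x1 + 0.5 * db$x3))

# Block 1: fit a model on the first block's risk factors
m1 <- glm(y ~ x1 + x2, family = binomial, data = db)

# Carry the block 1 prediction forward as an extra risk factor
# (using the link scale is an assumption of this sketch)
db$block1.pred <- predict(m1, newdata = db, type = "link")

# Block 2: its own risk factor plus the embedded block 1 prediction
m2 <- glm(y ~ block1.pred + x3, family = binomial, data = db)
names(coef(m2))
```

In embedded.blocks itself, the risk factors within each block are chosen by the stepwise method passed in the method argument rather than fixed by hand as above.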

Usage

embedded.blocks(
  method,
  target,
  db,
  coding = "WoE",
  blocks,
  p.value = 0.05,
  miv.threshold = 0.02,
  m.ch.p.val = 0.05
)

Arguments

method

Regression method applied on each block. Available methods: "stepMIV", "stepFWD", "stepRPC", "stepFWDr", and "stepRPCr".

target

Name of the target variable within the db argument.

db

Modeling data with risk factors and target variable.

coding

Type of risk factor coding within the model. Available options are "WoE" and "dummy". With "WoE", the modalities of each risk factor are replaced by their WoE (weight of evidence) values. With "dummy", 0/1 dummy variables are created for n-1 modalities, where n is the total number of modalities of the analyzed risk factor.
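The two coding options can be illustrated in base R on a toy risk factor. This is only a sketch: the data are made up, and the WoE sign convention used here (log of the share of goods over the share of bads, with 0 coded as good) is an assumption that may differ from the one used by the package.

```r
# Hypothetical risk factor with three modalities and a binary target
db <- data.frame(
  rf = factor(c("a", "a", "b", "b", "b", "c", "c", "c", "c", "a")),
  y  = c(1, 0, 0, 0, 1, 0, 0, 1, 0, 0)
)

# Dummy coding: model.matrix() creates n - 1 = 2 indicator columns
# (the first level, "a", acts as the reference modality)
dummies <- model.matrix(~ rf, data = db)[, -1, drop = FALSE]
colnames(dummies)

# WoE coding, assuming WoE = log(share of goods / share of bads) per modality
tab  <- table(db$rf, db$y)
good <- tab[, "0"] / sum(tab[, "0"])
bad  <- tab[, "1"] / sum(tab[, "1"])
woe  <- log(good / bad)

# Replace each modality by its WoE value
db$rf.woe <- unname(woe[as.character(db$rf)])
```

With WoE coding each risk factor enters the model as a single numeric variable, whereas dummy coding adds n-1 coefficients per risk factor, which is why a multiple Wald test is needed to assess its significance.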

blocks

Data frame with defined risk factor groups. It has to contain the following columns: rf and block.
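A minimal sketch of such a data frame is given here. The risk factor names are hypothetical placeholders; in practice rf should hold the names of columns in db.

```r
# Risk factor names below are hypothetical placeholders
blocks <- data.frame(
  rf    = c("rf.a", "rf.b", "rf.c", "rf.d"),
  block = c(1, 1, 2, 3)
)

# List the risk factors assigned to each block
split(blocks$rf, blocks$block)
```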

p.value

Significance level of p-value for the estimated coefficient. For WoE coding this value is directly compared to the p-value of the estimated coefficient, while for dummy coding a multiple Wald test is employed and its p-value is compared with the selected threshold (p.value). This argument is applicable only for "stepFWD" and "stepRPC" selected methods.

miv.threshold

MIV (marginal information value) entrance threshold, applicable only for the "stepMIV" method. Only risk factors with MIV higher than the threshold are candidates for the new model. An additional criterion is that the MIV value should significantly separate good from bad cases, as measured by the marginal chi-square test.

m.ch.p.val

Significance level of p-value for the marginal chi-square test, applicable only for the "stepMIV" method. This test additionally supports the MIV value of a candidate risk factor in the final decision.

Value

The command embedded.blocks returns a list of three objects.
The first object (models) is the list of the models of each block (objects of class inheriting from "glm").
The second object (steps) is the data frame with the risk factors selected from each block.
The third object (dev.db) is the list of the model development databases of each block.

References

Anderson, R.A. (2021). Credit Intelligence & Modelling: Many Paths through the Forest of Credit Rating and Scoring. OUP Oxford.

See Also

staged.blocks, ensemble.blocks, stepMIV, stepFWD, stepRPC, stepFWDr and stepRPCr.

Examples

suppressMessages(library(PDtoolkit))
data(loans)
# create risk factor priority groups
rf.all <- names(loans)[-1]
set.seed(22)
blocks <- data.frame(rf = rf.all, block = sample(1:3, length(rf.all), replace = TRUE))
blocks <- blocks[order(blocks$block), ]
blocks
# method: stepFWDr
res <- embedded.blocks(method = "stepFWDr",
                       target = "Creditability",
                       db = loans,
                       blocks = blocks,
                       p.value = 0.05)
names(res)
# number of block models; the last one embeds predictions from the preceding blocks
nb <- length(res[["models"]])
res$models[[nb]]

# AUC of the last block's model on its development database
auc.model(predictions = predict(res$models[[nb]], type = "response",
                                newdata = res$dev.db[[nb]]),
          observed = res$dev.db[[nb]]$Creditability)

[Package PDtoolkit version 1.2.0 Index]