ensemble.blocks {LGDtoolkit}	R Documentation

Ensemble blocks regression

Description

ensemble.blocks performs blockwise regression where the predictions of each block's model are integrated into a final model. The final model is estimated in the form of OLS or fractional logistic regression without any check of the estimated coefficients (e.g. statistical significance or sign of the estimated coefficients).
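The blockwise idea can be sketched in base R as follows. This is an illustration under simplifying assumptions, not the package's implementation: the data are simulated, the variable names and block assignment are hypothetical, each block model is a plain lm fit on all risk factors of the block (no stepFWD or stepRPC selection), and the final model is OLS on the block predictions.

```r
# Simulated data: hypothetical target y and four hypothetical risk factors
set.seed(1)
n <- 200
db <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
db$y <- 0.3 + 0.5 * db$x1 - 0.2 * db$x3 + rnorm(n, sd = 0.1)

# Hypothetical block definition with the columns rf and block
blocks <- data.frame(rf = c("x1", "x2", "x3", "x4"), block = c(1, 1, 2, 2))

# Step 1: fit one model per block (here without any stepwise selection)
block.ids <- sort(unique(blocks$block))
block.models <- lapply(block.ids, function(b) {
  rf.b <- as.character(blocks$rf[blocks$block == b])
  lm(reformulate(rf.b, response = "y"), data = db)
})

# Step 2: collect the block predictions as new predictors
block.preds <- as.data.frame(sapply(block.models, predict, newdata = db))
names(block.preds) <- paste0("block_", block.ids)
block.preds$y <- db$y

# Step 3: final model on the block predictions, with no coefficient checks
final.model <- lm(y ~ ., data = block.preds)
coef(final.model)
```

The final model has one coefficient per block prediction plus an intercept, which is why the sign and significance of those coefficients may deserve a manual review.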

Usage

ensemble.blocks(method, target, db, blocks, reg.type = "ols", p.value = 0.05)

Arguments

method

Regression method applied on each block. Available methods: "stepFWD" or "stepRPC".

target

Name of target variable within db argument.

db

Modeling data with risk factors and target variable.

blocks

Data frame with defined risk factor groups. It has to contain the following columns: rf and block.
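A minimal sketch of a manually defined blocks data frame is given below. The risk factor names and block assignments are hypothetical; only the column names rf and block come from the description above.

```r
# Hypothetical risk factor names assigned to three blocks
blocks <- data.frame(rf = c("rf_01", "rf_02", "rf_03", "rf_04", "rf_05"),
                     block = c(1, 1, 2, 2, 3),
                     stringsAsFactors = FALSE)

# Both required columns have to be present
cols.ok <- all(c("rf", "block") %in% names(blocks))
cols.ok

# Risk factors grouped by block
rf.by.block <- split(blocks$rf, blocks$block)
rf.by.block
```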

reg.type

Regression type. Available options are: "ols" for OLS regression and "frac.logit" for fractional logistic regression. Default is "ols". For "frac.logit" option, target has to have all values between 0 and 1.
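The range requirement for "frac.logit" can be checked before the function is called. The sketch below uses simulated data and fits a fractional logistic regression with glm and the quasibinomial family, which is one common way to estimate such a model in base R. Whether ensemble.blocks uses the same call internally is an assumption, and the variable names are hypothetical.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
# Simulated target with all values between 0 and 1
y <- plogis(-0.5 + 0.8 * x + rnorm(n, sd = 0.3))

# Range check on the target
target.ok <- all(!is.na(y) & y >= 0 & y <= 1)
target.ok

# Fractional logistic regression (logit link, quasi-likelihood)
frac.logit <- glm(y ~ x, family = quasibinomial(link = "logit"))
summary(frac.logit)$coefficients

# Fitted values stay within the unit interval
range(fitted(frac.logit))
```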

p.value

Significance level for the p-value of the estimated coefficient. For numerical risk factors this value is directly compared to the p-value of the estimated coefficient, while for categorical risk factors a multiple Wald test is employed and its p-value is compared with the selected threshold (p.value).
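For a categorical risk factor, a multiple (joint) Wald test checks whether all of its dummy coefficients are jointly zero. The sketch below computes a chi-square version of that test by hand on simulated data. The variable names are hypothetical, and the exact form of the test used inside the package is not stated in this page, so this is only an illustration of the comparison against p.value.

```r
set.seed(1)
n <- 300
db <- data.frame(grp = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
                 x = rnorm(n))
db$y <- 0.2 + 0.3 * (db$grp == "b") + 0.6 * (db$grp == "c") +
  0.1 * db$x + rnorm(n, sd = 0.2)

fit <- lm(y ~ grp + x, data = db)

# Numerical risk factor: p-value taken directly from the coefficient table
p.num <- summary(fit)$coefficients["x", "Pr(>|t|)"]

# Categorical risk factor: joint Wald test on all dummy coefficients of grp
idx <- grep("^grp", names(coef(fit)))
b <- coef(fit)[idx]
V <- vcov(fit)[idx, idx]
W <- as.numeric(t(b) %*% solve(V) %*% b)
p.cat <- pchisq(W, df = length(b), lower.tail = FALSE)

# Comparison with the selected threshold
p.value <- 0.05
c(numeric = p.num < p.value, categorical = p.cat < p.value)
```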

Value

The command ensemble.blocks returns a list of three objects.
The first object (models) is the list of the models of each block (an object of class inheriting from "lm").
The second object (steps) is the data frame with the risk factors selected from each block.
The third object (dev.db) is the list of each block's model development databases.

See Also

staged.blocks, embedded.blocks, stepFWD and stepRPC.

Examples

library(monobin)
library(LGDtoolkit)
data(lgd.ds.c)
#stepwise with discretized risk factors
#same procedure can be run on continuous risk factors and mixed risk factor types
#select numeric risk factors (all numeric columns except the target)
num.rf <- sapply(lgd.ds.c, is.numeric)
num.rf <- names(num.rf)[!names(num.rf)%in%"lgd" & num.rf]
num.rf
#discretize each numeric risk factor using the second element of the sts.bin output
for (i in seq_along(num.rf)) {
  num.rf.l <- num.rf[i]
  lgd.ds.c[, num.rf.l] <- sts.bin(x = lgd.ds.c[, num.rf.l], y = lgd.ds.c[, "lgd"])[[2]]
}
str(lgd.ds.c)
#randomly assign each risk factor to one of three blocks
set.seed(2211)
blocks <- data.frame(rf = names(lgd.ds.c)[!names(lgd.ds.c)%in%"lgd"],
                     block = sample(1:3, ncol(lgd.ds.c) - 1, replace = TRUE))
blocks <- blocks[order(blocks$block, blocks$rf), ]
#run the ensemble blocks regression
res <- LGDtoolkit::ensemble.blocks(method = "stepFWD",
                                   target = "lgd",
                                   db = lgd.ds.c,
                                   blocks = blocks,
                                   reg.type = "ols",
                                   p.value = 0.05)
names(res)
res$models
summary(res$models[[4]])

[Package LGDtoolkit version 0.2.0 Index]