stepRPCr {PDtoolkit} | R Documentation |
Stepwise regression based on risk profile concept and raw risk factors
Description
stepRPCr performs a customized stepwise regression with p-value and trend checks on raw risk factors.
When selecting a candidate for the final regression model, it additionally takes into account the order
of the supplied risk factors per group. The trend check compares the observed trend between the target
and the analyzed risk factor with the trend of the estimated coefficients.
Note that the procedure checks the column names of the supplied db data frame, so some
renaming (replacement of special characters) may occur. For details, please check the help example.
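The trend check described above can be read as comparing two rankings of the bins: one by observed target rate and one by estimated coefficient. The sketch below is one possible interpretation in base R, not the package's internal code. The helper trend_agrees and the toy data are hypothetical and assume a binary target and treatment-coded character risk factors.

```r
# Hypothetical helper (not part of PDtoolkit): TRUE when the estimated
# coefficients rank the bins of `rf` in the same order as the observed rates.
trend_agrees <- function(model, db, target, rf) {
  # Observed trend: mean of the binary target per bin.
  obs <- tapply(db[[target]], db[[rf]], mean)
  # Estimated trend: 0 for the reference bin, dummy coefficient otherwise.
  est <- setNames(numeric(length(obs)), names(obs))
  b <- coef(model)
  for (lv in names(obs)[-1]) est[lv] <- b[paste0(rf, lv)]
  all(order(obs) == order(est))
}

# Toy data (made up): bad rates of 10%, 20% and 40% in bins a, b and c.
db <- data.frame(
  y   = c(rep(c(1, 0), times = c(10, 90)),
          rep(c(1, 0), times = c(20, 80)),
          rep(c(1, 0), times = c(40, 60))),
  rf1 = rep(c("a", "b", "c"), each = 100)
)
fit <- glm(y ~ rf1, family = binomial(), data = db)
trend_agrees(fit, db, target = "y", rf = "rf1")
```

With a single risk factor the two rankings agree by construction. A check of this kind becomes informative once several correlated risk factors are in the model and a coefficient can change order or sign.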
Usage
stepRPCr(
start.model,
risk.profile,
p.value = 0.05,
db,
check.start.model = TRUE,
offset.vals = NULL
)
Arguments
start.model |
Formula class that represents the starting model. It can include some risk factors, but it can be
defined only with intercept ( |
risk.profile |
Data frame with defined risk profile. It has to contain the following columns: |
p.value |
Significance level for the p-values of the estimated coefficients. For numerical risk factors this value
is directly compared to the p-value of the estimated coefficient, while for categorical risk factors
a multiple Wald test is employed and its p-value is compared with the selected threshold ( |
db |
Modeling data with risk factors and target variable. All risk factors (apart from the risk factors from the starting model) should be categorized and of character type. |
check.start.model |
Logical ( |
offset.vals |
This can be used to specify an a priori known component to be included in the linear predictor during fitting.
This should be |
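For the p.value argument, a categorical risk factor enters the model as several dummy coefficients, so a single joint test is needed. The sketch below shows a standard joint Wald test in base R as an illustration of the idea. The helper wald_p_value and the toy data are hypothetical, and the package's own computation may differ in detail.

```r
# Hypothetical helper (not part of PDtoolkit): joint Wald test that all
# coefficients belonging to the term `rf` are zero.
wald_p_value <- function(model, rf) {
  # Locate the columns of the design matrix generated by the term `rf`.
  term_id <- match(rf, attr(terms(model), "term.labels"))
  idx <- which(attr(model.matrix(model), "assign") == term_id)
  b <- coef(model)[idx]
  v <- vcov(model)[idx, idx, drop = FALSE]
  # Wald statistic b' V^{-1} b, chi-squared with length(b) degrees of freedom.
  w <- as.numeric(crossprod(b, solve(v, b)))
  pchisq(w, df = length(b), lower.tail = FALSE)
}

# Toy data (made up): bad rates of 10%, 20% and 40% in bins a, b and c.
db <- data.frame(
  y   = c(rep(c(1, 0), times = c(10, 90)),
          rep(c(1, 0), times = c(20, 80)),
          rep(c(1, 0), times = c(40, 60))),
  rf1 = rep(c("a", "b", "c"), each = 100)
)
fit <- glm(y ~ rf1, family = binomial(), data = db)
p <- wald_p_value(fit, "rf1")
p < 0.05  # compare with the chosen threshold
```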
Value
The command stepRPCr returns a list of four objects.
The first object (model) is the final model, an object of class inheriting from "glm".
The second object (steps) is the data frame with risk factors selected at each iteration.
The third object (warnings) is the data frame with warnings, if any were observed.
The warnings refer to the following checks: whether a risk factor has more than 10 modalities or
whether any of the bins (groups) has less than 5% of observations.
The fourth object (dev.db) is the model development database.
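The two checks behind the warnings object can be reproduced on any data frame before running the procedure. The sketch below is a rough base R approximation, assuming the thresholds stated above (10 modalities, 5% of observations). The helper check_risk_factors, its argument names and its output columns are hypothetical and do not mirror the layout of the warnings data frame.

```r
# Hypothetical helper (not part of PDtoolkit): flag risk factors with more
# than `max.mod` modalities or with a bin holding less than `min.share`.
check_risk_factors <- function(db, rf, max.mod = 10, min.share = 0.05) {
  res <- lapply(rf, function(x) {
    share <- prop.table(table(db[[x]]))  # share of observations per bin
    data.frame(rf = x,
               too.many.modalities = length(share) > max.mod,
               small.bin = any(share < min.share))
  })
  res <- do.call(rbind, res)
  # Keep only the risk factors that fail at least one check.
  res[res$too.many.modalities | res$small.bin, ]
}

# Toy data (made up): `a` has a 2% bin, `b` is fine, `c` has 20 modalities.
db <- data.frame(
  a = rep(c("x", "y"), times = c(98, 2)),
  b = rep(c("u", "v"), each = 50),
  c = as.character(rep(1:20, each = 5))
)
chk <- check_risk_factors(db, rf = c("a", "b", "c"))
chk
```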
Examples
suppressMessages(library(PDtoolkit))
data(loans)
# create risk factor priority groups
rf.all <- names(loans)[-1]
set.seed(6422)
rf.pg <- data.frame(rf = rf.all, group = sample(1:3, length(rf.all), replace = TRUE))
rf.pg <- rf.pg[order(rf.pg$group), ]
head(rf.pg)
# run the stepwise procedure starting from an intercept-only model
res <- stepRPCr(start.model = Creditability ~ 1,
                risk.profile = rf.pg,
                p.value = 0.05,
                db = loans)
summary(res$model)$coefficients
res$steps
head(res$dev.db)
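The Description notes that column names of db containing special characters may be renamed. The snippet below only illustrates that kind of renaming with base R's make.names(). It is an assumption that the effect is similar, since the exact rule applied by stepRPCr is not stated here, and the column names used are made up.

```r
# Illustrative column names with spaces and brackets (made up for this sketch).
raw.names <- c("Account Balance", "Duration (months)", "age")
# make.names() replaces characters that are invalid in syntactic names with ".".
clean.names <- make.names(raw.names, unique = TRUE)
data.frame(raw.names, clean.names)
```

Comparing the column names of the input data with those of the returned dev.db on your own data shows what, if anything, was actually changed.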