rf.clustering {PDtoolkit}    R Documentation
Risk factor clustering
Description
rf.clustering implements correlation-based clustering of risk factors.
The clustering procedure is based on hclust from the stats package.
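The general idea of correlation-based hierarchical clustering can be sketched in base R. This is not PDtoolkit's internal code. The toy data, the variable names, the conversion of correlation to distance as 1 - |rho|, average linkage and the fixed k = 2 are all assumptions made for illustration.

```r
# Sketch only: correlation-based hierarchical clustering of risk factors.
# Assumed choices (not taken from PDtoolkit): Spearman correlation,
# distance = 1 - |rho|, average linkage, k = 2.
library(stats)  # cor(), as.dist(), hclust(), cutree(); attached by default

set.seed(1)
n <- 200
z1 <- rnorm(n)
z2 <- rnorm(n)
# Hypothetical toy data: two pairs of strongly correlated risk factors
db <- data.frame(
  rf.a = z1,
  rf.b = z1 + rnorm(n, sd = 0.1),
  rf.c = z2,
  rf.d = -z2 + rnorm(n, sd = 0.1)  # negatively correlated with rf.c
)

# Correlation matrix turned into a dissimilarity: |rho| near 1 gives a
# distance near 0, whatever the sign of the correlation
rho <- cor(db, method = "spearman")
d <- as.dist(1 - abs(rho))

# Agglomerative clustering, then cut the tree into k groups
hc <- hclust(d, method = "average")
clusters <- cutree(hc, k = 2)
print(clusters)
```

Taking the absolute value treats strongly negatively correlated risk factors as redundant too, which is why rf.c and rf.d are expected to fall in the same cluster here.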
Usage
rf.clustering(db, metric, k = NA)
Arguments
db
Data frame of risk factors supplied for clustering analysis.
metric
Correlation metric used for distance calculation. Available options are: …
k
Number of clusters. If default value (NA) …
Value
The function rf.clustering returns a data frame with the risk factors, the
assigned clusters (column clusters) and the distance to the cluster centroid
(column dist.to.centroid), ordered by distance from smallest to largest.
The distance-to-centroid column can be used to select one or more risk factors
per cluster, for example the one closest to its cluster centroid.
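Selecting one risk factor per cluster does not require dplyr. The sketch below does it in base R on a hypothetical data frame shaped like the output described above; the column name rf and all values are made up for illustration.

```r
# Hypothetical rf.clustering-like output: column `rf` and values are invented
cr <- data.frame(
  rf = c("rf.a", "rf.b", "rf.c", "rf.d", "rf.e"),
  clusters = c(1, 1, 2, 2, 2),
  dist.to.centroid = c(0.10, 0.25, 0.05, 0.40, 0.30),
  stringsAsFactors = FALSE
)

# Base-R counterpart of group_by(clusters) %>% slice(which.min(dist.to.centroid)):
# sort by cluster and then by distance, keep the first row of each cluster
cr.sorted <- cr[order(cr$clusters, cr$dist.to.centroid), ]
selected <- cr.sorted[!duplicated(cr.sorted$clusters), ]
print(selected)
```

Keeping the first two rows per cluster instead (e.g. via ave() on a row counter) would give "one or more" representatives per cluster.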
Examples
suppressMessages(library(PDtoolkit))
library(rpart)
suppressMessages(library(dplyr))  # provides %>%, group_by() and slice() used below
data(loans)
# clustering using the "common spearman" metric
# first, categorize the numeric risk factors (all except the target)
num.rf <- sapply(loans, is.numeric)
num.rf <- names(num.rf)[!names(num.rf) %in% "Creditability" & num.rf]
loans[, num.rf] <- sapply(num.rf, function(x)
	sts.bin(x = loans[, x], y = loans[, "Creditability"])[[2]])
# replace categories with WoE values so that all risk factors are numeric
loans.woe <- replace.woe(db = loans, target = "Creditability")[[1]]
cr <- rf.clustering(db = loans.woe[, -which(names(loans.woe) %in% "Creditability")],
		    metric = "common spearman",
		    k = NA)
cr
# select one risk factor per cluster: the one with minimum distance to centroid
cr %>% group_by(clusters) %>%
       slice(which.min(dist.to.centroid))