rwcccd_classifier {rcccd} | R Documentation
Random Walk Class Cover Catch Digraph Classifier
Description
rwcccd_classifier and rwcccd_classifier_2 fit a Random Walk Class Cover Catch Digraph (RWCCCD) classification model. rwcccd_classifier uses C++ for speed, whereas rwcccd_classifier_2 determines the balls in R.
Usage
rwcccd_classifier(x, y, method = "default", m = 1, proportion = 0.99)
rwcccd_classifier_2(
x,
y,
method = "default",
m = 1,
proportion = 0.99,
partial_ordering = FALSE
)
Arguments
x | feature matrix or data frame.
y | class factor variable.
method | "default" or "balanced".
m | penalization parameter. Default is 1.
proportion | proportion of covered samples. A real number between 0 and 1. Default is 0.99.
partial_ordering | logical. Default is FALSE. Only available in rwcccd_classifier_2.
Details
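Random Walk Class Cover Catch Digraphs (RWCCCD) are determined by calculating a score for each class, treating each class in turn as the target class.
The radius of the ball around each target sample is the one that maximizes a penalized quantity calculated for that sample.
The penalization parameter m controls the penalty, which can also be removed entirely.
One term of the score is set differently for the default and for the balanced method.
The score also depends on the number of uncovered samples in the current iteration.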
This method is more robust to noise than PCCCD. However, balls that cover classes improperly can be selected.
For details, please refer to Priebe et al. (2001), Priebe et al. (2003), and Manukyan and Ceyhan (2016).
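The random-walk idea behind the radius choice can be illustrated with a minimal sketch. It is not the package's implementation: it assumes Euclidean distance, counts every sample with weight 1 and applies no penalty. The helper name rw_radius and its arguments are hypothetical. For one target sample, the ball grows over the observed distances, gaining 1 for each target sample it contains and losing 1 for each non-target sample, and the radius with the highest count is kept.

```r
# Hypothetical helper, not part of rcccd: unpenalized random-walk radius
# for a single target sample.
rw_radius <- function(x_target, x_all, is_target) {
  x_all <- as.matrix(x_all)
  # Euclidean distance from the target sample to every sample (assumption)
  d <- sqrt(rowSums(sweep(x_all, 2, x_target)^2))
  # candidate radii: the distinct observed distances, in increasing order
  candidates <- sort(unique(d))
  # +1 for each target sample inside the ball, -1 for each non-target sample
  score <- vapply(
    candidates,
    function(r) sum(d <= r & is_target) - sum(d <= r & !is_target),
    numeric(1)
  )
  best <- which.max(score)  # first maximum, i.e. the smallest such radius
  list(radius = candidates[best], score = score[best])
}

# toy data on a line: three target samples followed by two non-target samples
x_all <- matrix(c(0, 1, 2, 3, 4), ncol = 1)
is_target <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
res <- rw_radius(x_all[1, ], x_all, is_target)
res$radius
res$score
```

In the toy data the ball around the first sample stops growing just before it would reach the first non-target sample, since any larger radius lowers the count.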
Value
An rwcccd_classifier object with the following components:
i_dominant_list | indexes of the dominant samples.
x_dominant_list | dominant samples from the feature matrix, x.
radii_dominant_list | radii of the balls centered at the dominant samples.
class_names | class names.
k_class | number of classes.
proportions | proportion of each class that is covered.
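To make the relation between the dominant samples, their radii and a coverage proportion concrete, the following sketch computes the share of points that fall inside at least one ball. It assumes Euclidean distance and closed balls. The helper covered_proportion is hypothetical, and whether the package computes its proportions component in exactly this way is not confirmed here.

```r
# Hypothetical helper, not part of rcccd: share of the rows of x that lie
# inside at least one ball given by its center and radius.
covered_proportion <- function(x, centers, radii) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  covered <- vapply(seq_len(nrow(x)), function(i) {
    # Euclidean distance from point i to every ball center (assumption)
    d <- sqrt(rowSums(sweep(centers, 2, x[i, ])^2))
    any(d <= radii)
  }, logical(1))
  mean(covered)
}

# toy data: one ball of radius 1.5 at the origin covers two of three points
pts <- rbind(c(0, 0), c(1, 0), c(5, 5))
centers <- rbind(c(0, 0))
covered_proportion(pts, centers, radii = 1.5)
```

With a fitted model, the same helper could be applied to the samples of one class together with the matching entries of x_dominant_list and radii_dominant_list.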
Author(s)
Fatih Saglam, saglamf89@gmail.com
References
Priebe, C. E., DeVinney, J., & Marchette, D. J. (2001). On the distribution of the domination number for random class cover catch digraphs. Statistics & Probability Letters, 55(3), 239–246. https://doi.org/10.1016/s0167-7152(01)00129-8
Priebe, C. E., Marchette, D. J., DeVinney, J., & Socolinsky, D. A. (2003). Classification Using Class Cover Catch Digraphs. Journal of Classification, 20(1), 3–23. https://doi.org/10.1007/s00357-003-0003-7
Manukyan, A., & Ceyhan, E. (2016). Classification of imbalanced data with a geometric digraph family. Journal of Machine Learning Research, 17(1), 6504–6543. https://jmlr.org/papers/volume17/15-604/15-604.pdf
Examples
n <- 500
x1 <- runif(n, 1, 10)
x2 <- runif(n, 1, 10)
x <- cbind(x1, x2)
y <- as.factor(ifelse(3 < x1 & x1 < 7 & 3 < x2 & x2 < 7, "A", "B"))
# fit the model and plot the dataset
m_rwcccd_1 <- rwcccd_classifier(x = x, y = y, method = "default", m = 1)
plot(x, col = y, asp = 1, main = "default")
# dominant samples of second class
x_center <- m_rwcccd_1$x_dominant_list[[2]]
# radii of balls for second class
radii <- m_rwcccd_1$radii_dominant_list[[2]]
# draw one circle per ball; seq_len() skips the loop if there are no balls
for (i in seq_len(nrow(x_center))) {
  xx <- x_center[i, 1]
  yy <- x_center[i, 2]
  r <- radii[i]
  theta <- seq(0, 2*pi, length.out = 100)
  xx <- xx + r*cos(theta)
  yy <- yy + r*sin(theta)
  lines(xx, yy, type = "l", col = "green")
}
# fit the R implementation with partial ordering and plot the dataset
m_rwcccd_2 <- rwcccd_classifier_2(x = x, y = y, method = "default", m = 1, partial_ordering = TRUE)
plot(x, col = y, asp = 1, main = "default, partial_ordering = TRUE")
# dominant samples of second class
x_center <- m_rwcccd_2$x_dominant_list[[2]]
# radii of balls for second class
radii <- m_rwcccd_2$radii_dominant_list[[2]]
# draw one circle per ball; seq_len() skips the loop if there are no balls
for (i in seq_len(nrow(x_center))) {
  xx <- x_center[i, 1]
  yy <- x_center[i, 2]
  r <- radii[i]
  theta <- seq(0, 2*pi, length.out = 100)
  xx <- xx + r*cos(theta)
  yy <- yy + r*sin(theta)
  lines(xx, yy, type = "l", col = "green")
}
# fit the balanced model with a lower coverage proportion and plot the dataset
m_rwcccd_3 <- rwcccd_classifier(x = x, y = y, method = "balanced", m = 1, proportion = 0.5)
plot(x, col = y, asp = 1, main = "balanced, proportion = 0.5")
# dominant samples of second class
x_center <- m_rwcccd_3$x_dominant_list[[2]]
# radii of balls for second class
radii <- m_rwcccd_3$radii_dominant_list[[2]]
# draw one circle per ball; seq_len() skips the loop if there are no balls
for (i in seq_len(nrow(x_center))) {
  xx <- x_center[i, 1]
  yy <- x_center[i, 2]
  r <- radii[i]
  theta <- seq(0, 2*pi, length.out = 100)
  xx <- xx + r*cos(theta)
  yy <- yy + r*sin(theta)
  lines(xx, yy, type = "l", col = "green")
}
# testing the performance on an 80/20 train/test split
i_train <- sample(1:n, round(n*0.8))
x_train <- x[i_train, ]
y_train <- y[i_train]
x_test <- x[-i_train, ]
y_test <- y[-i_train]
m_rwcccd <- rwcccd_classifier(x = x_train, y = y_train, method = "balanced")
pred <- predict(object = m_rwcccd, newdata = x_test)
# confusion matrix
table(y_test, pred)
# accuracy: share of test samples predicted correctly
mean(y_test == pred)
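The ball-drawing loop appears three times in the examples above. As a possible simplification, it could be wrapped in small helpers. The names circle_points and draw_balls are hypothetical and not part of rcccd. The sketch assumes two-dimensional data, a matrix of centers with one row per ball, and an existing plot to draw on.

```r
# Hypothetical helper: n points on the circle of radius r around (cx, cy).
circle_points <- function(cx, cy, r, n = 100) {
  theta <- seq(0, 2 * pi, length.out = n)
  cbind(x = cx + r * cos(theta), y = cy + r * sin(theta))
}

# Hypothetical helper: add one circle per ball to the current plot.
draw_balls <- function(centers, radii, col = "green") {
  for (i in seq_len(nrow(centers))) {
    # lines() accepts a two-column matrix of coordinates
    lines(circle_points(centers[i, 1], centers[i, 2], radii[i]), col = col)
  }
  invisible(NULL)
}

# points of a circle of radius 3 around (1, 2)
pts_circle <- circle_points(1, 2, 3, n = 50)
```

After a call to plot(), the balls of the second class could then be drawn with draw_balls(m_rwcccd_1$x_dominant_list[[2]], m_rwcccd_1$radii_dominant_list[[2]]).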