snnrce {SSLR} | R Documentation |
General Interface for SNNRCE model
Description
SNNRCE (Self-training Nearest Neighbor Rule using Cut Edges) is a variant
of the self-training classification method (selfTraining) with a different
addition mechanism and a fixed learning scheme (1-NN). SNNRCE uses an amending scheme
to avoid the introduction of noisy examples into the enlarged labeled set.
The mislabeled examples are identified using the local information provided
by the neighborhood graph. A statistical test using cut edge weight is used to modify
the labels of the misclassified examples.
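As a rough illustration of the self-training idea with a fixed 1-NN learner, the base-R sketch below repeatedly labels the unlabeled instance that lies closest to the labeled set and copies its nearest labeled neighbor's class. This is plain 1-NN self-labeling only: it does not reproduce SNNRCE's addition mechanism or its amending scheme, and the helper name one_nn_self_label and the toy data are hypothetical.

```r
# Toy 1-D data: two labeled instances ("a" at 0.0, "b" at 5.0), four unlabeled (NA)
x <- matrix(c(0.0, 0.4, 5.0, 5.4, 0.2, 5.1), ncol = 1)
y <- factor(c("a", NA, "b", NA, NA, NA), levels = c("a", "b"))

# Hypothetical helper: plain 1-NN self-labeling, one instance per round
one_nn_self_label <- function(x, y) {
  d <- as.matrix(dist(x))                      # pairwise Euclidean distances
  while (anyNA(y)) {
    lab <- which(!is.na(y))                    # currently labeled instances
    unl <- which(is.na(y))                     # still unlabeled instances
    sub <- d[unl, lab, drop = FALSE]           # unlabeled-to-labeled distances
    best <- which(sub == min(sub), arr.ind = TRUE)[1, ]
    # Copy the class of the nearest labeled neighbor (1-NN rule)
    y[unl[best["row"]]] <- y[lab[best["col"]]]
  }
  y
}

y_full <- one_nn_self_label(x, y)
```

Each newly labeled instance immediately joins the labeled set, so a single wrong label can propagate; that is the kind of noise the amending scheme described above is meant to limit.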
Usage
snnrce(x.inst = TRUE, dist = "Euclidean", alpha = 0.1)
Arguments
x.inst |
A boolean value that indicates if |
dist |
A distance function available in the |
alpha |
Rejection threshold to test the critical region. Default is 0.1. |
Details
SNNRCE initiates the self-labeling process by training a 1-NN classifier on the original
labeled set. This method attempts to reduce the noise in examples by labeling those instances
with no cut edges in the initial stages of self-labeling learning.
These highly confident examples are added to the training set.
The remaining examples follow the standard self-training process until a minimum number
of examples has been labeled for each class. A statistical test using cut edge weight is used
to modify the labels of the misclassified examples. The value of the alpha argument
defines the critical region where the candidate examples are tested. The higher this value
is, the more relaxed the selection of the examples that are considered mislabeled.
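The role of cut edges and of alpha can be sketched in base R as follows. The sketch makes several assumptions that are not taken from this page: the neighborhood graph is approximated by a k-nearest-neighbor graph with k = 2, edge weights are taken as 1 / (1 + distance), and the test is a one-sided normal approximation of the cut edge weight under the hypothesis that neighbor labels follow the overall class proportions. The graph, weights, and statistic actually used by snnrce may differ; the names cut_edge_pvalue and flag_mislabeled are hypothetical.

```r
# Toy 1-D data: point 4 carries label "b" but sits next to the "a" cluster
x <- matrix(c(0.0, 0.2, 0.5, 0.9, 5.0, 5.2, 5.5, 5.9), ncol = 1)
y <- factor(c("a", "a", "a", "b", "b", "b", "b", "b"))

k <- 2                        # neighbors per node (assumption: k-NN graph)
d <- as.matrix(dist(x))       # pairwise Euclidean distances
diag(d) <- Inf                # a point is not its own neighbor

# Indices of the k nearest neighbors of instance i
neighbors <- function(i) order(d[i, ])[seq_len(k)]

# Number of cut edges: edges to neighbors that carry a different label
cut_edges <- vapply(seq_len(nrow(x)),
                    function(i) sum(y[neighbors(i)] != y[i]),
                    numeric(1))

# One-sided p-value for "the cut edge weight of i is larger than expected"
cut_edge_pvalue <- function(i) {
  nn <- neighbors(i)
  w  <- 1 / (1 + d[i, nn])                 # edge weights (assumed form)
  J  <- sum(w[y[nn] != y[i]])              # observed cut edge weight
  q  <- 1 - mean(y == y[i])                # P(neighbor has another label) under H0
  mu <- q * sum(w)                         # expected cut edge weight under H0
  sigma <- sqrt(q * (1 - q) * sum(w^2))    # its standard deviation under H0
  pnorm((J - mu) / sigma, lower.tail = FALSE)
}

p_values <- vapply(seq_len(nrow(x)), cut_edge_pvalue, numeric(1))

# Instances whose cut edge weight falls in the critical region of size alpha
flag_mislabeled <- function(alpha) which(p_values < alpha)

confident <- which(cut_edges == 0)   # no cut edges: treated as reliable
strict    <- flag_mislabeled(0.01)   # small alpha: fewer instances flagged
relaxed   <- flag_mislabeled(0.1)    # larger alpha: more instances flagged
```

On this toy data the instances without cut edges are the ones deep inside each cluster, and raising alpha from 0.01 to 0.1 is what brings the suspicious point 4 into the critical region, which matches the description of alpha above.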
Value
(When the model is fitted) A list object of class "snnrce" containing:
- model
The final base classifier trained using the enlarged labeled set.
- instances.index
The indexes of the training instances used to train the model. These indexes include the initial labeled instances and the newly labeled instances. The indexes are relative to the x argument.
- classes
The levels of the y factor.
- x.inst
The value provided in the x.inst argument.
- dist
The value provided in the dist argument when x.inst is TRUE.
- xtrain
A matrix with the subset of training instances referenced by the indexes instances.index when x.inst is TRUE.
References
Yu Wang, Xiaoyan Xu, Haifeng Zhao, and Zhongsheng Hua.
Semisupervised learning based on nearest neighbor rule and cut edges.
Knowledge-Based Systems, 23(6):547-554, 2010. ISSN 0950-7051. doi: http://dx.doi.org/10.1016/j.knosys.2010.03.012.
Examples
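library(tidyverse)
library(tidymodels)
library(caret)
library(SSLR)

data(wine)
set.seed(1)

# Split the data into training (70%) and test (30%) sets
train.index <- createDataPartition(wine$Wine, p = .7, list = FALSE)
train <- wine[ train.index,]
test <- wine[-train.index,]

# Column index of the class variable
cls <- which(colnames(wine) == "Wine")

# Keep the labels of 20% of the training instances; mark the rest as unlabeled (NA).
# The partition is drawn from train so that the indexes refer to rows of train.
labeled.index <- createDataPartition(train$Wine, p = .2, list = FALSE)
train[-labeled.index,cls] <- NA

# Fit the SNNRCE model on the partially labeled training set
m <- snnrce(x.inst = TRUE,
            dist = "Euclidean",
            alpha = 0.1) %>% fit(Wine ~ ., data = train)

# Predict on the test set and compute classification metrics
predict(m,test) %>%
  bind_cols(test) %>%
  metrics(truth = "Wine", estimate = .pred_class)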