SMM {mlquantify} | R Documentation |
Sample Mean Matching
Description
SMM is a member of the DyS framework that uses simple means scores to represent the score distribution for positive, negative, and unlabelled scores. Therefore, the class distribution is given by a closed-form equation.
Usage
SMM(p.score, n.score, test)
Arguments
p.score |
a numeric |
n.score |
a numeric |
test |
a numeric |
Value
A numeric vector containing the class distribution estimated from the test set.
References
Hassan, W., Maletzke, A., Batista, G. (2020). Accurately Quantifying a Billion Instances per Second. In IEEE International Conference on Data Science and Advanced Analytics (DSAA).
Examples
library(randomForest)
library(caret)
cv <- createFolds(aeAegypti$class, 3)
tr <- aeAegypti[cv$Fold1,]
validation <- aeAegypti[cv$Fold2,]
ts <- aeAegypti[cv$Fold3,]
# -- Getting a sample from ts with 80 positive and 20 negative instances --
ts_sample <- rbind(ts[sample(which(ts$class==1),80),],
ts[sample(which(ts$class==2),20),])
scorer <- randomForest(class~., data=tr, ntree=500)
scores <- cbind(predict(scorer, validation, type = c("prob")), validation$class)
test.scores <- predict(scorer, ts_sample, type = c("prob"))
SMM(p.score = scores[scores[,3]==1,1], n.score = scores[scores[,3]==2,1],
test = test.scores[,1])