SMM {mlquantify}R Documentation

Sample Mean Matching

Description

SMM is a member of the DyS framework that uses simple means scores to represent the score distribution for positive, negative, and unlabelled scores. Therefore, the class distribution is given by a closed-form equation.

Usage

SMM(p.score, n.score, test)

Arguments

p.score

a numeric vector of positive scores estimated either from a validation set or from a cross-validation method.

n.score

a numeric vector of negative scores estimated either from a validation set or from a cross-validation method.

test

a numeric vector containing the score estimated for the positive class from each test set instance.

Value

A numeric vector containing the class distribution estimated from the test set.

References

Hassan, W., Maletzke, A., Batista, G. (2020). Accurately Quantifying a Billion Instances per Second. In IEEE International Conference on Data Science and Advanced Analytics (DSAA).

Examples

library(randomForest)
library(caret)
cv <- createFolds(aeAegypti$class, 3)
tr <- aeAegypti[cv$Fold1,]
validation <- aeAegypti[cv$Fold2,]
ts <- aeAegypti[cv$Fold3,]

# -- Getting a sample from ts with 80 positive and 20 negative instances --
ts_sample <- rbind(ts[sample(which(ts$class==1),80),],
                   ts[sample(which(ts$class==2),20),])
scorer <- randomForest(class~., data=tr, ntree=500)
scores <- cbind(predict(scorer, validation, type = c("prob")), validation$class)
test.scores <- predict(scorer, ts_sample, type = c("prob"))
SMM(p.score = scores[scores[,3]==1,1], n.score = scores[scores[,3]==2,1],
test = test.scores[,1])

[Package mlquantify version 0.2.0 Index]