R: Marginally predicted labels of the test data given training...

tMarLab {PEkit}

R Documentation

Marginally predicted labels of the test data given training data classification.

Description

Classifies the test data x based on the training data object. The test data is considered i.i.d., so each data point is classified one by one.

Usage

tMarLab(training, x)

Arguments

`training`	A training data object from the function `classifier.fit()`.
`x`	Test data vector or matrix with rows as data points and columns as features.

Details

Independently assigns a class label to each test data point according to a maximum a posteriori rule. The predictive probability of data point x_i arising from class c, assuming the training data of size m_c in that class arises from a Poisson-Dirichlet(\hat{\psi}_c) distribution, is:

\hat{\psi}_c / (m_c + \hat{\psi}_c),

if no value equal to x_i exists in the training data of class c, and

m_{ci} / (m_c + \hat{\psi}_c),

if there does, where m_{ci} is the frequency of the value of x_i in the training data of class c.
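
The rule above can be illustrated with a small base-R sketch. It is not the PEkit implementation: the helper names `pred_prob` and `map_label`, the toy training values, and the psi estimates are all hypothetical, and equal class priors are assumed so that the maximum a posteriori label is simply the class with the largest predictive probability.

```r
# Hypothetical helper (not part of PEkit): predictive probability of value x_i
# under one class, given that class's training values and a psi estimate.
pred_prob <- function(x_i, train_c, psi_c) {
  m_c  <- length(train_c)       # m_c: size of the class's training data
  m_ci <- sum(train_c == x_i)   # m_ci: frequency of x_i in that class
  if (m_ci == 0) psi_c / (m_c + psi_c) else m_ci / (m_c + psi_c)
}

# Toy training data for two classes, with made-up psi estimates (assumptions)
train <- list("1" = c(1, 1, 1, 2, 2, 3), "2" = c(1, 2, 3, 4, 5, 6))
psi   <- c("1" = 0.5, "2" = 4)

# MAP label under the assumed equal priors: class with the largest probability
map_label <- function(x_i) {
  p <- sapply(names(train), function(cl) pred_prob(x_i, train[[cl]], psi[[cl]]))
  names(p)[which.max(p)]
}

# Classify each test value independently, one by one
labels <- sapply(c(1, 7), map_label)
```

In this toy setup the value 1 is frequent in class "1", while the unseen value 7 is more plausible under the class with the larger psi, since a larger psi puts more mass on previously unobserved values.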

Value

A vector of predicted labels for test data x.

References

Amiryousefi A. Asymptotic supervised predictive classifiers under partition exchangeability. 2021. https://arxiv.org/abs/2101.10950.

Corander, J., Cui, Y., Koski, T., and Siren, J.: Have I seen you before? Principles of Bayesian predictive classification revisited. Springer, Stat. Comput. 23 (2011), 59–73, doi: 10.1007/s11222-011-9291-7.

Examples

## Create random samples x from Poisson-Dirichlet distributions with different
## psis, treating each sample as coming from a class of its own:
set.seed(111)
x1<-rPD(10500,10)
x2<-rPD(10500,1000)
test.ind1<-sample.int(10500,500) # Sample test datasets from the
test.ind2<-sample.int(10500,500) # original samples
x<-c(x1[-test.ind1],x2[-test.ind2])
## create training data labels:
y1<-rep("1", 10000)
y2<-rep("2", 10000)
y<-c(y1,y2)

## Test data t, with the first half belonging to class "1" and the second half to class "2":
t1<-x1[test.ind1]
t2<-x2[test.ind2]
t<-c(t1,t2)

fit<-classifier.fit(x,y)

## Run the classifier, which returns a vector of predicted labels for the test data:
tM<-tMarLab(fit, t)

## With multidimensional x:
set.seed(111)
x1<-cbind(rPD(5500,10),rPD(5500,50))
x2<-cbind(rPD(5500,100),rPD(5500,500))
test.ind1<-sample.int(5500,500)
test.ind2<-sample.int(5500,500)
x<-rbind(x1[-test.ind1,],x2[-test.ind2,])
y1<-rep("1", 5000)
y2<-rep("2", 5000)
y<-c(y1,y2)
fit<-classifier.fit(x,y)
t1<-x1[test.ind1,]
t2<-x2[test.ind2,]
t<-rbind(t1,t2)

tM<-tMarLab(fit, t)
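
One way to check predictions such as `tM` against the known test labels is sketched below. The vector `tM_demo` is a hypothetical stand-in for the output of `tMarLab()`, used so that the snippet runs without PEkit. The sketch assumes the predicted labels are returned as character values in the same order as the test data points.

```r
# Hypothetical stand-in for the output of tMarLab(fit, t); not real results.
tM_demo <- c("1", "1", "2", "1", "2", "2")

# Known test labels in the same order as the test data
# (for the examples above this would be rep(c("1", "2"), each = 500)).
truth <- rep(c("1", "2"), each = 3)

# Proportion of correctly classified test points
accuracy <- mean(tM_demo == truth)

# Cross-tabulation of true versus predicted labels
confusion <- table(truth = truth, predicted = tM_demo)
```

With the real objects, replacing `tM_demo` by `tM` and `truth` by the 500 + 500 label vector would give the corresponding accuracy and confusion table.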

[Package PEkit version 1.0.0.1000 Index]