tMarLab {PEkit}    R Documentation
Marginally predicted labels of the test data given training data classification.
Description
Classifies the test data x based on the training data object. The test data are treated as i.i.d., so each data point is classified independently, one at a time.
Usage
tMarLab(training, x)
Arguments
training    A training data object, as returned by classifier.fit in the Examples.
x           Test data vector or matrix with rows as data points and columns as features.
Details
Independently assigns a class label for each test data point according to a maximum a posteriori rule. The predictive probability of data point x_i arising from class c, assuming the training data of size m_c in the class arises from a Poisson-Dirichlet(\hat{\psi}_c) distribution, is:

\hat{\psi}_c / (m_c + \hat{\psi}_c),

if no value equal to x_i exists in the training data of class c, and

m_{ci} / (m_c + \hat{\psi}_c),

if there does, where m_{ci} is the frequency of the value of x_i in the training data.
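The two predictive probabilities above can be illustrated with a minimal base R sketch. It is not the PEkit implementation: the helper names marginal_pred_prob and classify_point, the toy training values and the psi values are all made up for illustration, and equal class priors are assumed when picking the most probable class.

```r
# Hypothetical helper (not part of PEkit): predictive probability of one value
# x_i under a single class, given that class's training values and a psi estimate.
marginal_pred_prob <- function(x_i, train_c, psi_c) {
  m_c  <- length(train_c)        # training size of the class
  m_ci <- sum(train_c == x_i)    # frequency of the value x_i in the class
  if (m_ci == 0) {
    psi_c / (m_c + psi_c)        # value not seen in this class
  } else {
    m_ci / (m_c + psi_c)         # value seen m_ci times in this class
  }
}

# Toy training data for two classes; the psi values are invented, not estimated.
train <- list("1" = c(1, 1, 1, 2, 2, 3), "2" = c(4, 5, 6, 7, 8, 9))
psi   <- c("1" = 0.5, "2" = 3)

# Label one test point with the class of largest predictive probability
# (assumption: equal class priors).
classify_point <- function(x_i) {
  probs <- sapply(names(train), function(cl) {
    marginal_pred_prob(x_i, train[[cl]], psi[[cl]])
  })
  names(probs)[which.max(probs)]
}

# Each test point is classified on its own, independently of the others.
test_points <- c(1, 4, 10)
pred <- vapply(test_points, classify_point, character(1))
```

In this toy setup the value 10 appears in neither class, so both classes fall back to the first formula and the class with the larger ratio of psi to class size plus psi wins.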
Value
A vector of predicted labels for test data x.
References
Amiryousefi A. Asymptotic supervised predictive classifiers under partition exchangeability. 2021. https://arxiv.org/abs/2101.10950.
Corander, J., Cui, Y., Koski, T., and Siren, J.: Have I seen you before? Principles of Bayesian predictive classification revisited. Springer, Stat. Comput. 23, (2011), 59–73, doi:10.1007/s11222-011-9291-7.
Examples
## Create random samples x from Poisson-Dirichlet distributions with different
## psis, treating each sample as coming from a class of its own:
set.seed(111)
x1<-rPD(10500,10)
x2<-rPD(10500,1000)
test.ind1<-sample.int(10500,500) # Sample test datasets from the
test.ind2<-sample.int(10500,500) # original samples
x<-c(x1[-test.ind1],x2[-test.ind2])
## create training data labels:
y1<-rep("1", 10000)
y2<-rep("2", 10000)
y<-c(y1,y2)
## Test data t, with the first half belonging to class "1" and the second half to class "2":
t1<-x1[test.ind1]
t2<-x2[test.ind2]
t<-c(t1,t2)
fit<-classifier.fit(x,y)
## Run the classifier, which returns a vector of predicted labels for the test data:
tM<-tMarLab(fit, t)
## With multidimensional x:
set.seed(111)
x1<-cbind(rPD(5500,10),rPD(5500,50))
x2<-cbind(rPD(5500,100),rPD(5500,500))
test.ind1<-sample.int(5500,500)
test.ind2<-sample.int(5500,500)
x<-rbind(x1[-test.ind1,],x2[-test.ind2,])
y1<-rep("1", 5000)
y2<-rep("2", 5000)
y<-c(y1,y2)
fit<-classifier.fit(x,y)
t1<-x1[test.ind1,]
t2<-x2[test.ind2,]
t<-rbind(t1,t2)
tM<-tMarLab(fit, t)
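
Because the test sets in these examples are built with a known class order (500 points from class "1" followed by 500 from class "2"), the predicted labels can be compared against the true ones. The sketch below shows one way to do that in base R. It does not call PEkit: pred is a made-up stand-in for the vector returned by tMarLab, with two deliberate errors inserted so that the output is not trivial.

```r
# True labels implied by how the test data were assembled in the examples.
truth <- rep(c("1", "2"), each = 500)

# Stand-in for tM (hypothetical values, not real classifier output):
# start from the truth and flip two labels.
pred <- truth
pred[c(3, 700)] <- c("2", "1")

# Confusion table of true versus predicted labels, and overall accuracy.
conf_mat <- table(truth = truth, predicted = pred)
accuracy <- mean(pred == truth)
```

With real output, replacing pred by tM gives the same table and accuracy for the fitted classifier.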