KODAMA.matrix {KODAMA} | R Documentation |
Knowledge Discovery by Accuracy Maximization
Description
KODAMA (KnOwledge Discovery by Accuracy MAximization) is an unsupervised and semi-supervised learning algorithm that performs feature extraction from noisy and high-dimensional data.
Usage
KODAMA.matrix(data,
              M = 100,
              Tcycle = 20,
              FUN_VAR = function(x) { ceiling(ncol(x)) },
              FUN_SAM = function(x) { ceiling(nrow(x) * 0.75) },
              bagging = FALSE,
              FUN = c("PLS-DA", "KNN"),
              f.par = 5,
              W = NULL,
              constrain = NULL,
              fix = NULL,
              epsilon = 0.05,
              dims = 2,
              landmarks = 1000,
              neighbors = min(c(landmarks, nrow(data))) - 1)
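As a rough illustration of how the default arguments resolve, the sketch below evaluates the default FUN_VAR, FUN_SAM and neighbors expressions from the Usage on the iris measurements (150 samples, 4 variables). It uses base R only and does not call KODAMA.matrix; the names n_var, n_sam and n_neighbors are illustrative and not part of the package.

```r
# Illustrative only: evaluate the documented defaults on a small data set.
data(iris)
data <- as.matrix(iris[, -5])            # 150 samples x 4 variables

# Defaults copied from the Usage section
FUN_VAR <- function(x) { ceiling(ncol(x)) }
FUN_SAM <- function(x) { ceiling(nrow(x) * 0.75) }
landmarks <- 1000

n_var <- FUN_VAR(data)                   # all variables
n_sam <- FUN_SAM(data)                   # 75 per cent of the samples, rounded up
n_neighbors <- min(c(landmarks, nrow(data))) - 1

print(c(variables = n_var, samples = n_sam, neighbors = n_neighbors))
```

With fewer samples than landmarks, as here, the default neighbors value is simply the number of samples minus one.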
Arguments
data |
a matrix. |
M |
number of iterative processes (step I-III). |
Tcycle |
number of iterative cycles that lead to the maximization of cross-validated accuracy. |
FUN_VAR |
function that returns the number of variables to select randomly. By default, all variables are taken. |
FUN_SAM |
function that returns the number of samples to select randomly. By default, 75 per cent of all samples are taken. |
bagging |
logical. Should sampling be done with replacement? The default is FALSE. |
FUN |
classifier to be considered. Choices are "PLS-DA" and "KNN". |
f.par |
parameters of the classifier. |
W |
a vector of |
constrain |
a vector of |
fix |
a vector of |
epsilon |
cut-off value for low proximity. High proximities are typical of intracluster relationships, whereas low proximities are expected for intercluster relationships. Very low proximities between samples are ignored by (default) setting epsilon = 0.05. |
dims |
dimensions of the configurations of t-SNE based on the KODAMA dissimilarity matrix. |
landmarks |
number of landmarks to use. |
neighbors |
number of neighbors to include in the dissimilarity matrix to pass to the |
Details
KODAMA consists of five steps, which can be divided into two parts: (i) the maximization of cross-validated accuracy by an iterative process (steps I and II), resulting in the construction of a proximity matrix (step III), and (ii) the definition of a dissimilarity matrix (steps IV and V). The first part entails the core idea of KODAMA, that is, the partitioning of data guided by the maximization of the cross-validated accuracy. At the beginning of this part, a fraction of the total samples (defined by FUN_SAM) is randomly selected from the original data. The whole iterative process (steps I-III) is repeated M times to average out the effects of the randomness of the iterative procedure. Each time this part is repeated, a different fraction of samples is selected. The second part collects and processes these results by constructing a dissimilarity matrix that provides a holistic view of the data while maintaining their intrinsic structure (steps IV and V). The KODAMA.visualization function is then used to visualise the results of the KODAMA dissimilarity matrix.
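The repeated subsampling described above can be sketched in base R. This is only a minimal illustration of the sampling step, assuming that each of the M runs draws FUN_SAM(data) row indices, without replacement when bagging = FALSE. It does not perform the accuracy maximization or build the proximity matrix, and the names selected and times_selected are hypothetical, not objects returned by the package.

```r
# Illustrative sketch of the per-run subsampling only (not the KODAMA algorithm itself).
set.seed(1)
data(iris)
data <- as.matrix(iris[, -5])

M <- 100                                            # number of repetitions
FUN_SAM <- function(x) { ceiling(nrow(x) * 0.75) }  # default from Usage
bagging <- FALSE                                    # sample without replacement

n_sel <- FUN_SAM(data)

# One column of row indices per run: an n_sel x M matrix
selected <- replicate(M, sample(nrow(data), n_sel, replace = bagging))

# How often each sample was drawn across the M runs
times_selected <- tabulate(as.vector(selected), nbins = nrow(data))
summary(times_selected)
```

Because each run sees a different subset, every sample takes part in only some of the M runs, which is why the results are averaged over many repetitions.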
Value
The function returns a list with the following items:
dissimilarity |
a dissimilarity matrix. |
acc |
a vector with the |
proximity |
a proximity matrix. |
v |
a matrix containing all classifications obtained by maximizing the cross-validation accuracy. |
res |
a matrix containing all classification vectors obtained through maximizing the cross-validation accuracy. |
f.par |
parameters of the classifier. |
entropy |
Shannon's entropy of the KODAMA proximity matrix. |
landpoints |
indexes of the landmarks used. |
data |
original data. |
knn_Armadillo |
dissimilarity matrix used as input for the |
Author(s)
Stefano Cacciatore and Leonardo Tenori
References
Cacciatore S, Luchinat C, Tenori L
Knowledge discovery by accuracy maximization.
Proc Natl Acad Sci U S A 2014;111(14):5117-22. doi: 10.1073/pnas.1220873111.
Cacciatore S, Tenori L, Luchinat C, Bennett PR, MacIntyre DA
KODAMA: an updated R package for knowledge discovery and data mining.
Bioinformatics 2017;33(4):621-623. doi: 10.1093/bioinformatics/btw705.
L.J.P. van der Maaten and G.E. Hinton.
Visualizing High-Dimensional Data Using t-SNE.
Journal of Machine Learning Research 9 (Nov) : 2579-2605, 2008.
L.J.P. van der Maaten.
Learning a Parametric Embedding by Preserving Local Structure.
In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR W&CP 5:384-391, 2009.
McInnes L, Healy J, Melville J.
UMAP: Uniform manifold approximation and projection for dimension reduction.
arXiv preprint arXiv:1802.03426. 2018 Feb 9.
See Also
Examples
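library(KODAMA)

# Fisher's iris data: four numeric measurements plus the species label
data(iris)
data <- iris[, -5]
labels <- iris[, 5]

# Run KODAMA with the KNN classifier, then embed the result with t-SNE
kk <- KODAMA.matrix(data, FUN = "KNN", f.par = 2)
cc <- KODAMA.visualization(kk, "t-SNE")
plot(cc, col = as.numeric(labels), cex = 2)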