| MINDID {IDmining} | R Documentation |
The (Multipoint) Morisita Index for Intrinsic Dimension Estimation
Description
Estimates the intrinsic dimension of data using the Morisita estimator of intrinsic dimension.
Usage
MINDID(X, scaleQ=1:5, mMin=2, mMax=2)
Arguments
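X | A $N \times E$ matrix or data frame, where $N$ is the number of data points and $E$ is the number of variables (or features). Each variable is rescaled to the $[0,1]$ interval by the function. |
scaleQ | A vector (at least two values). It contains the values of $\ell^{-1}$ chosen by the user (by default: scaleQ = 1:5). |
mMin | The minimum value of $m$ (by default: mMin = 2). |
mMax | The maximum value of $m$ (by default: mMax = 2). |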
Details
- $\ell$ is the edge length of the grid cells (or quadrats). Since the variables (and consequently the grid) are rescaled to the $[0,1]$ interval, $\ell$ is equal to 1 for a grid consisting of only one cell.
- $\ell^{-1}$ is the number of grid cells (or quadrats) along each axis of the Euclidean space in which the data points are embedded.
- $\ell^{-1}$ is equal to $Q^{1/E}$, where $Q$ is the number of grid cells and $E$ is the number of variables (or features).
- $\delta$ is the diagonal length of the grid cells.
- $\ell^{-1}$ is directly related to $\delta$ (see References): each cell is an $E$-dimensional hypercube of edge $\ell$, so $\delta = \ell\sqrt{E}$ and $\ell^{-1} = \sqrt{E}/\delta$.
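The relations between $\ell$, $Q$, $E$ and $\delta$ can be checked numerically. The short R sketch below is illustrative only and is not part of IDmining: assuming $E = 3$ variables (as for a Swiss roll) and the default scaleQ = 1:5, it derives the edge length, the number of cells and the cell diagonal using $\delta = \ell\sqrt{E}$. The names `grid_geom`, `ell`, `Q` and `delta` are hypothetical.

```r
# Illustrative sketch (not part of IDmining): grid geometry for each value of scaleQ.
E <- 3                      # assumed number of variables, e.g. a 3-D Swiss roll
scaleQ <- 1:5               # values of ell^{-1}, i.e. number of cells along each axis
ell <- 1 / scaleQ           # edge length of the cells on the [0,1] interval
Q <- scaleQ^E               # total number of cells, so that ell^{-1} = Q^(1/E)
delta <- ell * sqrt(E)      # diagonal length of an E-dimensional cell
grid_geom <- data.frame(scaleQ = scaleQ, ell = ell, Q = Q,
                        delta = delta, ln_delta = log(delta))
print(grid_geom)
```

For scaleQ = 1, a single cell covers the whole unit hypercube, so $\ell = 1$ and $\delta = \sqrt{3}$.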
Value
A list of two elements:
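- a data.frame containing the $\ln$ value of the $m$-Morisita index for each value of $\ln(\delta)$ and $m$. The values of $\ln(\delta)$ are provided with regard to the $[0,1]$ interval.
- a data.frame containing the values of $S_m$ and $M_m$ for each value of $m$, where $M_m$ is the Morisita estimate of the intrinsic dimension derived from the slope $S_m$.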
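To make the two returned tables more concrete, the sketch below computes the multipoint ($m$-)Morisita index on a grid, $I_{m,\delta} = Q^{m-1}\sum_{i=1}^{Q} n_i(n_i-1)\cdots(n_i-m+1) \,/\, [N(N-1)\cdots(N-m+1)]$, where $n_i$ is the number of points in cell $i$, and then fits the slope of $\ln(I_m)$ against $\ln(\delta)$. It is a minimal sketch, not IDmining's implementation: the helpers `morisita_sketch()` and `id_sketch()` are hypothetical, and the conversion from slope to dimension assumes the scaling $I_{m,\delta} \propto \delta^{(m-1)(D-E)}$, which gives $D \approx E + S/(m-1)$. IDmining's own sign convention for $S_m$ may differ.

```r
# Hypothetical helper (not IDmining's code): m-Morisita index for k cells per axis.
morisita_sketch <- function(X, k, m = 2) {
  X <- as.matrix(X)
  N <- nrow(X); E <- ncol(X)
  # Rescale every variable to [0,1] (assumes no constant column)
  rng <- apply(X, 2, range)
  U <- sweep(sweep(X, 2, rng[1, ]), 2, rng[2, ] - rng[1, ], "/")
  # Cell index along each axis; points equal to 1 go to the last cell
  idx <- pmin(floor(U * k), k - 1)
  counts <- as.vector(table(apply(idx, 1, paste, collapse = "-")))
  # Falling factorial n(n-1)...(n-m+1); empty cells would contribute 0 anyway
  ff <- function(n) sapply(n, function(v) prod(v - 0:(m - 1)))
  k^(E * (m - 1)) * sum(ff(counts)) / ff(N)   # Q^(m-1) with Q = k^E
}

# Hypothetical helper: slope of ln(I_m) vs ln(delta) and the implied dimension.
id_sketch <- function(X, scaleQ, m = 2) {
  E <- ncol(as.matrix(X))
  delta <- sqrt(E) / scaleQ                    # cell diagonal, delta = ell * sqrt(E)
  lnI <- sapply(scaleQ, function(k) log(morisita_sketch(X, k, m)))
  S <- unname(coef(lm(lnI ~ log(delta)))[2])   # fitted slope
  c(S = S, ID = E + S / (m - 1))               # assumes I_m ~ delta^((m-1)(ID-E))
}

# Usage: a straight line (intrinsic dimension 1) embedded in E = 3 dimensions
s <- seq(0, 1, length.out = 1000)
line3d <- cbind(s, s, s)
res <- id_sketch(line3d, scaleQ = 5:15)
print(res)
```

For this line the fitted slope is close to $-2$ and the estimate is close to 1. With a single cell ($k = 1$) the index equals 1 by construction, which is why the first value of $\ln(I_m)$ on the coarsest grid is 0.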
Author(s)
Jean Golay jeangolay@gmail.com
References
J. Golay and M. Kanevski (2015). A new estimator of intrinsic dimension based on the multipoint Morisita index, Pattern Recognition 48 (12):4070–4081.
J. Golay, M. Leuenberger and M. Kanevski (2017). Feature selection for regression problems based on the Morisita estimator of intrinsic dimension, Pattern Recognition 70:126–138.
J. Golay and M. Kanevski (2017). Unsupervised feature selection based on the Morisita estimator of intrinsic dimension, Knowledge-Based Systems 135:125–134.
J. Golay, M. Leuenberger and M. Kanevski (2015). Morisita-based feature selection for regression problems, Proceedings of the 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Bruges (Belgium).
Examples
library(IDmining)
sim_dat <- SwissRoll(1000)
scaleQ <- 1:15 # It starts with a grid of 1^E cell (or quadrat).
               # It ends with a grid of 15^E cells (or quadrats).
mMI_ID <- MINDID(sim_dat, scaleQ[5:15])
# The second element holds S_m and M_m for each m; row 1 corresponds to m = mMin = 2
print(paste("The ID estimate is equal to", round(mMI_ID[[2]][1, 3], 2)))