MINDID {IDmining} | R Documentation |
The (Multipoint) Morisita Index for Intrinsic Dimension Estimation
Description
Estimates the intrinsic dimension of data using the Morisita estimator of intrinsic dimension.
Usage
MINDID(X, scaleQ=1:5, mMin=2, mMax=2)
Arguments
X |
A |
scaleQ |
A vector (at least two values). It contains the values of \ell^{-1} (see Details). |
mMin |
The minimum value of m |
mMax |
The maximum value of m |
Details
- \ell is the edge length of the grid cells (or quadrats). Since the variables (and consequently the grid) are rescaled to the [0,1] interval, \ell is equal to 1 for a grid consisting of only one cell.
- \ell^{-1} is the number of grid cells (or quadrats) along each axis of the Euclidean space in which the data points are embedded.
- \ell^{-1} is equal to Q^{(1/E)}, where Q is the number of grid cells and E is the number of variables (or features).
- \ell^{-1} is directly related to \delta (see References).
- \delta is the diagonal length of the grid cells.
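The relations listed above can be tabulated for a small grid sequence. The following is a minimal base-R sketch, not code from the package: the value of E is illustrative, and the formula \delta = \ell \sqrt{E} is an assumption that holds when the cells are hypercubes in the rescaled [0,1] space (see References for the convention actually used).

```r
# Illustrative setting (assumption): E = 3 variables, scaleQ as in Usage.
E <- 3
scaleQ <- 1:5                 # values of ell^{-1}: cells along each axis

ell   <- 1 / scaleQ           # edge length of a cell in the [0,1] space
Q     <- scaleQ^E             # total number of cells, so ell^{-1} = Q^(1/E)
delta <- ell * sqrt(E)        # diagonal of a hypercube of edge ell (assumed)

grid_info <- data.frame(inv_ell = scaleQ, ell = ell, Q = Q, delta = delta)
print(grid_info)
```

With a single cell (first row), \ell equals 1 and Q equals 1, as stated in the list above.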
Value
A list of two elements:
- a data.frame containing the \ln value of the m-Morisita index for each value of \ln (\delta) and m. The values of \ln (\delta) are provided with regard to the [0,1] interval.
- a data.frame containing the values of S_m and M_m for each value of m.
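To make the returned quantities more concrete, here is a rough base-R sketch of how S_m and M_m could be obtained from grid counts. It follows our reading of Golay and Kanevski (2015), where the m-Morisita index is Q^{m-1} \sum_i n_i (n_i - 1) ... (n_i - m + 1) / (N (N - 1) ... (N - m + 1)), S_m is the slope of its \ln value against \ln (1/\delta), and M_m = E - S_m / (m - 1). The helper log_morisita and the toy data are hypothetical, \delta = \ell \sqrt{E} is assumed, and none of this is the implementation used by MINDID.

```r
# Hypothetical helper, not part of IDmining: ln of the m-Morisita index
# for a grid with k cells along each axis.
log_morisita <- function(X, k, m = 2) {
  X <- as.matrix(X)
  N <- nrow(X)
  E <- ncol(X)
  # Rescale each variable to [0, 1] (assumes no constant column).
  X01 <- apply(X, 2, function(v) (v - min(v)) / (max(v) - min(v)))
  # Cell index along each axis, from 0 to k - 1.
  idx <- pmin(floor(X01 * k), k - 1)
  # One identifier per occupied cell, then the number of points in each.
  cell <- as.vector(idx %*% k^(seq_len(E) - 1))
  n <- as.numeric(table(cell))
  # Falling factorial v (v - 1) ... (v - m + 1).
  ff <- function(v) Reduce(`*`, lapply(0:(m - 1), function(j) as.numeric(v) - j))
  Q <- k^E
  log(Q^(m - 1) * sum(ff(n)) / ff(N))
}

# Toy data: 1000 points on a straight line embedded in E = 2 dimensions.
u <- seq(0, 1, length.out = 1000)
X <- cbind(u, u)
E <- ncol(X)
m <- 2
scaleQ <- 5:15

ln_inv_delta <- log(scaleQ / sqrt(E))  # delta = ell * sqrt(E) is assumed
ln_I <- sapply(scaleQ, function(k) log_morisita(X, k, m))

S_m <- unname(coef(lm(ln_I ~ ln_inv_delta))[2])  # slope of the log-log plot
M_m <- E - S_m / (m - 1)                         # ID estimate
print(round(c(S_m = S_m, M_m = M_m), 2))
```

For points lying on a line, both S_m and M_m should come out close to 1, which matches the intrinsic dimension of a line. Note that the slope is taken here against \ln (1/\delta); regressing against \ln (\delta) instead only flips its sign.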
Author(s)
Jean Golay jeangolay@gmail.com
References
J. Golay and M. Kanevski (2015). A new estimator of intrinsic dimension based on the multipoint Morisita index, Pattern Recognition 48 (12):4070–4081.
J. Golay, M. Leuenberger and M. Kanevski (2017). Feature selection for regression problems based on the Morisita estimator of intrinsic dimension, Pattern Recognition 70:126–138.
J. Golay and M. Kanevski (2017). Unsupervised feature selection based on the Morisita estimator of intrinsic dimension, Knowledge-Based Systems 135:125–134.
J. Golay, M. Leuenberger and M. Kanevski (2015). Morisita-based feature selection for regression problems. Proceedings of the 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Bruges (Belgium).
Examples
library(IDmining)
sim_dat <- SwissRoll(1000)     # simulated Swiss roll data set with 1000 points
scaleQ <- 1:15 # It starts with a grid of 1^E cell (or quadrat).
               # It ends with a grid of 15^E cells (or quadrats).
mMI_ID <- MINDID(sim_dat, scaleQ[5:15])   # only the scales 5 to 15 are used
print(paste("The ID estimate is equal to", round(mMI_ID[[1]][1,3], 2)))