tp {intrinsicDimension} | R Documentation |
Dimension Estimation via Translated Poisson Distributions
Description
Estimates the intrinsic dimension of a data set using models of translated Poisson distributions.
Usage
maxLikGlobalDimEst(data, k, dnoise = NULL, sigma = 0, n = NULL,
        integral.approximation = 'Haro', unbiased = FALSE,
        neighborhood.based = TRUE,
        neighborhood.aggregation = 'maximum.likelihood', iterations = 5, K = 5)
maxLikPointwiseDimEst(data, k, dnoise = NULL, sigma = 0, n = NULL, indices = NULL,
        integral.approximation = 'Haro', unbiased = FALSE, iterations = 5)
maxLikLocalDimEst(data, dnoise = NULL, sigma = 0, n = NULL,
        integral.approximation = 'Haro',
        unbiased = FALSE, iterations = 5)
Arguments
data |
data set with each row describing a data point. |
k |
the number of distances that should be used for each dimension estimation. |
dnoise |
a function or a name of a function giving the translation density. If NULL, no noise is modeled, and the estimator turns into the Hill estimator (see References). Translation densities |
sigma |
(estimated) standard deviation of the (isotropic) noise. |
n |
dimension of the noise. |
indices |
the indices of the data points for which local dimension estimation should be made. |
integral.approximation |
how to approximate the integrals in eq. (5) in Haro et al. (2008). Possible values: 'Haro', 'guaranteed.convergence', 'iteration'. See Details. |
unbiased |
if |
neighborhood.based |
if TRUE, dimension estimation is first made for neighborhoods around each data point and final value is aggregated from this. Otherwise dimension estimation is made once, based on distances in entire data set. |
neighborhood.aggregation |
if |
iterations |
for |
K |
for |
Details
The estimators are based on the referenced paper by Haro et al. (2008), using the assumption that there is a single manifold. The estimator in the paper is obtained using default parameters and dnoise = dnoiseGaussH.
With integral.approximation = 'Haro', the Taylor expansion approximation of r^(m-1) that Haro et al. (2008) used is employed. With integral.approximation = 'guaranteed.convergence', r is factored out and kept, and r^(m-2) is approximated with the corresponding Taylor expansion. This guarantees convergence of the integrals. Divergence might be an issue when the noise is not sufficiently small in comparison to the smallest distances. With integral.approximation = 'iteration', m is determined iteratively (five iterations by default; see the iterations argument).
maxLikLocalDimEst assumes that the data set is local, i.e. a piece of a data set cut out by a sphere with a radius such that the data set is well approximated by a hyperplane (meaning that the curvature should be low in the local data set). See localIntrinsicDimension.
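For intuition about the noise-free case (dnoise = NULL), the following base R sketch computes the nearest-neighbour maximum likelihood estimate of Levina and Bickel (2005) at every point and averages the results. It is an illustration under stated assumptions, not the package's implementation: the test data, the helper local_mle and the choice of the mean as the aggregate are all hypothetical, and no noise model or integral approximation is involved.

```r
# Noise-free nearest-neighbour MLE of intrinsic dimension
# (Levina and Bickel, 2005). Illustration only, not the package's code.
set.seed(1)

# Hypothetical test data: N points uniform on a 2-dimensional square,
# embedded in 5-dimensional space by padding with zero coordinates.
N <- 500
X <- cbind(matrix(runif(N * 2), ncol = 2), matrix(0, nrow = N, ncol = 3))

k <- 10                      # number of nearest-neighbour distances per point
D <- as.matrix(dist(X))      # all pairwise Euclidean distances

# Hypothetical helper: local estimate from one row of the distance matrix.
local_mle <- function(d_row, k) {
  Tj <- sort(d_row)[2:(k + 1)]   # k smallest distances, skipping the zero self-distance
  (k - 1) / sum(log(Tj[k] / Tj[1:(k - 1)]))
}

est <- apply(D, 1, local_mle, k = k)   # one estimate per data point
mean(est)                              # aggregate; expected to lie near 2 for this data
```

Replacing the factor k - 1 with k - 2, or the mean with a median, are common variations of this estimator; whether they match the package's unbiased and neighborhood.aggregation options exactly is not established by this sketch.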
Value
For maxLikGlobalDimEst and maxLikLocalDimEst, a DimEst object with one slot:
dim.est |
the dimension estimate |
For maxLikPointwiseDimEst, a DimEstPointwise object, inheriting data.frame, with one slot:
dim.est |
the dimension estimate for each data point. Row |
Author(s)
Kerstin Johnsson, Lund University.
References
Haro, G., Randall, G. and Sapiro, G. (2008) Translated Poisson Mixture Model for Stratification Learning. Int. J. Comput. Vis., 80, 358-374.
Hill, B. M. (1975) A simple general approach to inference about the tail of a distribution. Ann. Stat., 3(5), 1163-1174.
Levina, E. and Bickel, P. J. (2005) Maximum likelihood estimation of intrinsic dimension. Advances in Neural Information Processing Systems 17, 777-784. MIT Press.
Examples
# Global estimates aggregated from neighborhood-based estimates
data <- hyperBall(100, d = 7, n = 13, sd = 0.01)
maxLikGlobalDimEst(data, 10, dnoiseNcChi, 0.01, 13)
maxLikGlobalDimEst(data, 10, dnoiseGaussH, 0.01, 13)
maxLikGlobalDimEst(data, 10, dnoiseGaussH, 0.01, 13, neighborhood.aggregation = 'robust')
maxLikGlobalDimEst(data, 10, dnoiseGaussH, 0.01, 13,
        integral.approximation = 'guaranteed.convergence',
        neighborhood.aggregation = 'robust')
maxLikGlobalDimEst(data, 10, dnoiseGaussH, 0.01, 13,
        integral.approximation = 'iteration', unbiased = TRUE)

# Global estimates made once from distances in the entire data set
data <- hyperBall(1000, d = 7, n = 13, sd = 0.01)
maxLikGlobalDimEst(data, 500, dnoiseGaussH, 0.01, 13,
        neighborhood.based = FALSE)
maxLikGlobalDimEst(data, 500, dnoiseGaussH, 0.01, 13,
        integral.approximation = 'guaranteed.convergence',
        neighborhood.based = FALSE)
maxLikGlobalDimEst(data, 500, dnoiseGaussH, 0.01, 13,
        integral.approximation = 'iteration',
        neighborhood.based = FALSE)

# Pointwise estimates for the first ten data points
data <- hyperBall(100, d = 7, n = 13, sd = 0.01)
maxLikPointwiseDimEst(data, 10, dnoiseNcChi, 0.01, 13, indices = 1:10)

# Local estimates on a data set cut out around a single point
data <- cutHyperPlane(50, d = 7, n = 13, sd = 0.01)
maxLikLocalDimEst(data, dnoiseNcChi, 0.1, 3)
maxLikLocalDimEst(data, dnoiseGaussH, 0.1, 3)
maxLikLocalDimEst(data, dnoiseNcChi, 0.1, 3,
        integral.approximation = 'guaranteed.convergence')