R: Dimension Estimation via Translated Poisson Distributions

tp {intrinsicDimension}

R Documentation

Dimension Estimation via Translated Poisson Distributions

Description

Estimates the intrinsic dimension of a data set using models of translated Poisson distributions.

Usage

maxLikGlobalDimEst(data, k, dnoise = NULL, sigma = 0, n = NULL,
        integral.approximation = 'Haro', unbiased = FALSE,
        neighborhood.based = TRUE,
        neighborhood.aggregation = 'maximum.likelihood', iterations = 5, K = 5)
maxLikPointwiseDimEst(data, k, dnoise = NULL, sigma = 0, n = NULL, indices = NULL,
             integral.approximation = 'Haro', unbiased = FALSE, iterations = 5)
maxLikLocalDimEst(data, dnoise = NULL, sigma = 0, n = NULL,
       integral.approximation = 'Haro',
       unbiased = FALSE, iterations = 5)

Arguments

`data`	data set with each row describing a data point.
`k`	the number of distances that should be used for each dimension estimation.
`dnoise`	a function or a name of a function giving the translation density. If NULL, no noise is modeled, and the estimator turns into the Hill estimator (see References). Translation densities `dnoiseGaussH` and `dnoiseNcChi` are provided in the package. `dnoiseGaussH` is an approximation of `dnoiseNcChi`, but faster.
`sigma`	(estimated) standard deviation of the (isotropic) noise.
`n`	dimension of the noise.
`indices`	the indices of the data points for which local dimension estimation should be made.
`integral.approximation`	how to approximate the integrals in eq. (5) in Haro et al. (2008). Possible values: `'Haro'`, `'guaranteed.convergence'`, `'iteration'`. See Details.
`unbiased`	if `TRUE`, a factor `k-2` is used instead of the factor `k-1` that was used in Haro et al. (2008). This makes the estimator unbiased in the case of data without noise or boundary.
`neighborhood.based`	if `TRUE`, dimension estimation is first made for neighborhoods around each data point and the final value is aggregated from these estimates. Otherwise dimension estimation is made once, based on distances in the entire data set.
`neighborhood.aggregation`	if `neighborhood.based` is `TRUE`, how dimension estimates from different neighborhoods should be combined. Possible values: `'maximum.likelihood'` follows Haro et al. (2008) in maximizing likelihood by using the harmonic mean, `'mean'` follows Levina and Bickel (2005) and takes the mean, `'robust'` takes the median, to remove influence from possible outliers.
`iterations`	for `integral.approximation = 'iteration'`, how many iterations should be made.
`K`	for `neighborhood.based = FALSE`, how many distances for each data point should be considered when looking for the `k` shortest distances in the entire data set.
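
The roles of `k`, `unbiased` and `neighborhood.aggregation` can be illustrated with a minimal base-R sketch of the noise-free case (`dnoise = NULL`). It assumes that the estimate for one neighborhood has the Levina and Bickel (2005) form, (k-1) / sum_j log(r_k / r_j), where r_1 <= ... <= r_k are the k smallest distances from a point. The helpers `mle_dim` and `aggregate_dim` are hypothetical names introduced for illustration; this is not the package's implementation.

```r
# Base-R sketch only; mle_dim() and aggregate_dim() are hypothetical helpers,
# not functions from the intrinsicDimension package.

# Noise-free maximum likelihood estimate from the k smallest distances
# measured from a single data point.
mle_dim <- function(dists, k, unbiased = FALSE) {
  r <- sort(dists)[seq_len(k)]          # k smallest distances, ascending
  s <- sum(log(r[k] / r[-k]))           # sum of log(r_k / r_j), j = 1..k-1
  (if (unbiased) k - 2 else k - 1) / s  # factor k-2 or k-1
}

# Combine per-neighborhood estimates in the three ways listed for
# neighborhood.aggregation.
aggregate_dim <- function(est, how = c("maximum.likelihood", "mean", "robust")) {
  how <- match.arg(how)
  switch(how,
         maximum.likelihood = 1 / mean(1 / est),  # harmonic mean
         mean = mean(est),                        # arithmetic mean
         robust = median(est))                    # median
}

set.seed(1)
X <- matrix(runif(300 * 3), ncol = 3)   # 300 points filling a 3-d cube
D <- as.matrix(dist(X))                 # all pairwise Euclidean distances
k <- 10
local_est <- vapply(seq_len(nrow(X)),
                    function(i) mle_dim(D[i, -i], k),
                    numeric(1))

# One aggregated value per aggregation rule.
sapply(c("maximum.likelihood", "mean", "robust"),
       function(h) aggregate_dim(local_est, h))
```

Because the harmonic mean never exceeds the arithmetic mean, the `'maximum.likelihood'` aggregate in this sketch is always at most the `'mean'` aggregate.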

Details

The estimators are based on the referenced paper by Haro et al. (2008), using the assumption that there is a single manifold. The estimator in the paper is obtained using default parameters and dnoise = dnoiseGaussH.

With integral.approximation = 'Haro', the Taylor expansion approximation of r^(m-1) that Haro et al. (2008) used is employed. With integral.approximation = 'guaranteed.convergence', r is factored out and kept, and r^(m-2) is approximated with the corresponding Taylor expansion. This guarantees convergence of the integrals; divergence might otherwise be an issue when the noise is not sufficiently small in comparison to the smallest distances. With integral.approximation = 'iteration', the number of iterations given by iterations (five by default) is used to determine m.

maxLikLocalDimEst assumes that the data set is local, i.e., a piece of a data set cut out by a sphere with a radius small enough that the piece is well approximated by a hyperplane (meaning that the curvature should be low in the local data set). See localIntrinsicDimension.
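
As a rough illustration of what "cut out by a sphere" means, the following base-R sketch keeps only the rows of a data matrix that lie within a given radius of a center point. The helper `cut_ball` and the chosen radius are assumptions made for this example; they are not part of the package.

```r
# cut_ball() is a hypothetical helper, not part of the intrinsicDimension package.
cut_ball <- function(data, center, radius) {
  d <- sqrt(rowSums(sweep(data, 2, center)^2))  # distance of each row to center
  data[d <= radius, , drop = FALSE]             # keep rows inside the sphere
}

set.seed(1)
X <- matrix(rnorm(500 * 3), ncol = 3)   # toy data set, one point per row
local_data <- cut_ball(X, center = X[1, ], radius = 0.8)
nrow(local_data)                        # number of points kept
```

A matrix such as `local_data` is the kind of input maxLikLocalDimEst is described as expecting, provided the radius is small enough for the curvature inside the sphere to be negligible.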

Value

For maxLikGlobalDimEst and maxLikLocalDimEst, a DimEst object with one slot:

dim.est

the dimension estimate

For maxLikPointwiseDimEst, a DimEstPointwise object, inheriting from data.frame, with one slot:

dim.est

the dimension estimate for each data point. Row i has the local dimension estimate at point data[indices[i], ].
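
The sketch below shows one way the returned values might be read. It assumes that the `dim.est` slot can be accessed with `$` on both object types, which is inferred from the description above rather than confirmed here, and it only runs the package calls if intrinsicDimension is installed. The variable names are chosen for this example.

```r
est_value <- NA_real_         # stays NA if the package is not installed
pointwise_values <- NA_real_

if (requireNamespace("intrinsicDimension", quietly = TRUE)) {
  data <- intrinsicDimension::hyperBall(100, d = 7, n = 13, sd = 0.01)

  # Global estimate: assumed to expose a single number in dim.est.
  est <- intrinsicDimension::maxLikGlobalDimEst(
    data, k = 10, dnoise = intrinsicDimension::dnoiseGaussH,
    sigma = 0.01, n = 13)
  est_value <- est$dim.est

  # Pointwise estimates: assumed to expose one value per requested index.
  pw <- intrinsicDimension::maxLikPointwiseDimEst(
    data, k = 10, dnoise = intrinsicDimension::dnoiseGaussH,
    sigma = 0.01, n = 13, indices = 1:10)
  pointwise_values <- pw$dim.est
}
```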

Author(s)

Kerstin Johnsson, Lund University.

References

Haro, G., Randall, G. and Sapiro, G. (2008) Translated Poisson Mixture Model for Stratification Learning. Int. J. Comput. Vis., 80, 358-374.

Hill, B. M. (1975) A simple general approach to inference about the tail of a distribution. Ann. Stat., 3(5), 1163-1174.

Levina, E. and Bickel, P. J. (2005) Maximum likelihood estimation of intrinsic dimension. Advances in Neural Information Processing Systems 17, 777-784. MIT Press.

Examples

data <- hyperBall(100, d = 7, n = 13, sd = 0.01)
maxLikGlobalDimEst(data, 10, dnoiseNcChi, 0.01, 13)
maxLikGlobalDimEst(data, 10, dnoiseGaussH, 0.01, 13)
maxLikGlobalDimEst(data, 10, dnoiseGaussH, 0.01, 13, neighborhood.aggregation = 'robust')
maxLikGlobalDimEst(data, 10, dnoiseGaussH, 0.01, 13,
        integral.approximation = 'guaranteed.convergence',
        neighborhood.aggregation = 'robust')
maxLikGlobalDimEst(data, 10, dnoiseGaussH, 0.01, 13,
        integral.approximation = 'iteration', unbiased = TRUE)

data <- hyperBall(1000, d = 7, n = 13, sd = 0.01)
maxLikGlobalDimEst(data, 500, dnoiseGaussH, 0.01, 13,
        neighborhood.based = FALSE)
maxLikGlobalDimEst(data, 500, dnoiseGaussH, 0.01, 13,
        integral.approximation = 'guaranteed.convergence',
        neighborhood.based = FALSE)
maxLikGlobalDimEst(data, 500, dnoiseGaussH, 0.01, 13,
        integral.approximation = 'iteration',
        neighborhood.based = FALSE)
        
data <- hyperBall(100, d = 7, n = 13, sd = 0.01)
maxLikPointwiseDimEst(data, 10, dnoiseNcChi, 0.01, 13, indices=1:10)

data <- cutHyperPlane(50, d = 7, n = 13, sd = 0.01)
maxLikLocalDimEst(data, dnoiseNcChi, 0.1, 3)
maxLikLocalDimEst(data, dnoiseGaussH, 0.1, 3)
maxLikLocalDimEst(data, dnoiseNcChi, 0.1, 3,
       integral.approximation = 'guaranteed.convergence')

[Package intrinsicDimension version 1.2.0 Index]