twonn {intRinsic}

R Documentation

`TWO-NN` estimator

Description

The function fits the two-nearest-neighbor (TWO-NN) estimator of the intrinsic dimension within either the maximum likelihood or the Bayesian framework. Alternatively, it computes the estimate by least squares. The approach is selected with the argument `method`. The model was originally presented in Facco et al. (2017); see also Denti et al. (2022) for more details.
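The estimator is built on the ratios mu_i = r2 / r1 between each observation's distances to its second (r2) and first (r1) nearest neighbors, which are the quantities the `mus` argument accepts. The base-R sketch below shows one way to compute them from a data matrix. The helper name `compute_mus_sketch` is hypothetical and not part of intRinsic. It assumes there are no duplicated observations, because a zero first-neighbor distance would produce an infinite ratio.

```r
# Hypothetical helper (not part of intRinsic): second-to-first NN distance ratios.
# Assumes no duplicated rows in X, otherwise r1 = 0 and the ratio is Inf.
compute_mus_sketch <- function(X, dist_method = "euclidean") {
  # Full n x n distance matrix: O(n^2) memory, fine for moderate n
  D <- as.matrix(stats::dist(X, method = dist_method))
  diag(D) <- Inf                       # exclude each point's distance to itself
  apply(D, 1, function(row) {
    r <- sort(row)[1:2]                # first and second nearest-neighbor distances
    r[2] / r[1]
  })
}
```

Every returned ratio is at least 1, since r2 >= r1 by construction. For example, `compute_mus_sketch(replicate(2, rnorm(1000)))` returns 1000 such ratios.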

Usage

twonn(
  X = NULL,
  dist_mat = NULL,
  mus = NULL,
  method = c("mle", "linfit", "bayes"),
  alpha = 0.95,
  c_trimmed = 0.01,
  unbiased = TRUE,
  a_d = 0.001,
  b_d = 0.001,
  ...
)

## S3 method for class 'twonn_bayes'
print(x, ...)

## S3 method for class 'twonn_bayes'
summary(object, ...)

## S3 method for class 'summary.twonn_bayes'
print(x, ...)

## S3 method for class 'twonn_bayes'
plot(x, plot_low = 0.001, plot_upp = NULL, by = 0.05, ...)

## S3 method for class 'twonn_linfit'
print(x, ...)

## S3 method for class 'twonn_linfit'
summary(object, ...)

## S3 method for class 'summary.twonn_linfit'
print(x, ...)

## S3 method for class 'twonn_linfit'
plot(x, ...)

## S3 method for class 'twonn_mle'
print(x, ...)

## S3 method for class 'twonn_mle'
summary(object, ...)

## S3 method for class 'summary.twonn_mle'
print(x, ...)

## S3 method for class 'twonn_mle'
plot(x, ...)

Arguments

`X`	data matrix with `n` observations and `D` variables.
`dist_mat`	distance matrix computed between the `n` observations.
`mus`	vector of ratios between the distances from each observation to its second and its first nearest neighbor (NN).
`method`	chosen estimation method. It can be `"mle"` for maximum likelihood estimator; `"linfit"` for estimation via the least squares approach; `"bayes"` for estimation with the Bayesian approach.
`alpha`	confidence level of the interval (for `"mle"` and `"linfit"`) or posterior probability of the credible interval (for `"bayes"`).
`c_trimmed`	the proportion of trimmed observations.
`unbiased`	logical, applicable when `method = "mle"`. If `TRUE`, the MLE is corrected to ensure unbiasedness.
`a_d`	shape parameter of the Gamma prior on the parameter `d`, applicable when `method = "bayes"`.
`b_d`	rate parameter of the Gamma prior on the parameter `d`, applicable when `method = "bayes"`.
`...`	ignored.
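`x`	object of class `twonn_mle`, `twonn_linfit`, or `twonn_bayes`, returned by `twonn` with `method = "mle"`, `"linfit"`, or `"bayes"`, respectively; for the `print` methods of the summary classes, an object of class `summary.twonn_mle`, `summary.twonn_linfit`, or `summary.twonn_bayes`.
`object`	object of class `twonn_mle`, `twonn_linfit`, or `twonn_bayes`, obtained from the function `twonn`.
`plot_low`	lower bound of the interval over which the posterior density is plotted (`plot` method for `twonn_bayes` only).
`plot_upp`	upper bound of the interval over which the posterior density is plotted (`plot` method for `twonn_bayes` only).
`by`	increment of the grid of values spanning that interval.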
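Under the TWO-NN model of Facco et al. (2017), the ratios follow a Pareto distribution with scale 1 and shape d, so log(mu_i) is exponential with rate d and S = sum(log(mu_i)) is Gamma(n, d) distributed. The base-R sketch below illustrates how `alpha`, `unbiased`, and `c_trimmed` could act on a maximum likelihood fit under that model. The helper `twonn_mle_sketch` is hypothetical; its trimming rule (dropping the largest `c_trimmed` fraction of the ratios) and its exact Gamma-based interval are assumptions and may differ from what `twonn` computes internally.

```r
# Hypothetical helper (not part of intRinsic): ML sketch under the Pareto model,
# where log(mu_i) ~ Exponential(rate = d) and S = sum(log(mu_i)) ~ Gamma(n, rate = d).
twonn_mle_sketch <- function(mus, alpha = 0.95, c_trimmed = 0.01, unbiased = TRUE) {
  if (c_trimmed > 0) {
    # Assumed trimming rule: drop the largest c_trimmed fraction of the ratios.
    # The formulas below ignore the resulting truncation, which adds a small
    # upward bias.
    cutoff <- stats::quantile(mus, 1 - c_trimmed, names = FALSE)
    mus <- mus[mus <= cutoff]
  }
  n <- length(mus)
  S <- sum(log(mus))
  # n / S is the MLE; since E[n / S] = n * d / (n - 1), (n - 1) / S is unbiased
  d_hat <- if (unbiased) (n - 1) / S else n / S
  # d * S ~ Gamma(n, 1) does not depend on d, giving an exact interval
  q <- stats::qgamma(c((1 - alpha) / 2, (1 + alpha) / 2), shape = n)
  c(lower = q[1] / S, estimate = d_hat, upper = q[2] / S)
}
```

Drawing `runif(n)^(-1 / d)` produces Pareto(1, d) ratios, which is a convenient way to check the sketch against a known dimension.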

Value

A list whose class depends on the chosen method. Regardless of the method, the list always contains the element `est`, which reports the estimated intrinsic dimension together with its uncertainty quantification. The remaining elements depend on the estimation method. In particular, if

method = "mle": the output reports the MLE and the corresponding confidence interval;
method = "linfit": the output includes the lm() object used for the computation;
method = "bayes": the output contains the (1 + alpha) / 2 and (1 - alpha) / 2 quantiles, mean, mode, and median of the posterior distribution of d.
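With the Gamma(a_d, b_d) prior on d (shape `a_d`, rate `b_d`), the Pareto likelihood of the ratios is conjugate, so the posterior is Gamma(a_d + n, b_d + sum(log(mu_i))) and the posterior summaries listed for `method = "bayes"` have closed forms or are single `qgamma` calls. The sketch below ignores trimming, and `twonn_bayes_sketch` is a hypothetical helper, not the package's code.

```r
# Hypothetical helper (not part of intRinsic): conjugate Bayesian sketch.
# Likelihood: prod(d * mu_i^(-d - 1)); prior: d ~ Gamma(shape = a_d, rate = b_d).
# Posterior: d | mus ~ Gamma(shape = a_d + n, rate = b_d + sum(log(mus))).
twonn_bayes_sketch <- function(mus, alpha = 0.95, a_d = 0.001, b_d = 0.001) {
  shape <- a_d + length(mus)
  rate <- b_d + sum(log(mus))
  c(
    lower  = stats::qgamma((1 - alpha) / 2, shape = shape, rate = rate),
    mean   = shape / rate,
    median = stats::qgamma(0.5, shape = shape, rate = rate),
    mode   = (shape - 1) / rate,  # valid because shape = a_d + n > 1
    upper  = stats::qgamma((1 + alpha) / 2, shape = shape, rate = rate)
  )
}
```

Because the default prior is very diffuse (a_d = b_d = 0.001), the posterior mean is close to the MLE n / sum(log(mu_i)) for moderate n.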

References

Facco E, D'Errico M, Rodriguez A, Laio A (2017). "Estimating the intrinsic dimension of datasets by a minimal neighborhood information." Scientific Reports, 7(1). ISSN 2045-2322, doi:10.1038/s41598-017-11873-y.

Denti F, Doimo D, Laio A, Mira A (2022). "The generalized ratios intrinsic dimension estimator." Scientific Reports, 12(20005). ISSN 2045-2322, doi:10.1038/s41598-022-20991-1.

Examples

library(intRinsic)
# Dataset with 1000 observations and intrinsic dimension (ID) 2
X <- replicate(2, rnorm(1000))
twonn(X)
# Dataset with 1000 observations and ID 3
Y <- replicate(3, runif(1000))
# Bayesian and least squares estimates from a distance matrix
dm <- as.matrix(dist(Y, method = "manhattan"))
twonn(dist_mat = dm, method = "bayes")
twonn(dist_mat = dm, method = "linfit")
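The least squares approach of Facco et al. (2017) relies on the fact that, under the Pareto model, the distribution function of the ratios is F(mu) = 1 - mu^(-d), so -log(1 - F(mu)) = d * log(mu) is a line through the origin. The base-R sketch below replaces F with the empirical distribution function of the sorted ratios and fits that line with `lm`. The helper `twonn_linfit_sketch` is hypothetical; its trimming rule and its use of the `lm` confidence interval are assumptions, not a description of what `twonn(method = "linfit")` does internally.

```r
# Hypothetical helper (not part of intRinsic): least squares TWO-NN sketch.
# Model: F(mu) = 1 - mu^(-d), hence -log(1 - F(mu)) = d * log(mu).
twonn_linfit_sketch <- function(mus, alpha = 0.95, c_trimmed = 0.01) {
  # Some trimming is required: the largest ratio has empirical CDF 1,
  # and -log(1 - 1) = Inf
  stopifnot(c_trimmed > 0, c_trimmed < 1)
  mus <- sort(mus)
  n <- length(mus)
  # Assumed trimming rule: keep the smallest (1 - c_trimmed) share of the ratios
  keep <- seq_len(floor(n * (1 - c_trimmed)))
  x <- log(mus[keep])
  y <- -log(1 - keep / n)          # empirical CDF of the i-th sorted ratio is i / n
  fit <- stats::lm(y ~ x - 1)      # regression through the origin; slope estimates d
  ci <- unname(stats::confint(fit, level = alpha)[1, ])
  list(
    est = c(lower = ci[1], estimate = unname(stats::coef(fit)[1]), upper = ci[2]),
    lm = fit
  )
}
```

Returning the `lm` fit alongside the estimate makes it easy to inspect the residuals of the linear relation, which is the main diagnostic this approach offers.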

[Package intRinsic version 1.0.2 Index]