det.construct {detpack}    R Documentation

Distribution Element Tree (DET) Construction

Description

The function det.construct generates a distribution element tree (DET) from available data. A DET can be used with det.query for density estimation, and with det.rnd for smooth bootstrapping, that is, conditional or unconditional random number generation.

Usage

det.construct(dta, mode = 2, lb = NA, ub = NA, alphag = 0.001,
  alphad = 0.001, progress = TRUE, dtalim = Inf, cores = 1)

Arguments

dta

matrix with d rows representing components or dimensions and n columns corresponding to data points or samples.
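The d-by-n layout is the transpose of the common "one observation per row" convention, so tabular data usually has to be transposed first. The following is a minimal sketch using base R only; the object names obs, dta and dta1 are hypothetical and chosen for illustration.

```r
# Hypothetical sample table: 500 observations (rows) of 2 variables (columns)
set.seed(1)
obs <- data.frame(x1 = rnorm(500), x2 = runif(500))

# det.construct expects d rows (dimensions) and n columns (samples),
# so the observation table is converted to a matrix and transposed
dta <- t(as.matrix(obs))
dim(dta)   # 2 500

# Univariate data: t() turns a plain vector into a 1 x n matrix
dta1 <- t(rnorm(500))
dim(dta1)  # 1 500
```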

mode

order of the distribution elements applied, default is mode = 2. Use +/-1 for constant or +/-2 for linear elements. In the element-refinement process, mode > 0 leads to equal-size splits and mode < 0 to equal-score splits.
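The four admissible mode values combine two independent choices: the element order (from the magnitude) and the split type (from the sign). The sketch below spells this out; describe_mode is a hypothetical helper written for illustration and is not part of detpack.

```r
# Hypothetical helper (not part of detpack): decode a mode value into words
describe_mode <- function(mode) {
  stopifnot(mode %in% c(-2, -1, 1, 2))
  # magnitude selects the element order
  elements <- if (abs(mode) == 1) "constant" else "linear"
  # sign selects the split type used during refinement
  splits <- if (mode > 0) "equal-size" else "equal-score"
  paste(elements, "elements,", splits, "splits")
}

sapply(c(1, -1, 2, -2), describe_mode)
```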

lb, ub

vectors of length d with lower and upper sample-space bounds. If not provided or set to NA or 0, the bounds are determined from the data dta. If bounds are provided or given as 0, the data are not pre-whitened before the DET is computed. Setting lb = 0 and ub = 0 therefore gives data-determined bounds without pre-whitening.
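When the sample space is known in advance, explicit bounds can be passed as one value per dimension. The sketch below assumes hypothetical data on the rectangle [0,1] x [0,10] and checks the bounds against the data before use; the call to det.construct is only executed if detpack is installed.

```r
# Hypothetical bivariate data on [0,1] x [0,10]: d = 2 rows, n = 1000 columns
set.seed(2)
dta <- rbind(runif(1000), 10 * runif(1000))

# One lower and one upper bound per dimension (row of dta)
lb <- c(0, 0)
ub <- c(1, 10)

# Sanity checks: bounds have length d and enclose all samples
# (lb and ub are recycled down the columns, i.e. per dimension)
stopifnot(length(lb) == nrow(dta), length(ub) == nrow(dta))
stopifnot(all(dta >= lb), all(dta <= ub))

if (requireNamespace("detpack", quietly = TRUE)) {
  # explicit bounds: per the description above, no pre-whitening is applied
  det <- detpack::det.construct(dta, lb = lb, ub = ub, progress = FALSE)
}
```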

alphag, alphad

significance levels for the goodness-of-fit and independence tests, respectively, used in the element refinement (splitting) process. Default is alphag = alphad = 1.0e-3. alphad is irrelevant for univariate data, i.e., dta with d = 1.

progress

optional logical; if set to TRUE, a progress report on the DET construction process is provided.

dtalim

for large datasets, det.construct can be accelerated by using at most dtalim samples in the element splitting tests. The impact on the resulting DET is negligible if dtalim is sufficiently large. Setting dtalim < n mainly affects the splitting near the tree root, where elements are large and thus contain many samples. Default is dtalim = Inf, which corresponds to using all available samples (no acceleration). When using dtalim < n, the samples have to be randomly arranged in dta: use for example dta[,sample(1:ncol(dta), ncol(dta), replace = FALSE)] to randomly rearrange the data.
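The random rearrangement matters when the columns of dta arrive in some systematic order, for example sorted. The sketch below shuffles the columns of a hypothetical sorted dataset; drop = FALSE is added because base R would otherwise return a plain vector instead of a 1 x n matrix when d = 1. The value dtalim = 1000 is an arbitrary illustration, and the call to det.construct is only executed if detpack is installed.

```r
# Hypothetical data whose columns are ordered by the first component
set.seed(3)
dta <- rbind(sort(rnorm(2000)), runif(2000))

# Random permutation of the column (sample) indices
perm <- sample(ncol(dta))
# drop = FALSE keeps the matrix shape even for univariate data (d = 1)
dta_shuffled <- dta[, perm, drop = FALSE]

if (requireNamespace("detpack", quietly = TRUE)) {
  # use at most 1000 samples in the element splitting tests
  det <- detpack::det.construct(dta_shuffled, dtalim = 1000, progress = FALSE)
}
```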

cores

cores > 1 enables parallel tree construction (branch splitting) using the indicated number of cores. With cores = Inf, half of the available cores (see detectCores) are allocated. cores = 1 corresponds to serial tree construction (default).
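To see how many cores cores = Inf would have to choose from, or to pick an explicit value, detectCores from the base package parallel can be queried directly. This is a minimal sketch; the integer-division rounding and the fallback to one core are assumptions made here, not documented behavior of det.construct.

```r
# Number of cores reported by the system (may be NA on some platforms)
n_avail <- parallel::detectCores()

# Assumed policy for illustration: half of the available cores, at least one
n_use <- if (is.na(n_avail)) 1L else max(1L, n_avail %/% 2L)
n_use

# The explicit value could then be passed on, e.g.
# det <- detpack::det.construct(dta, cores = n_use)
```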

Value

A DET object, which contains the tree and the pre-whitening transform, is returned.

References

Meyer, D.W. (2016) http://arxiv.org/abs/1610.00345 or Meyer, D.W., Statistics and Computing (2017) https://doi.org/10.1007/s11222-017-9751-9 and Meyer, D.W. (2017) http://arxiv.org/abs/1711.04632

Examples

## Gaussian mixture data
require(stats)
det <- det.construct(t(c(rnorm(1e5),rnorm(1e4)/100+2))) # default linear det (mode = 2)
x <- t(seq(-4,6,0.01)); p <- det.query(det, x); plot(x, p, type = "l")

## piecewise uniform data with peaks
x <- matrix(c(rep(0,1e3),rep(1,1e3), 2*runif(1e4),
              rep(0,5e2),rep(1,25e2),2*runif(9e3)), nrow = 2, byrow = TRUE)
det <- det.construct(x, mode = 1, lb = 0, ub = 0) # constant elements, no pre-whitening

[Package detpack version 1.1.3 Index]