R: Model-based mixture density estimation for bounded data

densityMclustBounded {mclustAddons}

R Documentation

Model-based mixture density estimation for bounded data

Description

Density estimation for bounded data via a transformation-based approach for Gaussian mixtures.
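The idea behind the transformation-based approach can be sketched in base R. The sketch assumes the range-power transformation described in the cited reference (Scrucca, 2019), restricted to a single lower-bounded variable and a single Gaussian component. The helper names `range_power`, `range_power_deriv` and `dens_bounded` are hypothetical and are not functions exported by mclustAddons. The package itself fits a full Gaussian mixture and searches over the transformation parameter.

```r
# Range-power transformation for a variable bounded below at `lbound`.
# Hypothetical helper, for illustration only; lambda = 0 gives the log case.
range_power <- function(x, lambda, lbound = 0) {
  z <- x - lbound
  if (abs(lambda) < 1e-8) log(z) else (z^lambda - 1) / lambda
}

# Derivative of the transformation with respect to x (the Jacobian term).
range_power_deriv <- function(x, lambda, lbound = 0) {
  (x - lbound)^(lambda - 1)
}

# Density on the original scale implied by one Gaussian N(mean, sd^2)
# on the transformed scale: f(x) = dnorm(t(x)) * |t'(x)|.
dens_bounded <- function(x, lambda, mean, sd, lbound = 0) {
  out <- numeric(length(x))
  ok <- x > lbound                 # density is zero at or below the bound
  out[ok] <- dnorm(range_power(x[ok], lambda, lbound), mean, sd) *
    range_power_deriv(x[ok], lambda, lbound)
  out
}

set.seed(1)
x <- rchisq(200, 3)
lambda <- 0.3                      # fixed by assumption; not estimated here
tx <- range_power(x, lambda)       # data on the unbounded (transformed) scale
xgrid <- seq(-2, max(x), length.out = 1000)
fhat <- dens_bounded(xgrid, lambda, mean(tx), sd(tx))

plot(xgrid, fhat, type = "l", xlab = "x", ylab = "density")
lines(xgrid, dchisq(xgrid, 3), lty = 2)   # true density for comparison
```

Because the Gaussian is fitted on the transformed scale and mapped back with the Jacobian term, the resulting estimate places no mass below the lower bound.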

Usage

densityMclustBounded(data, 
                     G = NULL, modelNames = NULL,
                     lbound = NULL, 
                     ubound = NULL, 
                     lambda = c(-3, 3),
                     prior = NULL,
                     noise = NULL,
                     nstart = 25,
                     parallel = FALSE,
                     seed = NULL,
                     ...)

## S3 method for class 'densityMclustBounded'
print(x, digits = getOption("digits"), ...)

## S3 method for class 'densityMclustBounded'
summary(object, parameters = FALSE, classification = FALSE, ...)

Arguments

`data`	A numeric vector, matrix, or data frame of observations. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
`G`	An integer vector specifying the numbers of mixture components. By default `G=1:3`.
`modelNames`	A vector of character strings indicating the Gaussian mixture models to be fitted on the transformed-data space. See `mclustModelNames` for a description of available models.
`lbound`	Numeric vector providing lower bounds for variables.
`ubound`	Numeric vector providing upper bounds for variables.
`lambda`	A numeric vector providing the range of searched values for the transformation parameter(s).
`prior`	A function specifying a prior for Bayesian regularization of Gaussian mixtures. See `priorControl` for details.
`noise`	A specification for the noise component. Currently not available.
`nstart`	An integer value specifying the number of replications of k-means clustering used to initialize the EM algorithm. See `kmeans`.
`parallel`	An optional argument specifying whether the search over all candidate models should be run sequentially (default) or in parallel. On a single machine with multiple cores, possible values are: (1) a logical value, `TRUE` to use parallel computing or `FALSE` (default) to run sequentially; (2) a numeric value giving the number of cores to employ, which by default is obtained from the function `detectCores`; (3) a character string specifying the type of parallelisation to use. The available types depend on the operating system: on Windows only `"snow"` is available, while on Unix/Linux/macOS both `"snow"` and `"multicore"` (default) are available. In all these cases the cluster is stopped automatically at the end of the search by shutting down the workers. If a cluster of multiple machines is available, the search can be run in parallel using all, or a subset of, the cores available to the machines in the cluster. This option requires more work from the user, who needs to set up and register a parallel back end; in this case the cluster must be explicitly stopped with `stopCluster`.
`seed`	An integer value containing the random number generator state. This argument can be used to replicate the results of the k-means initialisation strategy. Note that if parallel computing is required, the doRNG package must be installed.
`x`, `object`	An object of class `"densityMclustBounded"`.
`digits`	The number of significant digits to use for printing.
`parameters`	Logical; if `TRUE`, the parameters of mixture components are printed.
`classification`	Logical; if `TRUE`, the MAP classification/clustering of observations is printed.
`...`	Further arguments passed to or from other methods.
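As an illustration of how several of these arguments combine, the call below restricts the number of components, narrows the search range for the transformation parameter, and fixes `seed` so that the k-means initialisation can be replicated. The particular values (`G = 1:2`, `lambda = c(-1, 1)`, `nstart = 10`, `seed = 1`) are arbitrary choices for this sketch, not recommendations, and the code assumes that mclustAddons is installed.

```r
# Sketch only: the block is skipped if mclustAddons is not installed.
if (requireNamespace("mclustAddons", quietly = TRUE)) {
  library(mclustAddons)

  set.seed(20)                     # fixes the simulated data only
  x <- rchisq(200, 3)

  dens <- densityMclustBounded(x,
                               G = 1:2,            # consider 1 or 2 components
                               lbound = 0,         # data are bounded below at 0
                               lambda = c(-1, 1),  # narrower search range
                               nstart = 10,        # fewer k-means replications
                               seed = 1)           # replicable initialisation
  print(summary(dens))
}
```

Setting `seed` affects only the initialisation inside `densityMclustBounded`; the call to `set.seed` is still needed to reproduce the simulated data.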

Value

Returns an object of class "densityMclustBounded".

Author(s)

Luca Scrucca

References

Scrucca L. (2019) A transformation-based approach to Gaussian mixture density estimation for bounded data. Biometrical Journal, 61:4, 873–888. https://doi.org/10.1002/bimj.201800174

Examples


# univariate case with lower bound
x <- rchisq(200, 3)
xgrid <- seq(-2, max(x), length.out = 1000)
f <- dchisq(xgrid, 3)  # true density
dens <- densityMclustBounded(x, lbound = 0)
summary(dens)
summary(dens, parameters = TRUE)
plot(dens, what = "BIC")
plot(dens, what = "density")
lines(xgrid, f, lty = 2)
plot(dens, what = "density", data = x, breaks = 15)

# univariate case with lower & upper bounds
x <- rbeta(200, 5, 1.5)
xgrid <- seq(-0.1, 1.1, length.out = 1000)
f <- dbeta(xgrid, 5, 1.5)  # true density
dens <- densityMclustBounded(x, lbound = 0, ubound = 1)
summary(dens)
plot(dens, what = "BIC")
plot(dens, what = "density")
lines(xgrid, f, lty = 2)  # overlay the true density computed above
plot(dens, what = "density", data = x, breaks = 9)

# bivariate case with lower bounds
x1 <- rchisq(200, 3)
x2 <- 0.5*x1 + sqrt(1-0.5^2)*rchisq(200, 5)
x <- cbind(x1, x2)
plot(x)
dens <- densityMclustBounded(x, lbound = c(0,0))
summary(dens, parameters = TRUE)
plot(dens, what = "BIC")
plot(dens, what = "density")
plot(dens, what = "density", type = "hdr")
plot(dens, what = "density", type = "persp")

[Package mclustAddons version 0.8 Index]