get_hdr {ggdensity}    R Documentation

Computing the highest density regions of a 2D density

Description

get_hdr() estimates a 2-dimensional density and computes the corresponding highest density regions (HDRs). The estimated density and HDRs are represented in discrete form on a grid defined by the arguments rangex, rangey, and n. get_hdr() is used internally by layer functions such as stat_hdr(), stat_hdr_points(), and stat_hdr_fun().
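To make the grid representation concrete, the following base R sketch evaluates a known density on an n-by-n grid and finds, for each probability, the density height above which the grid cells hold at least that much probability mass. It illustrates the general idea of a discretized HDR only. It is not taken from the ggdensity source, and hdr_cutoff() is a hypothetical helper name.

```r
# Grid defined by a resolution and a range along each axis
n <- 100
rangex <- c(-4, 4)
rangey <- c(-4, 4)

grid <- expand.grid(
  x = seq(rangex[1], rangex[2], length.out = n),
  y = seq(rangey[1], rangey[2], length.out = n)
)

# Density evaluated at each grid point (standard bivariate normal here)
grid$fhat <- dnorm(grid$x) * dnorm(grid$y)

# Normalize so that the grid cells carry a total probability mass of 1
grid$mass <- grid$fhat / sum(grid$fhat)

# Hypothetical helper: the density height at which the highest-density
# cells first accumulate at least `prob` of the total mass
hdr_cutoff <- function(fhat, mass, prob) {
  ord <- order(fhat, decreasing = TRUE)
  cum_mass <- cumsum(mass[ord])
  fhat[ord][which(cum_mass >= prob)[1]]
}

probs <- c(0.99, 0.95, 0.8, 0.5)
cutoffs <- vapply(
  probs,
  function(p) hdr_cutoff(grid$fhat, grid$mass, p),
  numeric(1)
)
names(cutoffs) <- sprintf("%.0f%%", 100 * probs)

# Grid cells belonging to the 50% HDR
grid$in_hdr_50 <- grid$fhat >= cutoffs[["50%"]]
```

Larger probabilities give lower cutoffs, so the HDRs are nested: the 99% region contains the 95% region, and so on.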

Usage

get_hdr(
  data = NULL,
  method = "kde",
  probs = c(0.99, 0.95, 0.8, 0.5),
  n = 100,
  rangex = NULL,
  rangey = NULL,
  hdr_membership = TRUE,
  fun,
  args = list()
)

Arguments

data

A data frame with columns x and y.

method

Either a character string ("kde", "mvnorm", "histogram", "freqpoly", or "fun") or a call to a method_*() function. See the section "The method argument" below for details.

probs

Probabilities to compute HDRs for.

n

Resolution of grid representing estimated density and HDRs.

rangex, rangey

Range of grid representing estimated density and HDRs, along the x- and y-axes.

hdr_membership

Should HDR membership of data points (data) be computed? Defaults to TRUE, although it is computationally expensive for large data sets.

fun

Optional. A joint probability density function, which must be vectorized in its first two arguments. See the section "The fun argument" below for details.

args

Optional, a list of arguments to be provided to fun.

Value

get_hdr returns a list with elements df_est (data.frame), breaks (named numeric), and data (data.frame).
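As a minimal sketch of how the returned list might be inspected, the code below calls get_hdr() on simulated data and looks at each element with generic base R tools. It assumes the ggdensity package is installed. The seed and sample size are arbitrary choices for illustration, and no particular column names are assumed.

```r
library(ggdensity)

set.seed(1)  # arbitrary seed, for reproducibility only
df <- data.frame(x = rnorm(500), y = rnorm(500))

res <- get_hdr(df, method = "kde")

names(res)       # names of the list elements
str(res$df_est)  # data.frame describing the estimate on the grid
res$breaks       # named numeric vector
str(res$data)    # data.frame
```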

The method argument

The density estimator used to estimate the HDRs is specified with the method argument. The simplest way to specify an estimator is to provide a character value to method, for example method = "kde" specifies a kernel density estimator. However, this specification is limited to the default behavior of the estimator.

Alternatively, method can be given a function call, for example method = method_kde(). In many cases, these functions accept parameters governing the density estimation procedure. Here, method_kde() accepts parameters h and adjust, both related to the kernel's bandwidth. For details, see ?method_kde. Every implemented method of bivariate density estimation has a corresponding method_*() function, each with an associated help page.

Note: geom_hdr() and other layer functions also have method arguments which behave in the same way. For more details on the use and implementation of the method_*() functions, see vignette("method", "ggdensity").

The fun argument

If method is set to "fun", get_hdr() expects a bivariate probability density function to be specified with the fun argument. It is required that fun be a function of at least two arguments (x and y). Beyond these first two arguments, fun can have arbitrarily many arguments; these can be set in get_hdr() as a named list via the args parameter.

Note: get_hdr() requires that fun be vectorized in x and y. For an example of an appropriate choice of fun, see the final example below.
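The vectorization requirement can be illustrated in base R alone. In this sketch, f_scalar(), f_vec(), and f_wrapped() are hypothetical names. The first is written for single values and is therefore not suitable as fun, while the other two accept vectors for x and y. How get_hdr() evaluates fun internally is not shown here.

```r
# Written for scalars: `if` and `&&` look at a single value only, so this
# errors (or, in older versions of R, warns) when x and y are longer vectors
f_scalar <- function(x, y) {
  if (x >= 0 && y >= 0) exp(-x - y) else 0
}

# Option 1: rewrite with vectorized building blocks
# (product of two standard exponential densities)
f_vec <- function(x, y) dexp(x) * dexp(y)

# Option 2: wrap the scalar version with base R's Vectorize()
f_wrapped <- Vectorize(f_scalar, vectorize.args = c("x", "y"))

# Both vectorized versions agree when evaluated on a small grid of points
grid <- expand.grid(
  x = seq(-1, 3, length.out = 5),
  y = seq(-1, 3, length.out = 5)
)
all.equal(f_vec(grid$x, grid$y), f_wrapped(grid$x, grid$y))
```

Either f_vec or f_wrapped could then be supplied as fun together with method = "fun" and suitable rangex and rangey values. The first option is usually faster, since Vectorize() loops over the inputs one element at a time.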

Examples

df <- data.frame(x = rnorm(1e3), y = rnorm(1e3))

# Two ways to specify `method`
get_hdr(df, method = "kde")
get_hdr(df, method = method_kde())

## Not run: 

# If parentheses are omitted, `get_hdr()` errors
get_hdr(df, method = method_kde)

## End(Not run)

# Estimate different HDRs with `probs`
get_hdr(df, method = method_kde(), probs = c(.975, .6, .2))

# Adjust estimator parameters with arguments to `method_kde()`
get_hdr(df, method = method_kde(h = 1))

# Parametric normal estimator of density
get_hdr(df, method = "mvnorm")
get_hdr(df, method = method_mvnorm())

# Compute "population" HDRs of specified bivariate pdf with `method = "fun"`
f <- function(x, y, sd_x = 1, sd_y = 1) dnorm(x, sd = sd_x) * dnorm(y, sd = sd_y)

get_hdr(
  method = "fun", fun = f,
  rangex = c(-5, 5), rangey = c(-5, 5)
)

get_hdr(
  method = "fun", fun = f,
  rangex = c(-5, 5), rangey = c(-5, 5),
  args = list(sd_x = .5, sd_y = .5) # specify additional arguments w/ `args`
)


[Package ggdensity version 1.0.0 Index]