getJenksBreaks {BAMMtools}    R Documentation

Jenks natural breaks classification

Description

Given a vector of numeric values and the number of desired breaks, calculate the optimum breakpoints using Jenks natural breaks optimization.

Usage

getJenksBreaks(var, k, subset = NULL)

Arguments

var

Numeric vector.

k

Number of breaks.

subset

Number of regularly spaced samples to subset from var. Intended to improve runtime for large datasets. If NULL, all values are used.

Details

getJenksBreaks is called by assignColorBreaks.

The values in var are binned into k+1 categories according to the Jenks natural breaks classification method. This method, borrowed from cartography, seeks to minimize the variance within categories while maximizing the variance between categories. If subset = NULL, all values of var are used for the optimization, which can be slow for very large datasets. If subset is set to a number, then that many regularly spaced values of var are sampled and the breaks are computed from the sample. This is slightly less accurate than using all of var, but is unlikely to make much of a difference. If subset is defined but length(var) < subset, then subset has no effect.
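
The effect of subset can be pictured with a small base R sketch. It is an illustration under stated assumptions, not the package's implementation: the helper name regular_subsample is hypothetical, and sorting before sampling and the rounding of positions are choices made here for clarity.

```r
# Hypothetical helper (not part of BAMMtools): keep `subset` regularly
# spaced values of `var`. Sorting first is an assumption of this sketch.
regular_subsample <- function(var, subset = NULL) {
  var <- sort(var)
  # No subsampling requested, or not more values than requested:
  # use every value, mirroring "subset has no effect"
  if (is.null(subset) || length(var) <= subset) {
    return(var)
  }
  # Evenly spaced positions from the first to the last element
  idx <- round(seq(from = 1, to = length(var), length.out = subset))
  var[idx]
}

x <- c(10, 1, 7, 3, 9, 2, 8, 4, 6, 5)
sampled <- regular_subsample(x, subset = 4)    # 4 of the 10 values
unchanged <- regular_subsample(x, subset = 20) # fewer values than subset
```

Because the sampled values span the full range of var, breaks computed from them should stay close to those computed from all values, which is the trade-off described above.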

The Jenks natural breaks method was ported to C from code found in the classInt R package.
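
The criterion described above (minimal variance within categories) can be made concrete with a brute-force search in base R. This is only an illustrative sketch: it is neither the C port nor the classInt code, it is practical only for very small vectors, and the helper names within_ss and brute_force_breaks are hypothetical.

```r
# Total within-class sum of squared deviations for sorted values `x`,
# where `cuts` gives the positions after which a new class starts.
within_ss <- function(x, cuts) {
  groups <- cut(seq_along(x), breaks = c(0, cuts, length(x)), labels = FALSE)
  sum(tapply(x, groups, function(g) sum((g - mean(g))^2)))
}

# Hypothetical exhaustive search: try every placement of
# n_classes - 1 cuts and keep the one with the smallest within_ss.
brute_force_breaks <- function(x, n_classes) {
  x <- sort(x)
  # Each column of `candidates` is one set of cut positions
  candidates <- combn(length(x) - 1, n_classes - 1)
  scores <- apply(candidates, 2, function(cuts) within_ss(x, cuts))
  best <- candidates[, which.min(scores)]
  list(
    cuts = best,                       # positions of the class boundaries
    upper = x[c(best, length(x))],     # largest value in each class
    score = min(scores)                # within-class sum of squares
  )
}

x <- c(1, 2, 3, 10, 11, 12, 30, 31)
res <- brute_force_breaks(x, n_classes = 3)
```

For this vector the search groups the values as {1, 2, 3}, {10, 11, 12} and {30, 31}, the split a reader would pick by eye. The exhaustive search grows combinatorially with length(x), which is why practical implementations rely on a more efficient optimization and why subsampling helps for large inputs.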

Value

A numeric vector of breakpoints.

Author(s)

Pascal Title

See Also

assignColorBreaks, plot.bammdata

Examples

# load whales dataset
data(whales, events.whales)
ed <- getEventData(whales, events.whales, burnin=0.25, nsamples=500)

# for demonstration purposes, extract the vector of speciation rates
ed <- dtRates(ed, tau=0.01)
vec <- ed$dtrates$rates[[1]]

# Compute 64 breaks, which bin the speciation rates into 65 groups
getJenksBreaks(vec, 64)

[Package BAMMtools version 2.1.11 Index]