tidyst_kde {eks}    R Documentation

Tidy and geospatial kernel density estimates

Description

Tidy and geospatial versions of kernel density estimates for 1- and 2-dimensional data.

Usage

tidy_kde(data, ...)
st_kde(x, ...)

Arguments

data

data frame/tibble of data values

x

sf object with point geometry

...

other arguments passed to the ks::kde function

Details

For tidy_kde, the first column (1-d) or first two columns (2-d) of the output tibble are the evaluation grid points, which are mapped to aes(x) (1-d) or aes(x,y) (2-d) and are named after the corresponding input variables in data. The estimate column contains the kernel density values at these grid points. The group column is a copy of the grouping variable of the input data. The ks column is a copy of the untidy kernel estimate from ks::kde, since the calculations for the layer functions geom_contour_ks and geom_contour_filled_ks require both the observed data and the kernel estimate as a kde object. For this reason, it is advised to compute a tidy kernel estimate first and then to create a ggplot with this tidy kernel estimate as the default data in the layer.
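A minimal sketch of inspecting the columns of a tidy kernel estimate, assuming the eks, MASS and dplyr packages are installed. The column names checked are those listed under Value; reading the kde object as t2$ks[[1]] assumes that the ks column is a list column whose first element holds the ks::kde object, as described above.

```r
library(eks)
data(crabs, package = "MASS")

# 2-d input: the two columns FL and CW define the evaluation grid
crabs2 <- dplyr::select(crabs, FL, CW)
t2 <- tidy_kde(crabs2)

# Grid columns are named after the input variables;
# the estimate column holds the density values at the grid points
print(names(t2))
print(head(t2[, c("FL", "CW", "estimate")]))

# The untidy ks::kde object is carried along in the ks column
# (assumption: list column, first element is the kde object)
fit <- t2$ks[[1]]
print(class(fit))
```

Because the kde object travels with the tidy tibble, layer functions such as geom_contour_filled_ks can recover it from the default plot data without recomputing the estimate.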

For st_kde, the output list contains the field tidy_ks, which is the (simplified) output from tidy_kde. The field grid contains the kernel estimate values on a grid of rectangular polygons. The field sf contains the 1% to 99% probability contour regions as multipolygons, with the derived attribute contlabel = 100% - cont.
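A minimal sketch of accessing the three fields of an st_kde output, assuming eks (with its grevilleasf data set), dplyr and sf are installed. It reuses the hakeoides subset from the Examples section; the printed values are not reproduced here.

```r
library(eks)
data(grevilleasf, package = "eks")

# Point data for one species, as in the Examples section
hakeoides <- dplyr::filter(grevilleasf, species == "hakeoides")
s1 <- st_kde(hakeoides)

# The fields of the st_ks list: tidy_ks, grid and sf
print(names(s1))

# grid: rectangular polygons carrying the density estimate
print(head(s1$grid))

# sf: probability contour regions, labelled by contlabel
print(sf::st_geometry_type(s1$sf, by_geometry = FALSE))
print(unique(s1$sf$contlabel))
```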

The structure of the tidy_kde output is inherited from the input, i.e. if the input is a data frame or a (grouped) tibble, then the output is a data frame or a (grouped) tibble, respectively. Likewise for the sf object outputs of st_kde.
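A minimal sketch of checking that grouping is carried through from input to output, assuming eks, MASS and dplyr are installed. The expectation that the output is a grouped tibble with the same grouping variable is taken from the paragraph above and the group entry under Value; it is not verified independently here.

```r
library(eks)
data(crabs, package = "MASS")

# Grouped tibble input: one density estimate per level of sex
crabs3 <- dplyr::group_by(dplyr::select(crabs, FL, CW, sex), sex)
t3 <- tidy_kde(crabs3)

# The grouping of the input should be inherited by the output
print(dplyr::is_grouped_df(t3))
print(dplyr::group_vars(t3))
```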

The default bandwidth matrix is the unconstrained plug-in selector ks::Hpi, which is suitable for a wide range of data sets since it is not restricted to smoothing along the coordinate axes. This can produce a kernel estimate that is more representative of the data than one computed with the default bandwidth in geom_density_2d and geom_density_2d_filled: for the crabs data in the Examples, the ks::Hpi estimate is bimodal whereas the geom_density_2d_filled estimate is unimodal. For further details of the computation of the kernel density estimate and the bandwidth selector procedure, see ?ks::kde.
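A minimal sketch of overriding the default bandwidth, assuming eks, ks, MASS and dplyr are installed. Since the ... arguments are passed to ks::kde, an explicit bandwidth matrix can be supplied via its H argument. Here the default unconstrained selector ks::Hpi is compared with the diagonal selector ks::Hpi.diag, which only smooths along the coordinate axes; the choice of Hpi.diag as the alternative is illustrative.

```r
library(eks)
data(crabs, package = "MASS")
crabs2 <- dplyr::select(crabs, FL, CW)
x <- as.matrix(crabs2)

# Unconstrained plug-in bandwidth matrix (the default selector)
H_full <- ks::Hpi(x)
# Diagonal plug-in bandwidth matrix: no smoothing along oblique directions
H_diag <- ks::Hpi.diag(x)
print(H_full)
print(H_diag)

# H is forwarded through ... to ks::kde
t_full <- tidy_kde(crabs2, H = H_full)
t_diag <- tidy_kde(crabs2, H = H_diag)

# The diagonal selector has zero off-diagonal entries
print(H_diag[1, 2])
```

Plotting t_full and t_diag with geom_contour_filled_ks shows the effect of allowing (or forbidding) smoothing along oblique directions for these strongly correlated variables.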

Value

For tidy_kde, the output is an object of class tidy_ks, which is a tibble with columns:

x

evaluation points on the x-axis (name is taken from the 1st input variable in data)

y

evaluation points on the y-axis (2-d only) (name is taken from the 2nd input variable in data)

estimate

kernel estimate values

ks

first row (within each group) contains the untidy kernel estimate from ks::kde

tks

short object class label derived from the ks object class

label

long object class label

group

grouping variable (if grouped input) (name is taken from grouping variable in data).

For st_kde, the output is an object of class st_ks, which is a list with fields:

tidy_ks

tibble of simplified output (ks, tks, label, group) from tidy_kde

grid

sf object of the grid of kernel density estimate values, as polygons, with attributes estimate and group copied from the tidy_ks object

sf

sf object of the 1% to 99% contour regions of the kernel density estimate, as multipolygons, with the attribute contlabel derived from the contour level, and the attributes estimate and group copied from the tidy_ks object.

Examples

## tidy density estimates
data(crabs, package="MASS")
## tidy 1-d density estimate per species
crabs1 <- dplyr::select(crabs, FL, sp)
crabs1 <- dplyr::group_by(crabs1, sp)
t1 <- tidy_kde(crabs1)
gt1 <- ggplot2::ggplot(t1, ggplot2::aes(x=FL)) 
gt1 + ggplot2::geom_line(colour=1) + geom_rug_ks(colour=4) +
    ggplot2::facet_wrap(~sp)

## tidy 2-d density estimate
## suitable smoothing matrix gives bimodal estimate
crabs2 <- dplyr::select(crabs, FL, CW)
t2 <- tidy_kde(crabs2)
gt2 <- ggplot2::ggplot(t2, ggplot2::aes(x=FL, y=CW)) 
gt2 + geom_contour_filled_ks(colour=1) + 
    colorspace::scale_fill_discrete_sequential()

## default smoothing matrix in geom_density_2d_filled() gives unimodal estimate
gt3 <- ggplot2::ggplot(crabs2, ggplot2::aes(x=FL, y=CW)) 
gt3 + ggplot2::geom_density_2d_filled(bins=4, colour=1) +
    colorspace::scale_fill_discrete_sequential() +
    ggplot2::guides(fill=ggplot2::guide_legend(title="Density", reverse=TRUE))

## facet-wrapped filled contour plot with common contour levels for all facets
crabs3 <- dplyr::select(crabs, FL, CW, sex)
t3 <- tidy_kde(dplyr::group_by(crabs3, sex))
b <- contour_breaks(t3)
gt3 <- ggplot2::ggplot(t3, ggplot2::aes(x=FL, y=CW)) 
gt3 + geom_contour_filled_ks(colour=1, breaks=b) + 
    colorspace::scale_fill_discrete_sequential() +
    ggplot2::facet_wrap(~sex)

## geospatial density estimate
data(wa)
data(grevilleasf)
hakeoides <- dplyr::filter(grevilleasf, species=="hakeoides")
hakeoides_coord <- data.frame(sf::st_coordinates(hakeoides))
s1 <- st_kde(hakeoides)

## base R plot
xlim <- c(1.2e5, 1.1e6); ylim <- c(6.1e6, 7.2e6)
plot(wa, xlim=xlim, ylim=ylim)
plot(s1, add=TRUE)

## geom_sf plot
## suitable smoothing matrix gives optimally smoothed contours
gs1 <- ggplot2::ggplot(s1) + ggplot2::geom_sf(data=wa, fill=NA) + 
    ggthemes::theme_map()
gs1 + ggplot2::geom_sf(data=st_get_contour(s1), 
    ggplot2::aes(fill=label_percent(contlabel))) +
    colorspace::scale_fill_discrete_sequential(palette="Heat2") +
    ggplot2::coord_sf(xlim=xlim, ylim=ylim) 

## default smoothing matrix in geom_density_2d_filled() is oversmoothed
gs2 <- ggplot2::ggplot(hakeoides_coord) + ggplot2::geom_sf(data=wa, fill=NA) + 
    ggthemes::theme_map()
gs2 + ggplot2::geom_density_2d_filled(ggplot2::aes(x=X, y=Y), bins=4, colour=1) +
    colorspace::scale_fill_discrete_sequential(palette="Heat2") +
    ggplot2::guides(fill=ggplot2::guide_legend(title="Density", reverse=TRUE)) +
    ggplot2::coord_sf(xlim=xlim, ylim=ylim) 

## Not run: ## export as geopackage for external GIS software
sf::write_sf(wa, dsn="grevillea.gpkg", layer="wa")
sf::write_sf(hakeoides, dsn="grevillea.gpkg", layer="hakeoides")
sf::write_sf(st_get_contour(s1), dsn="grevillea.gpkg", layer="hakeoides_cont")
sf::write_sf(s1$grid, dsn="grevillea.gpkg", layer="hakeoides_grid")
## End(Not run)

[Package eks version 1.0.4 Index]