select_functions {rassta}    R Documentation
Select Constrained Univariate Distribution Functions
Description
Selection of distribution functions for the continuous raster layers that were used to create a raster layer of classification units. The distribution functions currently supported are the probability density function (PDF), the empirical cumulative distribution function (ECDF), and the inverse of the empirical cumulative distribution function (iECDF). Please note that select_functions DOES NOT calculate the aforementioned distribution functions. The sole purpose of select_functions is to assist in the knowledge-driven selection of the most appropriate distribution function for each continuous variable used to create a given classification unit (see Details).
Usage
select_functions(
cu.rast,
var.rast,
fun = mean,
varscale = "uniminmax",
mode = "auto",
verbose = TRUE,
...
)
Arguments
cu.rast: SpatRaster, as in
var.rast: SpatRaster. Multi-layer SpatRaster containing the n continuous raster layers of the variables used to create the classification units.
fun: Character. Descriptive statistical measurement (e.g., mean, max). See
varscale: Character. Variable scaling method. See the scale argument in
mode: Character. String specifying the selection mode for univariate distribution functions. Possible values are "inter" for interactive selection and "auto" for automatic selection (see Details). Default: "auto"
verbose: Boolean. Show warning messages in the console? Default: TRUE
...: Additional arguments as for
Details
The selection of distribution functions is univariate, that is, it is made for each variable separately, and it is constrained, meaning that the selection has to be made for each classification unit. Overall, the distribution functions are used to characterize typical values of a given continuous variable within a given classification unit. When the PDF is selected, values at or close to the peak of the PDF are considered the most typical, whereas values at the tails of the PDF are considered the least typical. When the ECDF is selected, values toward +infinity are considered the most typical; when the iECDF is selected, values toward -infinity are considered the most typical.
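To make the notion of "typical" concrete, the sketch below scores the values of one hypothetical variable under each of the three functions using base R (stats::density and stats::ecdf). It is not the rassta implementation (select_functions does not compute these functions at all); the simulated data are made up, and the iECDF is assumed here to be the complement 1 - ECDF, which is the form that makes low values the most typical.

```r
# Illustrative sketch only (not rassta code): typicality scores for one
# variable within one classification unit, under PDF, ECDF and iECDF.
set.seed(1)
x <- rnorm(200, mean = 10, sd = 2)  # hypothetical values of one variable

# PDF: kernel density estimate, interpolated at each observation
d <- stats::density(x)
pdf_score <- stats::approx(d$x, d$y, xout = x)$y
pdf_score <- pdf_score / max(pdf_score)  # rescale to [0, 1] for comparison

# ECDF: proportion of observations <= each value (high values score high)
Fn <- stats::ecdf(x)
ecdf_score <- Fn(x)

# iECDF: ASSUMED here to be the complement 1 - ECDF (low values score high)
iecdf_score <- 1 - Fn(x)

# Most typical observation under each function
x[which.max(pdf_score)]    # near the center of the distribution
x[which.max(ecdf_score)]   # the largest value of x
x[which.max(iecdf_score)]  # the smallest value of x
```

The rescaling of the PDF is only there so the three scores share a 0 to 1 range; it does not change which value is most typical.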
In order to assist the selection process, when mode = "inter", this function displays an interactive parallel coordinates plot (see ggplotly) and a writable table (built in Shiny). For each variable, the parallel coordinates plot shows the trend of a descriptive statistical measurement (argument fun) across all of the classification units. Using this trend, one can then select the most appropriate distribution function for each variable based on the occurrence or absence of "peaks" and "pits" in the observed trend. For instance, a peak (highest point in the trend) would indicate that the given classification unit contains, on average, the highest values of that variable. Conversely, a pit (lowest point in the trend) would indicate that the given classification unit contains, on average, the lowest values of that variable. Thus, an ECDF and an iECDF can be selected for the peak and the pit, respectively. The PDF can be selected for classification units whose trend shows neither a peak nor a pit. Please consider that peaks and pits are only reference points; thus, one should validate the selection of distribution functions based on domain knowledge.
When mode = "auto", the distribution functions are selected automatically, following the peak and pit criteria in the parallel coordinates plot described above.
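The peak/pit rule can be sketched in base R as follows. This is an illustration under stated assumptions, not the code that select_functions runs: the cell values and the variable names height and slope are hypothetical, the statistic is the mean (as with fun = mean), and the output merely mimics the Class.Unit, Variable, and Dist.Func columns described under Value. Ties, flat trends, and the scaling controlled by varscale are ignored.

```r
# Hypothetical cell values of two variables in 4 classification units
set.seed(42)
cells <- data.frame(
  unit   = rep(1:4, each = 50),
  height = rnorm(200, mean = rep(c(100, 250, 180, 120), each = 50), sd = 10),
  slope  = rnorm(200, mean = rep(c(5, 2, 12, 7), each = 50), sd = 1)
)

# Mean of each variable per unit: the trend drawn in the parallel
# coordinates plot (one line per variable across units)
trend <- aggregate(cbind(height, slope) ~ unit, data = cells, FUN = mean)

# Peak (highest point) -> ECDF, pit (lowest point) -> iECDF, otherwise -> PDF
pick_fun <- function(v) {
  ifelse(v == max(v), "ECDF", ifelse(v == min(v), "iECDF", "PDF"))
}

# Assemble a table shaped like distfun (one row per unit and variable)
distfun <- do.call(rbind, lapply(c("height", "slope"), function(var) {
  data.frame(Class.Unit = trend$unit,
             Variable   = var,
             Dist.Func  = pick_fun(trend[[var]]))
}))
distfun
```

With these simulated data, unit 2 is the peak and unit 1 the pit for height, while unit 3 is the peak and unit 2 the pit for slope; every other unit gets the PDF. As noted above, such automatic choices are reference points that still need checking against domain knowledge.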
The output table (distfun) is intended to be used as input to the predict_functions function.
The selection of distribution functions is similar to the selection of membership functions in fuzzy logic. For example, if one wants to describe a phenomenon through distribution functions of continuous variables, then these functions can be regarded as membership curves. Accordingly, the PDF, ECDF, and iECDF correspond to the Gaussian, S, and Z membership functions, respectively.
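For comparison, the three membership curves named above can be written in their common textbook forms. The sketch below is plain R and is not part of rassta; the parameter names (c and sigma for the Gaussian curve, a and b for the S and Z curves) and the piecewise-quadratic S-curve are assumptions chosen for illustration.

```r
# Textbook fuzzy membership curves (illustrative; not part of rassta)

# Gaussian: maximal at the center c, like a PDF peaking at typical values
gauss_mf <- function(x, c, sigma) exp(-(x - c)^2 / (2 * sigma^2))

# S-curve (piecewise quadratic): 0 below a, 1 above b, 0.5 at the midpoint
s_mf <- function(x, a, b) {
  m <- (a + b) / 2
  ifelse(x <= a, 0,
         ifelse(x <= m, 2 * ((x - a) / (b - a))^2,
                ifelse(x <= b, 1 - 2 * ((x - b) / (b - a))^2, 1)))
}

# Z-curve: mirror image of the S-curve
z_mf <- function(x, a, b) 1 - s_mf(x, a, b)

x <- seq(0, 10, by = 0.5)
g <- gauss_mf(x, c = 5, sigma = 1.5)  # peaks in the middle (PDF-like)
s <- s_mf(x, a = 2, b = 8)            # rises toward high values (ECDF-like)
z <- z_mf(x, a = 2, b = 8)            # falls toward high values (iECDF-like)
```

The S-curve is non-decreasing like an ECDF and the Z-curve is non-increasing like its complement, which is the sense in which the two families line up.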
Value
If mode = "inter":
distfun: A DT table (DataTables library) with the following attributes: (1) Class.Unit = numeric ID of each classification unit, (2) Variable = each of the n continuous raster layers of a classification unit, and (3) Dist.Func = empty column whose cells can be filled with one of the strings PDF, ECDF, or iECDF (typed without quotes). This table can be saved to disk through the Shiny interface.
parcoord: A plotly-based parallel coordinates plot, which can be saved to disk using the R package htmlwidgets.
If mode = "auto":
distfun: Same as distfun when mode = "inter", except that the cells of column "Dist.Func" are filled automatically.
parcoord: Same as parcoord when mode = "inter".
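As a rough illustration of the distfun structure, the sketch below builds a small table with the three documented columns and checks it before use. Both the data and the check_distfun helper are hypothetical, not part of rassta; it assumes the table has already been brought into R as a plain data frame (for instance, after saving it through the Shiny interface and reading it back).

```r
# Hypothetical distfun-like table (columns as documented under Value)
distfun <- data.frame(
  Class.Unit = rep(1:3, each = 2),
  Variable   = rep(c("height", "slope"), times = 3),
  Dist.Func  = c("ECDF", "PDF", "PDF", "iECDF", "iECDF", "ECDF")
)

# Hypothetical helper: verify columns and allowed Dist.Func values
check_distfun <- function(tab) {
  required <- c("Class.Unit", "Variable", "Dist.Func")
  missing_cols <- setdiff(required, names(tab))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Empty (NA) cells and misspellings such as "pdf" are both flagged
  bad <- !tab$Dist.Func %in% c("PDF", "ECDF", "iECDF")
  if (any(bad)) {
    stop("invalid Dist.Func values in rows: ", paste(which(bad), collapse = ", "))
  }
  invisible(TRUE)
}

check_distfun(distfun)
```

Checking for exact, case-sensitive strings matters because the cells are filled by hand in interactive mode.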
See Also
Other Landscape Correspondence Metrics: predict_functions(), signature, similarity()
Examples
library(terra)
library(rassta)
p <- system.file("exdat", package = "rassta")
# Multi-layer SpatRaster of topographic variables
## 3 topographic variables
tf <- list.files(path = p, pattern = "^height|^slope|^wetness",
full.names = TRUE
)
tvars <- terra::rast(tf)
# Single-layer SpatRaster of topographic classification units
## 5 classification units
tcf <- list.files(path = p, pattern = "topography.tif", full.names = TRUE)
tcu <- terra::rast(tcf)
# Automatic selection of distribution functions
tdif <- select_functions(cu.rast = tcu, var.rast = tvars, fun = mean)
# Parallel coordinates plot
if(interactive()){tdif$parcoord}