as_p {pdqr} | R Documentation |
Convert to pdqr-function
Description
Convert a function to a proper pdqr-function of a specific class, i.e. a function describing a distribution with finite support and finite values of probability/density.
Usage
as_p(f, ...)
## Default S3 method:
as_p(f, support = NULL, ..., n_grid = 10001)
## S3 method for class 'pdqr'
as_p(f, ...)
as_d(f, ...)
## Default S3 method:
as_d(f, support = NULL, ..., n_grid = 10001)
## S3 method for class 'pdqr'
as_d(f, ...)
as_q(f, ...)
## Default S3 method:
as_q(f, support = NULL, ..., n_grid = 10001)
## S3 method for class 'pdqr'
as_q(f, ...)
as_r(f, ...)
## Default S3 method:
as_r(f, support = NULL, ..., n_grid = 10001,
n_sample = 10000, args_new = list())
## S3 method for class 'pdqr'
as_r(f, ...)
Arguments
f |
Appropriate function to be converted (see Details). |
... |
Extra arguments to |
support |
Numeric vector with two increasing elements describing desired
support of output. If |
n_grid |
Number of grid points at which |
n_sample |
Number of points to sample from |
args_new |
List of extra arguments for |
Details
The general purpose of as_*() functions is to create a proper pdqr-function of the desired class from input which doesn't satisfy these conditions. The sequence of steps taken to achieve that goal is described here.
If f is already a pdqr-function, as_*() functions properly update it to have specific class. They take input's "x_tbl" metadata and type to use with corresponding new_*() function. For example, as_p(f) in case of pdqr-function f is essentially the same as new_p(x = meta_x_tbl(f), type = meta_type(f)).
If f is a function describing "honored" distribution, it is detected and output is created in predefined way taking into account extra arguments in `...`. For more details see "Honored distributions" section.
If f is some other unknown function, as_*() functions use heuristics for approximating the input distribution with a "proper" pdqr-function. In this case, outputs of as_*() can only be pdqr-functions of type "continuous" (because of issues with support detection). It is assumed that f returns values appropriate for the desired output class of as_*() function and output type "continuous". For example, input for as_p() should return values of some continuous cumulative distribution function (monotonically non-decreasing values from 0 to 1). To manually create a function of type "discrete", supply data frame input describing it to the appropriate new_*() function.
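The requirement on as_p() input can be checked numerically before conversion. The following is a minimal base R sketch; the helper name `looks_like_cdf` and the choice of grid are illustrative assumptions and are not part of pdqr.

```r
# Hypothetical helper (not part of pdqr): check on a finite grid that `f`
# returns values inside [0, 1] that never decrease.
looks_like_cdf <- function(f, grid) {
  vals <- f(grid)
  all(vals >= 0 & vals <= 1) && all(diff(vals) >= 0)
}

grid <- seq(-10, 10, length.out = 1001)

# A valid CDF passes the check
ok_cdf <- looks_like_cdf(pnorm, grid)

# A decreasing function (here a survival function) does not
bad_cdf <- looks_like_cdf(function(x) 1 - pnorm(x), grid)
```

A check like this only inspects the chosen grid, so it can miss problems between grid points.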
The general algorithm of how as_*() functions work for an unknown function is as follows:
- Detect support. See "Support detection" section for more details.
- Create data frame input for new_*(). The exact process differs:
    - In as_p(), an equidistant grid of n_grid points is created inside detected support. After that, input's values at the grid are taken as reference points of the cumulative distribution function used to approximate density at that same grid. This method has proved to work more reliably in case density goes to infinity. That grid and density values are used as "x" and "y" columns of data frame input for new_p().
    - In as_d(), "x" column of data frame is the same equidistant grid as in as_p(). "y" column is taken as input's values at this grid after possibly imputing infinity values. This imputation is done by taking the maximum of left and right linear extrapolations on the mentioned grid.
    - In as_q(), at first the inverse of input f function is computed on [0; 1] interval. It is done by approximating it with a piecewise-linear function on [0; 1] equidistant grid with n_grid points, imputing infinity values (which ensures finite support), and computing the inverse of the approximation. This inverse of f is used to create data frame input with as_p().
    - In as_r(), at first a d-function is created with new_d() based on the same sample used for support detection and extra arguments supplied as list in args_new argument. In other words, density estimation is done based on a sample generated from input f. After that, its values are used to create data frame with as_d().
- Use appropriate new_*() function with data frame from previous step and type = "continuous". This step implies that all tails outside detected support are trimmed and data frame is normalized to represent proper piecewise-linear density.
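The as_p() branch of the steps above can be sketched in base R. This is only an illustration under stated assumptions: the support is taken as known, the density is approximated by averaging neighbouring CDF slopes (the exact scheme used by pdqr may differ), and the helper `trapz` is a hypothetical name introduced here.

```r
# Illustrative sketch, not pdqr source code.
cdf <- function(q) pbeta(q, 2, 2)  # example input: CDF of Beta(2, 2)
support <- c(0, 1)                 # assumed to be already detected
n_grid <- 1001

# 1. Equidistant grid inside the support
x <- seq(support[1], support[2], length.out = n_grid)

# 2. CDF values at the grid serve as reference points
p <- cdf(x)

# 3. Approximate density at the grid (assumption: average of the CDF
#    slopes on the two neighbouring intervals, one-sided at the edges)
slopes <- diff(p) / diff(x)
n_s <- length(slopes)
y <- c(slopes[1], (slopes[-1] + slopes[-n_s]) / 2, slopes[n_s])

# 4. Normalize so that the piecewise-linear density integrates to 1
#    (hypothetical helper implementing the trapezoid rule)
trapz <- function(x, y) sum(diff(x) * (y[-1] + y[-length(y)]) / 2)
y <- y / trapz(x, y)

# Data frame with "x" and "y" columns, as expected for type "continuous"
x_tbl <- data.frame(x = x, y = y)
```

With pdqr attached, a data frame of this shape is the kind of input that new_p() accepts together with type = "continuous".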
Value
A pdqr-function of corresponding class.
Honored distributions
For efficient workflow, some commonly used distributions are recognized as special ("honored"). Those receive different treatment in as_*() functions.
Basically, there is a manually selected list of "honored" distributions together with enough information to detect them. Currently that list has all common univariate distributions from 'stats' package, i.e. all except multinomial and "less common distributions of test statistics".
"Honored" distribution is recognized only if f is one of p*(), d*(), q*(), or r*() function describing honored distribution and is supplied as variable with original name. For example, as_d(dunif) will be treated as "honored" distribution but as_d(function(x) {dunif(x)}) will not.
After it is recognized that input f represents "honored" distribution, its support is computed based on predefined rules. Those take into account special features of distribution (like infinite support or infinite density values) and supplied extra arguments in `...`. Usually output support "loses" only around 1e-6 probability on each infinite tail.
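To get a feel for what "losing" 1e-6 probability on each infinite tail means, the corresponding quantiles can be computed with base R. This is a sketch of the idea only and not the predefined rules used by pdqr; in particular, treating the exponential distribution as having just one infinite tail is an assumption made for illustration.

```r
# Illustrative sketch, not pdqr internals.
tail_prob <- 1e-6

# Standard normal: both tails are infinite, so both edges are trimmed
norm_support <- qnorm(c(tail_prob, 1 - tail_prob))

# Exponential (rate = 1): left edge is finite (0), so only the right
# tail is trimmed (assumption for illustration)
exp_support <- c(0, qexp(1 - tail_prob, rate = 1))

# Total probability left outside the normal support (about 2e-6)
lost <- pnorm(norm_support[1]) + (1 - pnorm(norm_support[2]))
```

For the standard normal distribution this gives a support of roughly plus or minus 4.75.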
After that, for "discrete" type output, new_d() is used with appropriate data frame input; for "continuous" type, as_d() is used with appropriate d*() function and support. The resulting d-function is then converted to the desired class with as_*().
Support detection
In case input is a function without any extra information, as_*() functions must know which finite support its output should have. User can supply desired support directly with support argument, which can also be NULL (meaning automatic detection of both edges) or contain NA to detect only the corresponding edges.
Support is detected in order to preserve as much information as practically reasonable. Exact methods differ:
- In as_p(), support is detected as values at which input function is equal to 1e-6 (left edge detection) and 1 - 1e-6 (right edge), which means "losing" 1e-6 probability on each tail. Note that those values are searched inside [-10^100; 10^100] interval.
- In as_d(), at first an attempt at finding one point of non-zero density is made by probing 10000 points spread across a wide range of the real line (approximately from -1e7 to 1e7). If input's value at all of them is zero, an error is thrown. After finding such a point, a cumulative distribution function is made by integrating input with integrate() using the found point as reference (without this, accuracy of integrate() will be poor). The created CDF is used to find 1e-6 and 1 - 1e-6 quantiles as in as_p(), which serve as detected support.
- In as_q(), quantiles for 0 and 1 are probed for being infinite. If they are, 1e-6 and 1 - 1e-6 quantiles are used respectively instead of infinite values to form detected support.
- In as_r(), a sample of size n_sample is generated and detected support is its range stretched by mean difference of sorted points (to account for possible tails at which points were not generated). Note that this means that original input f "demonstrates its randomness" only once inside as_r(), with output then used for approximation of "original randomness".
See Also
pdqr_approx_error() for computing approximation errors compared to some reference function (usually input to as_*() family).
Examples
# Convert existing "proper" pdqr-function
set.seed(101)
x <- rnorm(10)
my_d <- new_d(x, "continuous")
my_p <- as_p(my_d)
# Convert "honored" function to be a proper pdqr-function. To use this
# option, supply originally named function.
p_unif <- as_p(punif)
r_beta <- as_r(rbeta, shape1 = 2, shape2 = 2)
d_pois <- as_d(dpois, lambda = 5)
## `pdqr_approx_error()` computes pdqr approximation error
summary(pdqr_approx_error(as_d(dnorm), dnorm))
## This will work as if input is an unknown function because of the
## unsupported variable name
my_runif <- function(n) {
  runif(n)
}
r_unif_2 <- as_r(my_runif)
plot(as_d(r_unif_2))
# Convert some other function to be a "proper" pdqr-function
my_d_quadr <- as_d(function(x) {
  0.75 * (1 - x^2)
}, support = c(-1, 1))
# Support detection
unknown <- function(x) {
  dnorm(x, mean = 1)
}
## Completely automatic support detection
as_d(unknown)
## Semi-automatic support detection
as_d(unknown, support = c(-4, NA))
as_d(unknown, support = c(NA, 5))
## If support is very small and very distant from zero, it probably won't
## get detected in `as_d()` (throwing a relevant error)
## Not run:
as_d(function(x) {
  dnorm(x, mean = 10000, sd = 0.1)
})
## End(Not run)
# Using different level of granularity
as_d(unknown, n_grid = 1001)