new_p {pdqr} | R Documentation |
Create new pdqr-function
Description
Functions for creating new pdqr-functions based on a numeric sample or a data frame describing a distribution. They construct appropriate "x_tbl" metadata based on the input and then create a pdqr-function (of the corresponding pdqr class) defined by that "x_tbl".
Usage
new_p(x, type, ...)
new_d(x, type, ...)
new_q(x, type, ...)
new_r(x, type, ...)
Arguments
x | Numeric vector or data frame with appropriate columns (see "Data frame input" section). |
type | Type of pdqr-function. Should be one of "discrete" or "continuous". |
... | Extra arguments for density(). |
Details
Data frame input x is treated as having enough information for creating (including normalization of "y" column) an "x_tbl" metadata. For more details see "Data frame input" section.
Numeric input is transformed into a data frame which is then used as "x_tbl" metadata (for more details see "Numeric input" section):
- If type is "discrete" then x is viewed as a sample from a distribution that can produce only values from x. Input is tabulated and normalized to form "x_tbl" metadata.
- If type is "continuous" then:
  - If x has 1 element, output distribution represents a dirac-like distribution which is an approximation to singular dirac distribution.
  - If x has more than 1 element, output distribution represents a density estimation with density() treating x as a sample.
Value
A pdqr-function of corresponding class ("p" for new_p(), etc.) and type.
Numeric input
If x is a numeric vector, it is transformed into a data frame which is then used as "x_tbl" metadata to create a pdqr-function of corresponding class.
First, all NaN, NA, and infinite values are removed with warnings. If there are no elements left, an error is thrown. Then the data frame is created in a way which depends on the type argument.
For "discrete" type, elements of filtered x are:
- Rounded to 10th digit to avoid numerical representation issues (see Note in =='s help page).
- Tabulated (all unique values are counted). Output data frame has three columns: "x" with unique values, "prob" with normalized (divided by sum) counts, "cumprob" with cumulative sum of "prob" column.
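The "discrete" steps above can be sketched in base R. This is only an illustration of the described procedure, not pdqr's actual implementation: the helper name sketch_discrete_x_tbl() is hypothetical, and the warnings that new_*() issues when removing values are omitted for brevity.

```r
# Hypothetical helper (not part of pdqr): base-R sketch of the "discrete" steps
sketch_discrete_x_tbl <- function(x) {
  # Drop NA, NaN, and infinite values (is.finite() is FALSE for all of them)
  x <- x[is.finite(x)]
  if (length(x) == 0) {
    stop("No finite values left in `x`.")
  }

  # Round to 10th digit to avoid numerical representation issues
  x <- round(x, digits = 10)

  # Tabulate: count occurrences of every unique value
  x_unique <- sort(unique(x))
  counts <- tabulate(match(x, x_unique), nbins = length(x_unique))

  # Normalize counts and accumulate them
  prob <- counts / sum(counts)
  data.frame(x = x_unique, prob = prob, cumprob = cumsum(prob))
}

res <- sketch_discrete_x_tbl(c(1, 2, 2, 3, NA, Inf))
print(res)
```

With pdqr installed, the analogous table can be inspected with meta_x_tbl(new_d(x, "discrete")), as in the Examples section.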
For "continuous" type, output data frame has columns "x", "y", "cumprob". Choice of algorithm depends on the number of x elements:
- If x has 1 element, an "x_tbl" metadata describes dirac-like "continuous" pdqr-function. It is implemented as a triangular peak with center at x's value and width of 2e-8 (see Examples). This is an approximation of singular dirac distribution. Data frame has columns "x" with value c(x-1e-8, x, x+1e-8), "y" with value c(0, 1e8, 0) normalized to have total integral of "x"-"y" points of 1, "cumprob" c(0, 0.5, 1).
- If x has more than 1 element, it serves as input to density(x, ...) for density estimation (here arguments in ... of new_*() serve as extra arguments to density()). The output's "x" element is used as "x" column in output data frame. Column "y" is taken as "y" element of density() output, normalized so that piecewise-linear function passing through "x"-"y" points has total integral of 1. Column "cumprob" has cumulative probability of piecewise-linear d-function.
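Both "continuous" cases reduce to normalizing a piecewise-linear function so that its integral is 1. A minimal base-R sketch under that reading is given below. The helper names trapez_cumint() and sketch_continuous_x_tbl() are hypothetical and the trapezoidal rule is an assumption about how the integral could be computed; pdqr's internals may differ.

```r
# Hypothetical helpers (not part of pdqr)

# Cumulative integral of the piecewise-linear function through (x, y) points,
# computed with the trapezoidal rule (exact for piecewise-linear functions)
trapez_cumint <- function(x, y) {
  n <- length(x)
  c(0, cumsum(diff(x) * (y[-n] + y[-1]) / 2))
}

sketch_continuous_x_tbl <- function(x, ...) {
  if (length(x) == 1) {
    # Dirac-like case: triangular peak of width 2e-8 centered at `x`
    grid <- c(x - 1e-8, x, x + 1e-8)
    y <- c(0, 1e8, 0)
  } else {
    # Density estimation; `...` is passed on to density()
    dens <- stats::density(x, ...)
    grid <- dens$x
    y <- dens$y
  }

  # Normalize so that the total integral equals 1
  cumint <- trapez_cumint(grid, y)
  total <- cumint[length(cumint)]
  data.frame(x = grid, y = y / total, cumprob = cumint / total)
}

dirac <- sketch_continuous_x_tbl(1)
print(dirac$cumprob)

set.seed(101)
smooth <- sketch_continuous_x_tbl(rnorm(10), adjust = 2)
print(head(smooth))
```

Here the "cumprob" column is the running integral divided by the total, so its last value is 1 by construction.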
Data frame input
If x is a data frame, it should have numeric columns appropriate for "x_tbl" metadata of input type: "x", "prob" for "discrete" type and "x", "y" for "continuous" type ("cumprob" column will be computed inside new_*()). To become an appropriate "x_tbl" metadata, input data frame is ordered in increasing order of "x" column and then imputed in a way which depends on the type argument.
For "discrete" type:
- Values in column "x" are rounded to 10th digit to avoid numerical representation issues (see Note in =='s help page).
- If there are duplicate values in "x" column, they are "squashed" into one having sum of their probability in "prob" column.
- Column "prob" is normalized by its sum to have total sum of 1.
- Column "cumprob" is computed as cumulative sum of "prob" column.
For "continuous" type, column "y" is normalized so that piecewise-linear function passing through "x"-"y" points has total integral of 1. Column "cumprob" has cumulative probability of piecewise-linear d-function.
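The imputation of data frame input can be illustrated with a short base-R sketch. The helpers sketch_impute_discrete() and sketch_impute_continuous() are hypothetical names that follow the steps listed above; they are not pdqr functions and skip any input validation the package may perform.

```r
# Hypothetical helpers (not part of pdqr)

sketch_impute_discrete <- function(df) {
  # Order by "x" and round to 10th digit
  df <- df[order(df$x), ]
  df$x <- round(df$x, digits = 10)

  # "Squash" duplicated "x" values by summing their probabilities
  x_unique <- unique(df$x)
  prob <- vapply(
    x_unique, function(val) sum(df$prob[df$x == val]), numeric(1)
  )

  # Normalize "prob" and compute "cumprob"
  prob <- prob / sum(prob)
  data.frame(x = x_unique, prob = prob, cumprob = cumsum(prob))
}

sketch_impute_continuous <- function(df) {
  df <- df[order(df$x), ]
  n <- nrow(df)

  # Running trapezoidal integral of the piecewise-linear function
  cumint <- c(0, cumsum(diff(df$x) * (df$y[-n] + df$y[-1]) / 2))
  total <- cumint[n]
  data.frame(x = df$x, y = df$y / total, cumprob = cumint / total)
}

dis <- sketch_impute_discrete(data.frame(x = c(3, 1, 1, 2), prob = c(2, 1, 1, 4)))
print(dis)

con <- sketch_impute_continuous(data.frame(x = 1:3, y = c(0, 10, 0)))
print(con)
```

In the second call the triangle with height 10 and base 2 has area 10, so dividing "y" by 10 gives a density that integrates to 1.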
Examples
set.seed(101)
x <- rnorm(10)
# Type "discrete": `x` values are directly tabulated
my_d_dis <- new_d(x, "discrete")
meta_x_tbl(my_d_dis)
plot(my_d_dis)
# Type "continuous": `x` serves as input to `density()`
my_d_con <- new_d(x, "continuous")
head(meta_x_tbl(my_d_con))
plot(my_d_con)
# Data frame input
## Values in "prob" column will be normalized automatically
my_p_dis <- new_p(data.frame(x = 1:4, prob = 1:4), "discrete")
## As are values in "y" column
my_p_con <- new_p(data.frame(x = 1:3, y = c(0, 10, 0)), "continuous")
# Using bigger bandwidth in `density()`
my_d_con_2 <- new_d(x, "continuous", adjust = 2)
plot(my_d_con, main = "Comparison of density bandwidths")
lines(my_d_con_2, col = "red")
# Dirac-like "continuous" pdqr-function is created if `x` is a single number
meta_x_tbl(new_d(1, "continuous"))