psden {evmix} | R Documentation |
P-Splines probability density function
Description
Density, cumulative distribution function, quantile function and random number generation for the P-splines density estimate. B-spline coefficients can be result from Poisson regression with log or identity link.
Usage
dpsden(x, beta = NULL, nbinwidth = NULL, xrange = NULL, nseg = 10,
degree = 3, design.knots = NULL, log = FALSE)
ppsden(q, beta = NULL, nbinwidth = NULL, xrange = NULL, nseg = 10,
degree = 3, design.knots = NULL, lower.tail = TRUE)
qpsden(p, beta = NULL, nbinwidth = NULL, xrange = NULL, nseg = 10,
degree = 3, design.knots = NULL, lower.tail = TRUE)
rpsden(n = 1, beta = NULL, nbinwidth = NULL, xrange = NULL,
nseg = 10, degree = 3, design.knots = NULL)
Arguments
x |
quantiles |
beta |
vector of B-spline coefficients (required) |
nbinwidth |
scaling to convert count frequency into proper density |
xrange |
vector of minimum and maximum of B-spline (support of density) |
nseg |
number of segments between knots |
degree |
degree of B-splines (0 is constant, 1 is linear, etc.) |
design.knots |
spline knots for splineDesign function |
log |
logical, if TRUE then log density |
q |
quantiles |
lower.tail |
logical, if FALSE then upper tail probabilities |
p |
cumulative probabilities |
n |
sample size (positive integer) |
Details
P-spline density estimate using B-splines with given coefficients. B-splines
knots can be specified using design.knots
or regularly spaced knots can be specified
using xrange
, nseg
and deg
. No default knots are provided.
If regularly spaced knots are specified using xrange
, nseg
and deg
,
then B-splines which are shifted/spliced versions of each other are defined (i.e. not natural B-splines)
which is consistent with definition of Eilers and Marx, the masters of P-splines.
The splineDesign
function is used to calculate the B-splines, which
intakes knot locations as design.knots
. As such the design.knots
are not the knots in
their usual sense (e.g. to cover [0, 100] with 10 segments the usual knots would be 0, 10, \ldots, 100
).
The design.knots
must be extended by the degree
, so for degree = 2
the
design.knots = seq(-20, 120, 10)
.
Further, if the user wants natural B-splines then these can be specified using the
design.knots
, with replicated knots at each bounday according to the degree. To continue the
above example, for degree = 2
the design.knots = c(rep(0, 2), seq(0, 100, 10), rep(100, 2))
.
If both the design.knots
and other knot specification are provided, then the former are
used by default. Default values for only the degree
and nseg
are provided, all the other
P-spline inputs must be provided. Notice that the order
and lambda
penalty are not needed
as these are encapsulated in the inference for the B-spline coefficients.
Poisson regression is typically used for estimating the B-spline coefficients, using maximum likelihood
estimation (via iterative re-weighted least squares). A log-link function is usually used and as such the
beta
coefficients are on a log-scale, and the density needs to be exponentiated. However, an
identity link may be (carefully) used and then these coefficients are on the usual scale.
The beta
coefficients are estimated using a particular sample (size) and histogram bin-width, using
Poisson regression. Thus to
convert the predicted counts into a proper density it needs to be rescaled by dividing by n * binwidth
.
If nbinwidth=NULL
is not provided then a crude approximate scaling is used by normalising the density
to be proper. The renormalisation requires numerical integration, which is
computationally intensive and so best avoided wherever possible.
Checks of the consistency of the xrange
, degree
and nseg
and design.knots
are made,
with the values implied by the design.knots
used by default to replace any incorrect values. These
replacements are made for notational efficiency for users.
An inversion sampler is used for random number generation which also rather inefficient, as it could be carried out more efficiently using a mixture representation.
The quantile function is rather complicated as there is no closed form solution,
so is obtained by numerical approximation of the inverse cumulative distribution function
P(X \le q) = p
to find q
. The quantile function
qpsden
evaluates the P-splines cumulative distribution
function over the xrange
. A sequence of values
of length fifty times the number of knots (with a minimum of 1000) is first
calculated. Spline based interpolation using splinefun
,
with default monoH.FC
method, is then used to approximate the quantile
function. This is a similar approach to that taken
by Matt Wand in the qkde
in the ks
package.
Value
dpsden
gives the density,
ppsden
gives the cumulative distribution function,
qpsden
gives the quantile function and
rpsden
gives a random sample.
Note
Unlike most of the other extreme value mixture model functions the
psden
functions have not been vectorised as
this is not appropriate. The main inputs (x
, p
or q
)
must be either a scalar or a vector, which also define the output length.
Default values are provided for P-spline inputs of degree
and nseg
only,
but all others must be provided by the user.
The default sample size for rpsden
is 1.
Missing (NA
) and Not-a-Number (NaN
) values in x
,
p
and q
are passed through as is and infinite values are set to
NA
. None of these are not permitted for the parameters.
Error checking of the inputs (e.g. invalid probabilities) is carried out and will either stop or give warning message as appropriate.
Author(s)
Alfadino Akbar and Carl Scarrott carl.scarrott@canterbury.ac.nz.
References
http://en.wikipedia.org/wiki/B-spline
http://statweb.lsu.edu/faculty/marx/
Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science 11(2), 89-121.
See Also
Other psden: fpsdengpd
, fpsden
,
psdengpd
Other psdengpd: fpsdengpd
,
psdengpd
Other fpsden: fpsden
Examples
## Not run:
set.seed(1)
par(mfrow = c(1, 1))
x = rnorm(1000)
xx = seq(-6, 6, 0.01)
y = dnorm(xx)
# Plenty of histogram bins (100)
breaks = seq(-4, 4, length.out=101)
# P-spline fitting with cubic B-splines, 2nd order penalty and 8 internal segments
# CV search for penalty coefficient.
fit = fpsden(x, lambdaseq = 10^seq(-5, 5, 0.25), breaks = breaks,
xrange = c(-4, 4), nseg = 10, degree = 3, ord = 2)
psdensity = exp(fit$bsplines %*% fit$mle)
hist(x, freq = FALSE, breaks = seq(-4, 4, length.out=101), xlim = c(-6, 6))
lines(xx, y, col = "black") # true density
# P-splines density from dpsden function
with(fit, lines(xx, dpsden(xx, beta, nbinwidth, design = design.knots), lwd = 2, col = "blue"))
legend("topright", c("True Density","P-spline density"), col=c("black", "blue"), lty = 1)
# plot B-splines
par(mfrow = c(2, 1))
with(fit, matplot(mids, as.matrix(bsplines), type = "l", lty = 1))
# Natural B-splines
knots = with(fit, seq(xrange[1], xrange[2], length.out = nseg + 1))
natural.knots = with(fit, c(rep(xrange[1], degree), knots, rep(xrange[2], degree)))
naturalb = splineDesign(natural.knots, fit$mids, ord = fit$degree + 1, outer.ok = TRUE)
with(fit, matplot(mids, naturalb, type = "l", lty = 1))
# Compare knot specifications
rbind(fit$design.knots, natural.knots)
# User can use natural B-splines if design.knots are specified manually
natural.fit = fpsden(x, lambdaseq = 10^seq(-5, 5, 0.25), breaks = breaks,
design.knots = natural.knots, nseg = 10, degree = 3, ord = 2)
psdensity = with(natural.fit, exp(bsplines %*% mle))
par(mfrow = c(1, 1))
hist(x, freq = FALSE, breaks = seq(-4, 4, length.out=101), xlim = c(-6, 6))
lines(xx, y, col = "black") # true density
# check density against dpsden function
with(fit, lines(xx, dpsden(xx, beta, nbinwidth, design = design.knots), lwd = 2, col = "blue"))
with(natural.fit, lines(xx, dpsden(xx, beta, nbinwidth, design = design.knots),
lwd = 2, col = "red", lty = 2))
legend("topright", c("True Density", "Eilers and Marx B-splines", "Natural B-splines"),
col=c("black", "blue", "red"), lty = c(1, 1, 2))
## End(Not run)