psden {evmix}R Documentation

P-Splines probability density function

Description

Density, cumulative distribution function, quantile function and random number generation for the P-splines density estimate. B-spline coefficients can be result from Poisson regression with log or identity link.

Usage

dpsden(x, beta = NULL, nbinwidth = NULL, xrange = NULL, nseg = 10,
  degree = 3, design.knots = NULL, log = FALSE)

ppsden(q, beta = NULL, nbinwidth = NULL, xrange = NULL, nseg = 10,
  degree = 3, design.knots = NULL, lower.tail = TRUE)

qpsden(p, beta = NULL, nbinwidth = NULL, xrange = NULL, nseg = 10,
  degree = 3, design.knots = NULL, lower.tail = TRUE)

rpsden(n = 1, beta = NULL, nbinwidth = NULL, xrange = NULL,
  nseg = 10, degree = 3, design.knots = NULL)

Arguments

x

quantiles

beta

vector of B-spline coefficients (required)

nbinwidth

scaling to convert count frequency into proper density

xrange

vector of minimum and maximum of B-spline (support of density)

nseg

number of segments between knots

degree

degree of B-splines (0 is constant, 1 is linear, etc.)

design.knots

spline knots for splineDesign function

log

logical, if TRUE then log density

q

quantiles

lower.tail

logical, if FALSE then upper tail probabilities

p

cumulative probabilities

n

sample size (positive integer)

Details

P-spline density estimate using B-splines with given coefficients. B-splines knots can be specified using design.knots or regularly spaced knots can be specified using xrange, nseg and deg. No default knots are provided.

If regularly spaced knots are specified using xrange, nseg and deg, then B-splines which are shifted/spliced versions of each other are defined (i.e. not natural B-splines) which is consistent with definition of Eilers and Marx, the masters of P-splines.

The splineDesign function is used to calculate the B-splines, which intakes knot locations as design.knots. As such the design.knots are not the knots in their usual sense (e.g. to cover [0, 100] with 10 segments the usual knots would be 0, 10, \ldots, 100). The design.knots must be extended by the degree, so for degree = 2 the design.knots = seq(-20, 120, 10).

Further, if the user wants natural B-splines then these can be specified using the design.knots, with replicated knots at each bounday according to the degree. To continue the above example, for degree = 2 the design.knots = c(rep(0, 2), seq(0, 100, 10), rep(100, 2)).

If both the design.knots and other knot specification are provided, then the former are used by default. Default values for only the degree and nseg are provided, all the other P-spline inputs must be provided. Notice that the order and lambda penalty are not needed as these are encapsulated in the inference for the B-spline coefficients.

Poisson regression is typically used for estimating the B-spline coefficients, using maximum likelihood estimation (via iterative re-weighted least squares). A log-link function is usually used and as such the beta coefficients are on a log-scale, and the density needs to be exponentiated. However, an identity link may be (carefully) used and then these coefficients are on the usual scale.

The beta coefficients are estimated using a particular sample (size) and histogram bin-width, using Poisson regression. Thus to convert the predicted counts into a proper density it needs to be rescaled by dividing by n * binwidth. If nbinwidth=NULL is not provided then a crude approximate scaling is used by normalising the density to be proper. The renormalisation requires numerical integration, which is computationally intensive and so best avoided wherever possible.

Checks of the consistency of the xrange, degree and nseg and design.knots are made, with the values implied by the design.knots used by default to replace any incorrect values. These replacements are made for notational efficiency for users.

An inversion sampler is used for random number generation which also rather inefficient, as it could be carried out more efficiently using a mixture representation.

The quantile function is rather complicated as there is no closed form solution, so is obtained by numerical approximation of the inverse cumulative distribution function P(X \le q) = p to find q. The quantile function qpsden evaluates the P-splines cumulative distribution function over the xrange. A sequence of values of length fifty times the number of knots (with a minimum of 1000) is first calculated. Spline based interpolation using splinefun, with default monoH.FC method, is then used to approximate the quantile function. This is a similar approach to that taken by Matt Wand in the qkde in the ks package.

Value

dpsden gives the density, ppsden gives the cumulative distribution function, qpsden gives the quantile function and rpsden gives a random sample.

Note

Unlike most of the other extreme value mixture model functions the psden functions have not been vectorised as this is not appropriate. The main inputs (x, p or q) must be either a scalar or a vector, which also define the output length.

Default values are provided for P-spline inputs of degree and nseg only, but all others must be provided by the user. The default sample size for rpsden is 1.

Missing (NA) and Not-a-Number (NaN) values in x, p and q are passed through as is and infinite values are set to NA. None of these are not permitted for the parameters.

Error checking of the inputs (e.g. invalid probabilities) is carried out and will either stop or give warning message as appropriate.

Author(s)

Alfadino Akbar and Carl Scarrott carl.scarrott@canterbury.ac.nz.

References

http://en.wikipedia.org/wiki/B-spline

http://statweb.lsu.edu/faculty/marx/

Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science 11(2), 89-121.

See Also

splineDesign.

Other psden: fpsdengpd, fpsden, psdengpd

Other psdengpd: fpsdengpd, psdengpd

Other fpsden: fpsden

Examples

## Not run: 
set.seed(1)
par(mfrow = c(1, 1))

x = rnorm(1000)
xx = seq(-6, 6, 0.01)
y = dnorm(xx)

# Plenty of histogram bins (100)
breaks = seq(-4, 4, length.out=101)

# P-spline fitting with cubic B-splines, 2nd order penalty and 8 internal segments
# CV search for penalty coefficient. 
fit = fpsden(x, lambdaseq = 10^seq(-5, 5, 0.25), breaks = breaks,
             xrange = c(-4, 4), nseg = 10, degree = 3, ord = 2)
psdensity = exp(fit$bsplines %*% fit$mle)

hist(x, freq = FALSE, breaks = seq(-4, 4, length.out=101), xlim = c(-6, 6))
lines(xx, y, col = "black") # true density

# P-splines density from dpsden function
with(fit, lines(xx, dpsden(xx, beta, nbinwidth, design = design.knots), lwd = 2, col = "blue"))

legend("topright", c("True Density","P-spline density"), col=c("black", "blue"), lty = 1)

# plot B-splines
par(mfrow = c(2, 1))
with(fit, matplot(mids, as.matrix(bsplines), type = "l", lty = 1))

# Natural B-splines
knots = with(fit, seq(xrange[1], xrange[2], length.out = nseg + 1))
natural.knots = with(fit, c(rep(xrange[1], degree), knots, rep(xrange[2], degree)))
naturalb = splineDesign(natural.knots, fit$mids, ord = fit$degree + 1, outer.ok = TRUE)
with(fit, matplot(mids, naturalb, type = "l", lty = 1))

# Compare knot specifications
rbind(fit$design.knots, natural.knots)

# User can use natural B-splines if design.knots are specified manually
natural.fit = fpsden(x, lambdaseq = 10^seq(-5, 5, 0.25), breaks = breaks,
             design.knots = natural.knots, nseg = 10, degree = 3, ord = 2)
psdensity = with(natural.fit, exp(bsplines %*% mle))

par(mfrow = c(1, 1))
hist(x, freq = FALSE, breaks = seq(-4, 4, length.out=101), xlim = c(-6, 6))
lines(xx, y, col = "black") # true density

# check density against dpsden function
with(fit, lines(xx, dpsden(xx, beta, nbinwidth, design = design.knots), lwd = 2, col = "blue"))
with(natural.fit, lines(xx, dpsden(xx, beta, nbinwidth, design = design.knots),
                        lwd = 2, col = "red", lty = 2))

legend("topright", c("True Density", "Eilers and Marx B-splines", "Natural B-splines"),
   col=c("black", "blue", "red"), lty = c(1, 1, 2))

## End(Not run)


[Package evmix version 2.12 Index]