clsd {crs} | R Documentation |
Categorical Logspline Density
Description
clsd computes the logspline density, density derivative, distribution, and smoothed quantiles for a one-dimensional continuous variable using the approach of Racine (2013).
Usage
clsd(x = NULL,
beta = NULL,
xeval = NULL,
degree = NULL,
segments = NULL,
degree.min = 2,
degree.max = 25,
segments.min = 1,
segments.max = 100,
lbound = NULL,
ubound = NULL,
basis = "tensor",
knots = "quantiles",
penalty = NULL,
deriv.index = 1,
deriv = 1,
elastic.max = TRUE,
elastic.diff = 3,
do.gradient = TRUE,
er = NULL,
monotone = TRUE,
monotone.lb = -250,
n.integrate = 500,
nmulti = 1,
method = c("L-BFGS-B", "Nelder-Mead", "BFGS", "CG", "SANN"),
verbose = FALSE,
quantile.seq = seq(.01,.99,by=.01),
random.seed = 42,
maxit = 10^5,
max.attempts = 25,
NOMAD = FALSE)
Arguments
x |
a numeric vector of training data |
beta |
a numeric vector of coefficients (default beta=NULL) |
xeval |
a numeric vector of evaluation data |
degree |
integer/vector specifying the polynomial degree of the B-spline basis for each dimension of the continuous x |
segments |
integer/vector specifying the number of segments (i.e. number of knots minus one) of the B-spline basis for each dimension of the continuous x |
segments.min , segments.max |
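when segments=NULL, the minimum/maximum number of segments of the B-spline basis to search over (defaults segments.min=1, segments.max=100) |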
degree.min , degree.max |
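when degree=NULL, the minimum/maximum degree of the B-spline basis to search over (defaults degree.min=2, degree.max=25) |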
lbound , ubound |
lower/upper bound for the support of the density. For example, if there is a priori knowledge that the density equals zero to the left of 0 and has a discontinuity at 0, the user could specify lbound = 0. However, if the density is essentially zero near 0, one does not need to specify lbound. |
basis |
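a character string (default basis="tensor") indicating whether the additive or the tensor product B-spline basis is used (see Details) |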
knots |
a character string (default knots="quantiles") specifying where the knots are placed |
deriv |
an integer (default deriv=1) specifying the order of the density derivative to compute |
deriv.index |
an integer (default deriv.index=1) specifying the index of the variable whose derivative is requested |
nmulti |
integer number of times to restart the process of finding extrema of
the cross-validation function from different (random) initial
points (default nmulti=1) |
penalty |
the penalty parameter used in the AIC-type criterion. The method chooses the degree and the number of segments (knots-1) that maximize 2*logl-penalty*(degree+segments), where logl is the log-likelihood |
elastic.max , elastic.diff |
a logical value/integer indicating whether to use ‘elastic’ search bounds such that the optimal degree/segment must lie elastic.diff units from the respective search bounds |
do.gradient |
a logical value indicating whether or not to use
the analytical gradient during optimization (defaults to TRUE) |
er |
a scalar indicating the fraction of the data range by which to extend the tails (default er=NULL) |
monotone |
a logical value indicating whether to modify the standard B-spline basis so that it is tailored for density estimation (default monotone=TRUE) |
monotone.lb |
a negative bound specifying the lower bound on
the linear segment coefficients used when ( |
n.integrate |
the number of evenly spaced integration points on the extended range specified by er (default n.integrate=500) |
method |
see optim for details |
verbose |
a logical value which, when TRUE, produces verbose output during optimization |
quantile.seq |
a sequence of numbers lying in [0,1] at which quantiles of the estimated distribution are computed |
random.seed |
seeds the random number generator for initial parameter values when optim is called |
maxit |
maximum number of iterations used by optim |
max.attempts |
maximum number of attempts to undertake if optim fails |
NOMAD |
a logical value which, when TRUE, uses the NOMAD solver to search for the optimal degree and segments |
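To make the roles of degree, segments and knots concrete, here is a minimal sketch using splines::bs from R's splines package rather than the GSL routines clsd uses. It assumes that knots="quantiles" means knots at equally spaced sample quantiles, so that each of the segments intervals holds roughly the same number of observations, and it shows that a full B-spline basis then has degree+segments columns, the same complexity count that enters penalty. The variable names are illustrative only.

```r
library(splines)

set.seed(42)
x <- rnorm(200)
degree <- 3
segments <- 4

## Knots at equally spaced sample quantiles (assumed meaning of
## knots = "quantiles"): segments + 1 knots, two of them at the data extremes
knots.all <- quantile(x, seq(0, 1, length.out = segments + 1), names = FALSE)

## Each segment then holds roughly n / segments observations
counts <- table(cut(x, knots.all, include.lowest = TRUE))
counts

## Full B-spline basis (intercept = TRUE): degree + 1 + (segments - 1)
## = degree + segments columns
B <- bs(x, degree = degree, knots = knots.all[2:segments],
        Boundary.knots = knots.all[c(1, segments + 1)], intercept = TRUE)
ncol(B)   # 7 = degree + segments
```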
Details
Typical usages are (see below for a list of options and also the examples at the end of this help file)
model <- clsd(x)
clsd computes a logspline density estimate of a one-dimensional continuous variable.
The spline model employs the tensor product B-spline basis matrix for a multivariate polynomial spline via the B-spline routines in the GNU Scientific Library (https://www.gnu.org/software/gsl/) and the tensor.prod.model.matrix function.
When basis="additive" the model becomes additive in nature (i.e. no interaction/tensor terms, and thus semiparametric rather than fully nonparametric). When basis="tensor" the model uses the multivariate tensor product basis.
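The logspline idea itself can be sketched in a few lines of R. The log density is a B-spline expansion, the normalizing constant is obtained numerically on an evenly spaced grid over an extended range (compare er and n.integrate), the coefficients maximize the log-likelihood using an analytical gradient (compare do.gradient), and degree and segments are chosen by a penalized log-likelihood criterion. This is a minimal sketch under stated assumptions, not the crs implementation: fit_logspline is a hypothetical helper, the criterion 2*logl - penalty*(degree+segments) is assumed, er = 0.25 and penalty = log(n) are arbitrary choices, and the density-tailored (monotone) basis, bounds, restarts and elastic search are all omitted.

```r
library(splines)

## Hypothetical helper (not the crs implementation): maximum-likelihood fit of
## log f(t) = B(t) %*% beta - log C(beta), with C(beta) approximated by the
## trapezoidal rule on n.integrate evenly spaced points of an extended range.
fit_logspline <- function(x, degree, segments, er = 0.25, n.integrate = 500) {
  r <- extendrange(x, f = er)
  grid <- seq(r[1], r[2], length.out = n.integrate)
  h <- diff(grid)
  trap <- function(f) sum(h * (head(f, -1) + tail(f, -1)) / 2)
  ## interior knots at equally spaced sample quantiles
  probs <- seq(0, 1, length.out = segments + 1)
  knots <- quantile(x, probs[-c(1, segments + 1)], names = FALSE)
  ## intercept = FALSE: a constant in log f is absorbed by C(beta)
  basis <- function(t) {
    matrix(bs(t, degree = degree, knots = knots, Boundary.knots = r),
           nrow = length(t))
  }
  Bx <- basis(x)
  Bg <- basis(grid)
  ## log C(beta) and the normalized density on the grid, computed stably
  normalize <- function(beta) {
    eta <- drop(Bg %*% beta)
    w <- exp(eta - max(eta))
    list(logC = max(eta) + log(trap(w)), p = w / trap(w))
  }
  ## negative average log-likelihood and its analytical gradient
  negloglik <- function(beta) normalize(beta)$logC - mean(Bx %*% beta)
  gradient <- function(beta) {
    apply(Bg * normalize(beta)$p, 2, trap) - colMeans(Bx)
  }
  fit <- optim(rep(0, ncol(Bx)), negloglik, gradient, method = "BFGS",
               control = list(maxit = 1000))
  list(grid = grid, density = normalize(fit$par)$p,
       logl = -length(x) * fit$value, convergence = fit$convergence)
}

## Select degree and segments by the (assumed) criterion
## 2*logl - penalty*(degree + segments), here with penalty = log(n)
set.seed(42)
x <- rnorm(250)
penalty <- log(length(x))
cand <- expand.grid(degree = 2:3, segments = c(2, 4, 8))
fits <- Map(function(d, s) fit_logspline(x, d, s), cand$degree, cand$segments)
cand$logl <- vapply(fits, function(f) f$logl, numeric(1))
cand$criterion <- 2 * cand$logl - penalty * (cand$degree + cand$segments)
best <- fits[[which.max(cand$criterion)]]
cand[which.max(cand$criterion), ]
```

Because intercept = FALSE drops one column from every candidate, each fit has degree + segments - 1 free coefficients; that constant offset does not change which candidate maximizes the criterion.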
Value
clsd returns a clsd object. The generic functions coef, fitted, plot and summary support objects of this type. For plot, er=FALSE plots the density at the sample realizations (the default is the ‘extended range’ data; see er above), and distribution=TRUE plots the distribution. The returned object has the following components:
density |
estimates of the density function at the sample points |
density.er |
the density evaluated on the ‘extended range’ of the data |
density.deriv |
estimates of the derivative of the density function at the sample points |
density.deriv.er |
estimates of the derivative of the density function evaluated on the ‘extended range’ of the data |
distribution |
estimates of the distribution function at the sample points |
distribution.er |
the distribution evaluated on the ‘extended range’ of the data |
xer |
the ‘extended range’ of the data |
degree |
integer/vector specifying the degree of the B-spline basis for each dimension of the continuous x |
segments |
integer/vector specifying the number of segments of the B-spline basis for each dimension of the continuous x |
xq |
vector of quantiles |
tau |
vector generated by quantile.seq |
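The distribution and quantile components are tied to the density in a simple way. As a sketch only (it is not taken from the crs sources), the code below uses a known normal density as a stand-in for density.er on an evenly spaced extended range xer, accumulates it with the trapezoidal rule to obtain the distribution, and inverts that distribution by linear interpolation at tau = quantile.seq to obtain smoothed quantiles xq. The variable names merely mirror the component names above.

```r
## Stand-in for an estimated density on an evenly spaced 'extended range'
xer <- seq(-6, 6, length.out = 500)
density.er <- dnorm(xer)

## Distribution function by cumulative trapezoidal integration, rescaled
## so that it ends exactly at 1
h <- diff(xer)
distribution.er <- c(0, cumsum(h * (head(density.er, -1) + tail(density.er, -1)) / 2))
distribution.er <- distribution.er / distribution.er[length(distribution.er)]

## Smoothed quantiles: invert the distribution by linear interpolation
tau <- seq(0.01, 0.99, by = 0.01)
xq <- approx(distribution.er, xer, xout = tau)$y
head(cbind(tau, xq))
```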
Usage Issues
This function should be considered to be in ‘beta’ status until further notice.
If smoother estimates are desired and degree=degree.min, increase degree.min to, say, degree.min=3.
The use of ‘regression’ B-splines (i.e. when monotone=FALSE) can lead to undesirable behavior at the endpoints of the data. The default ‘density’ B-splines ought to be well-behaved in these regions.
Author(s)
Jeffrey S. Racine racinej@mcmaster.ca
References
Racine, J.S. (2013), “Logspline Mixed Data Density Estimation,” manuscript.
See Also
Examples
## Not run:
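## Old Faithful eruptions data histogram and clsd density
library(crs)
data(faithful)
eruptions <- faithful$eruptions   # avoids attach()
model <- clsd(eruptions)
## plot() draws the density on the extended range, so include density.er
## when setting ylim to avoid clipping the curve or the histogram
ylim <- c(0,max(model$density,model$density.er,
                hist(eruptions,breaks=20,plot=FALSE)$density))
plot(model,ylim=ylim)
hist(eruptions,breaks=20,freq=FALSE,add=TRUE,lty=2)
rug(eruptions)
summary(model)
coef(model)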
## Simulated data
library(logspline)
set.seed(42)
n <- 250
x <- sort(rnorm(n))
f.dgp <- dnorm(x)   # true (DGP) density at the sample points
model <- clsd(x)
## Standard (cubic) estimate taken from the logspline package
model.logspline <- logspline(x)
## Compute MSEs of each estimate against the true density
mse.clsd <- mean((fitted(model)-f.dgp)^2)
mse.logspline <- mean((dlogspline(x,model.logspline)-f.dgp)^2)
ylim <- c(0,max(fitted(model),dlogspline(x,model.logspline),f.dgp))
plot(model,
ylim=ylim,
sub=paste("MSE: logspline = ",format(mse.logspline),", clsd = ",
format(mse.clsd)),
lty=3,
col=3)
xer <- model$xer
lines(xer,dlogspline(xer,model.logspline),col=2,lty=2)
lines(xer,dnorm(xer),col=1,lty=1)
rug(x)
legend("topright",c("DGP",
paste("Cubic Logspline Density (package 'logspline', knots = ",
model.logspline$nknots,")",sep=""),
paste("clsd Density (degree = ", model$degree, ", segments = ",
model$segments,", penalty = ",round(model$penalty,2),")",sep="")),
lty=1:3,
col=1:3,
bty="n",
cex=0.75)
summary(model)
coef(model)
## Simulate data with known bounds
set.seed(42)
n <- 10000
x <- runif(n,0,1)
model <- clsd(x,lbound=0,ubound=1)
plot(model)
## End(Not run)