cadence.fit {CaDENCE} | R Documentation |
Fit a CDEN model
Description
Fit a CDEN model via nonlinear optimization of the maximum likelihood cost function.
Usage
cadence.fit(x, y, iter.max = 500, n.hidden = 2, hidden.fcn = tanh,
distribution = NULL, sd.norm = Inf, init.range = c(-0.5, 0.5),
method = c("optim", "psoptim", "Rprop"), n.trials = 1,
trace = 0, maxit.Nelder = 2000, trace.Nelder = 0,
swarm.size = NULL, vectorize = TRUE,
delta.0 = 0.1, delta.min = 1e-06, delta.max = 50, epsilon = 1e-08,
range.mult = 2, step.tol = 1e-08, f.target = -Inf,
f.cost = cadence.cost, max.exceptions = 500)
Arguments
x |
matrix with number of rows equal to the number of samples and number of columns equal to the number of predictor variables. |
y |
column matrix of predictand values with number of rows equal to the number of samples. |
iter.max |
maximum number of iterations of the optimization function. |
number of hidden nodes in the CDEN model; can be a vector indicating a range of values to fit. | |
hidden layer transfer function. | |
distribution |
a list that describes the probability density function associated with the predictand. |
sd.norm |
|
init.range |
range for random weights on [ |
method |
specifies the optimization method used to minimize |
n.trials |
number of repeated trials used to avoid shallow local minima during optimization. |
trace |
the level of printing which is done during optimization. A value of |
maxit.Nelder |
maximum number of iterations of the Nelder-Mead optimization function prior to main calling |
trace.Nelder |
the level of printing which is done during Nelder-Mead optimization. A value of |
swarm.size |
|
vectorize |
|
delta.0 |
size of the initial update-value if |
delta.min |
minimum value for the adaptive update-value if |
delta.max |
maximum value for the adaptive update-value if |
epsilon |
step-size used in the finite difference calculation of the gradient if |
range.mult |
if |
step.tol |
convergence criterion if |
f.target |
target value of |
f.cost |
cost function to be optimized. |
max.exceptions |
maximum number of repeated exceptions allowed during optimization. |
Details
Fit a CDEN model by optimizing the maximum likelihood cost function
f.cost
, which is set by default to cadence.cost
.
Optimization relies on the standard optim
function, the
built-in rprop
function, or, optionally,
the psoptim
function from the pso
package.
The hidden layer transfer function hidden.fcn
should be set to
tanh
for a nonlinear model and to identity
for a
linear model. In the nonlinear case, the number of hidden nodes n.hidden
controls the overall complexity of the model. The predictand distribution
is set by the distribution
argument. Parameters of the specified
distribution can be held constant via the parameters.fixed
element
distribution
. Weight penalty regularization for the magnitude of the
input-hidden layer weights can be applied by setting sd.norm
to a value
less than Inf
.
The distribution
argument in cadence.fit
is the most important
part of the CaDENCE
modelling framework and has been designed to be
as flexible as possible. To this end, distribution
is a list with three
mandatory elements: density.fcn
, which specifies the R density function
for the predictand distribution; parameters
, which specifies the names
of the parameters used as arguments in density.fcn
; and
output.fcns
, which specifies the functions used to constrain the density
function parameters to their allowable ranges (i.e., inverse link
functions). If not specified, distribution
defaults to a normal
distribution. Note: the order of parameters
and output.fcns
must
match the order of arguments in the specified density.fcn
.
A fourth element of distribution
, parameters.fixed
, is optional.
Setting parameters.fixed
="sd"
for the normal distribution would, for
example, force the sd
parameter to take a constant value.
Samples of distribution
lists for a variety of probability distributions
are given below for reference:
# normal distribution norm.distribution <- list(density.fcn = dnorm, parameters = c("mean", "sd"), parameters.fixed = NULL, output.fcns = c(identity, exp)) # lognormal distribution lnorm.distribution <- list(density.fcn = dlnorm, parameters = c("meanlog", "sdlog"), parameters.fixed = NULL, output.fcns = c(identity, exp)) # exponential distribution exp.distribution <- list(density.fcn = dexp, parameters = c("rate"), parameters.fixed = NULL, output.fcns = c(exp)) # Poisson distribution poisson.distribution <- list(density.fcn = dpois, parameters = c("lambda"), parameters.fixed = NULL, output.fcns = c(exp)) # Bernoulli-gamma distribution bgamma.distribution <- list(density.fcn = dbgamma, parameters = c("prob", "scale", "shape"), parameters.fixed = NULL, output.fcns = c(logistic, exp, exp)) # Bernoulli-Weibull distribution bweibull.distribution <- list(density.fcn = dbweibull, parameters = c("prob", "scale", "shape"), parameters.fixed = NULL, output.fcns = c(logistic, exp, exp)) # Bernoulli-lognormal distribution blnorm.distribution <- list(density.fcn = dblnorm, parameters = c("prob", "meanlog", "sdlog"), parameters.fixed = NULL, output.fcns = c(logistic, identity, exp)) # Bernoulli-Pareto 2 distribution bpareto2.distribution <- list(density.fcn = dbpareto2, parameters = c("prob", "scale", "shape"), parameters.fixed = NULL, output.fcns = c(logistic, exp, exp)) # beta distribution beta.distribution <- list(density.fcn=dbeta, parameters=c("shape1", "shape2"), parameters.fixed=NULL, output.fcns=c(exp, exp)) # truncated normal distribution with lower = 0 library(msm) dtnormal <- function(x, mean, sd) dtnorm(x, mean, sd, lower = 0) dtnorm.distribution <- list(density.fcn = dtnormal, parameters = c("mean", "sd"), parameters.fixed = NULL, output.fcns = c(identity, exp)) # mixture of two normal distributions (mixture density network) library(nor1mix) dnormix <- function(x, mu1, mu2, sig1, sig2, w1){ if(length(x) > 1){ dens <- mapply(dnormix, x, mu1 = mu1, mu2 = mu2, sig1 = sig1, sig2 = sig2, w1 = w1) } else{ mix <- norMix(mu = c(mu1, mu2), sigma = c(sig1, sig2), w = c(w1, 1-w1)) dens <- dnorMix(x, mix) } dens } normix.distribution <- list(density.fcn = dnormix, parameters = c("mu1", "mu2", "sig1", "sig2", "w1"), parameters.fixed = NULL, output.fcns = c(identity, identity, exp, exp, logistic))
Values of the Akaike information criterion with small sample size correction
(AICc), and Bayesian information criterion (BIC) are calculated to assist in
model selection. It is possible for such criteria to fail in the face of
overfitting, for example with a nonlinear model and n.hidden
set too
high, as the distribution may converge on one or more samples. This can usually
be diagnosed by inspecting the scale parameter of the distribution for near
zero values. In this case, one can apply a weight penalty (via sd.norm
),
although this rules out the straightforward use of AICc/BIC for model
selection as the effective number of model parameters will no longer equal the
number of weights in the CDEN model.
Note: values of x
need not be standardized or rescaled by the user.
Predictors are automatically scaled to zero mean and unit standard deviation
and are rescaled by cadence.predict
.
Value
a list of with number of elements equal to the length of n.hidden
; each list consists of:
W1 |
input-hidden layer weights |
W2 |
hidden-output layer weights. Attributes indicating the
mean and standard deviation of columns of |
References
Cannon, A.J., 2012. Neural networks for probabilistic environmental prediction: Conditional Density Estimation Network Creation & Evaluation (CaDENCE) in R. Computers & Geosciences 41: 126-135. doi:10.1016/j.cageo.2011.08.023
Neuneier, R., F. Hergert, W. Finnoff, and D. Ormoneit, 1994., Estimation of conditional densities: a comparison of neural network approaches. In: M. Marinaro and P. Morasso (eds.), Proceedings of ICANN 94, Berlin, Springer, p. 689-692.
See Also
cadence.predict
, optim
, rprop
,
xval.buffer
, logistic
Examples
data(FraserSediment)
set.seed(1)
lnorm.distribution <- list(density.fcn = dlnorm,
parameters = c("meanlog", "sdlog"),
parameters.fixed = NULL,
output.fcns = c(identity, exp))
fit <- cadence.fit(x = FraserSediment$x.1970.1976[c(TRUE, rep(FALSE, 19)),],
y = FraserSediment$y.1970.1976[c(TRUE, rep(FALSE, 19)),,
drop=FALSE], n.hidden = 3, n.trials = 1,
maxit.Nelder = 100, trace.Nelder = 1, hidden.fcn = tanh,
distribution = lnorm.distribution, trace = 1)
pred <- cadence.predict(x = FraserSediment$x.1977.1979, fit = fit)
matplot(pred, type = "l")