Deriv {Deriv} | R Documentation |
Symbolic differentiation of an expression or function
Description
Symbolic differentiation of an expression or function
Usage
Deriv(
f,
x = if (is.function(f)) NULL else all.vars(if (is.character(f)) parse(text = f) else
f),
env = if (is.function(f)) environment(f) else parent.frame(),
use.D = FALSE,
cache.exp = TRUE,
nderiv = NULL,
combine = "c",
drule = Deriv::drule
)
Arguments
f |
An expression or function to be differentiated. f can be
|
x |
An optional character vector with variable name(s) with respect to which
|
env |
An environment where the symbols and functions are searched for.
Defaults to |
use.D |
An optional logical (default FALSE), indicates if base::D() must be used for differentiation of basic expressions. |
cache.exp |
An optional logical (default TRUE), indicates if final expression must be optimized with cached sub-expressions. If enabled, repeated calculations are made only once and their results stored in cache variables which are then reused. |
nderiv |
An optional integer vector of derivative orders to calculate. Default NULL value correspond to one differentiation. If length(nderiv)>1, the resulting expression is a list where each component corresponds to derivative order given in nderiv. Value 0 corresponds to the original function or expression non differentiated. All values must be non negative. If the entries in nderiv are named, their names are used as names in the returned list. Otherwise the value of nderiv component is used as a name in the resulting list. |
combine |
An optional character scalar, it names a function to combine
partial derivatives. Default value is "c" but other functions can be
used, e.g. "cbind" (cf. Details, NB3), "list" or user defined ones. It must
accept any number of arguments or at least the same number of arguments as
there are items in |
drule |
An optional environment-like containing derivative rules (cf. Details for syntax rules). |
Details
R already contains two differentiation functions: D and deriv. D does simple univariate differentiation. "deriv" uses D to do multivariate differentiation. The output of "D" is an expression, whereas the output of "deriv" can be an executable function.
R's existing functions have several limitations. They can probably be fixed, but since they are written in C, this would probably require a lot of work. Limitations include:
The derivatives table can't be modified at runtime, and is only available in C.
Function cannot substitute function calls. eg: f <- function(x, y) x + y; deriv(~f(x, x^2), "x")
So, here are the advantages of this implementation:
It is entirely written in R, so would be easier to maintain.
Can do multi-variate differentiation.
Can differentiate function calls:
if the function is in the derivative table, then the chain rule is applied. For example, if you declared that the derivative of sin is cos, then it would figure out how to call cos correctly.
if the function is not in the derivative table (or it is anonymous), then the function body is substituted in.
these two methods can be mixed. An entry in the derivative table need not be self-contained – you don't need to provide an infinite chain of derivatives.
It's easy to add custom entries to the derivatives table, e.g.
drule[["cos"]] <- alist(x=-sin(x))
The chain rule will be automatically applied if needed.
The output is an executable function, which makes it suitable for use in optimization problems.
Compound functions (i.e. piece-wise functions based on if-else operator) can be differentiated (cf. examples section).
in case of multiple derivatives (e.g. gradient and hessian calculation), caching can make calculation economies for both
Starting from v4.0, some matrix calculus operations are possible (contribution of Andreas Rappold). See an example hereafter for differentiation of the inverse of 2x2 matrix and whose elements depend on variable of differentiation
x
.
Two environments drule
and simplifications
are
exported in the package's NAMESPACE.
As their names indicate, they contain tables of derivative and
simplification rules.
To see the list of defined rules do ls(drule)
.
To add your own derivative rule for a function called say sinpi(x)
calculating sin(pi*x), do drule[["sinpi"]] <- alist(x=pi*cospi(x))
.
Here, "x" stands for the first and unique argument in sinpi()
definition. For a function that might have more than one argument,
e.g. log(x, base=exp(1))
, the drule entry must be a list with a named rule
per argument. See drule$log
for an example to follow.
After adding sinpi
you can differentiate expressions like
Deriv(~ sinpi(x^2), "x")
. The chain rule will automatically apply.
Starting from v4.0, user can benefit from a syntax .d_X
in the rule writing. Here X
must be replace by an argument name (cf. drule[["solve"]]
for an example). A use of this syntax leads to a replacement of this place-holder by a derivative of the function (chain rule is automatically integrated) by the named argument.
Another v4.0 novelty in rule's syntax is a possible use of optional parameter `_missing`
which can be set to TRUE or FALSE (default) to indicate how to treat missing arguments. By default, i.e. in absence of this parameter or set to FALSE, missing arguments were replaced by their default values. Now, if `_missing`=TRUE
is specified in a rule, the missing arguments will be left missed in the derivative. Look drule[["solve"]]
for an example.
NB. In abs()
and sign()
function, singularity treatment
at point 0 is left to user's care.
For example, if you need NA at singular points, you can define the following:
drule[["abs"]] <- alist(x=ifelse(x==0, NA, sign(x)))
drule[["sign"]] <- alist(x=ifelse(x==0, NA, 0))
NB2. In Bessel functions, derivatives are calculated only by the first argument,
not by the nu
argument which is supposed to be constant.
NB3. There is a side effect with vector length. E.g. in
Deriv(~a+b*x, c("a", "b"))
the result is c(a = 1, b = x)
.
To avoid the difference in lengths of a and b components (when x is a vector),
one can use an optional parameter combine
Deriv(~a+b*x, c("a", "b"), combine="cbind")
which gives
cbind(a = 1, b = x)
producing a two column matrix which is
probably the desired result here.
Another example illustrating a side effect is a plain linear
regression case and its Hessian:
Deriv(~sum((a+b*x - y)**2), c("a", "b"), n=c(hessian=2)
producing just a constant 2
for double differentiation by a
instead of expected result 2*length(x)
. It comes from a simplification of
an expression sum(2)
where the constant is not repeated as many times
as length(x) would require it. Here, using the same trick
with combine="cbind"
would not help as all 4 derivatives are just scalars.
Instead, one should modify the previous call to explicitly use a constant vector
of appropriate length:
Deriv(~sum((rep(a, length(x))+b*x - y)**2), c("a", "b"), n=2)
NB4. Differentiation of *apply()
family (available starting from v4.1) is
done only on the body of the FUN
argument. It implies that this
body must use the same variable names as in x
and they must not
appear in FUN
s arguments (cf. GMM example).
Value
a function if
f
is a functionan expression if
f
is an expressiona character string if
f
is a character stringa language (usually a so called 'call' but may be also a symbol or just a numeric) for other types of
f
Author(s)
Andrew Clausen (original version) and Serguei Sokol (actual version and maintainer)
Examples
## Not run: f <- function(x) x^2
## Not run: Deriv(f)
# function (x)
# 2 * x
## Not run: f <- function(x, y) sin(x) * cos(y)
## Not run: Deriv(f)
# function (x, y)
# c(x = cos(x) * cos(y), y = -(sin(x) * sin(y)))
## Not run: f_ <- Deriv(f)
## Not run: f_(3, 4)
# x y
# [1,] 0.6471023 0.1068000
## Not run: Deriv(~ f(x, y^2), "y")
# -(2 * (y * sin(x) * sin(y^2)))
## Not run: Deriv(quote(f(x, y^2)), c("x", "y"), cache.exp=FALSE)
# c(x = cos(x) * cos(y^2), y = -(2 * (y * sin(x) * sin(y^2))))
## Not run: Deriv(expression(sin(x^2) * y), "x")
# expression(2*(x*y*cos(x^2)))
Deriv("sin(x^2) * y", "x") # differentiate only by x
"2 * (x * y * cos(x^2))"
Deriv("sin(x^2) * y", cache.exp=FALSE) # differentiate by all variables (here by x and y)
"c(x = 2 * (x * y * cos(x^2)), y = sin(x^2))"
# Compound function example (here abs(x) smoothed near 0)
fc <- function(x, h=0.1) if (abs(x) < h) 0.5*h*(x/h)**2 else abs(x)-0.5*h
Deriv("fc(x)", "x", cache.exp=FALSE)
"if (abs(x) < h) x/h else sign(x)"
# Example of a first argument that cannot be evaluated in the current environment:
## Not run:
suppressWarnings(rm("xx", "yy"))
Deriv(xx^2+yy^2)
## End(Not run)
# c(xx = 2 * xx, yy = 2 * yy)
# Automatic differentiation (AD), note intermediate variable 'd' assignment
## Not run: Deriv(~{d <- ((x-m)/s)^2; exp(-0.5*d)}, "x", cache.exp=FALSE)
#{
# d <- ((x - m)/s)^2
# .d_x <- 2 * ((x - m)/s^2)
# -(0.5 * (.d_x * exp(-(0.5 * d))))
#}
# Custom differentiation rule
## Not run:
myfun <- function(x, y=TRUE) NULL # do something useful
dmyfun <- function(x, y=TRUE) NULL # myfun derivative by x.
drule[["myfun"]] <- alist(x=dmyfun(x, y), y=NULL) # y is just a logical => no derivate
Deriv(~myfun(z^2, FALSE), "z", drule=drule)
# 2 * (z * dmyfun(z^2, FALSE))
## End(Not run)
# Differentiation by list components
## Not run:
theta <- list(m=0.1, sd=2.)
x <- names(theta)
names(x)=rep("theta", length(theta))
Deriv(~exp(-(x-theta$m)**2/(2*theta$sd)), x, cache.exp=FALSE)
# c(theta_m = exp(-((x - theta$m)^2/(2 * theta$sd))) *
# (x - theta$m)/theta$sd, theta_sd = 2 * (exp(-((x - theta$m)^2/
# (2 * theta$sd))) * (x - theta$m)^2/(2 * theta$sd)^2))
## End(Not run)
# Differentiation in matrix calculus
## Not run:
Deriv(~solve(matrix(c(1, x, x**2, x**3), nrow=2, ncol=2)))
## End(Not run)
# Two component Gaussian mixture model (GMM) example
## Not run:
# define GMM probability density function -> p(x, ...)
ncomp=2
a=runif(ncomp)
a=a/sum(a) # amplitude or weight of each component
m=rnorm(ncomp) # mean
s=runif(ncomp) # sd
# two column matrix of probabilities: one row per x value, one column per component
pn=function(x, a, m, s, log=FALSE) {
n=length(a)
structure(vapply(seq(n), function(i) a[i]*dnorm(x, m[i], s[i], log),
double(length(x))), dim=c(length(x), n))
}
p=function(x, a, m, s) rowSums(pn(x, a, m, s)) # overall probability
dp=Deriv(p, "x")
# plot density and its derivative
xp=seq(min(m-2*s), max(m+2*s), length.out=200)
matplot(xp, cbind(p(xp, a, m, s), dp(xp, a, m, s)),
xlab="x", ylab="p, dp/dx", type="l", main="Two component GMM")
## End(Not run)