LogSt {DescTools}    R Documentation

Started Logarithmic Transformation and Its Inverse

Description

Transforms the data by a log transformation, modifying small and zero observations such that the transformation is linear for x <= threshold and logarithmic for x > threshold. The transformation therefore yields finite values and is continuously differentiable.

Usage

LogSt(x, base = 10, calib = x, threshold = NULL, mult = 1)

LogStInv(x, base = NULL, threshold = NULL)

Arguments

x

a vector or matrix of data, which is to be transformed

base

a positive or complex number: the base with respect to which logarithms are computed. Defaults to 10. Use base = exp(1) for the natural logarithm.

calib

a vector or matrix of data used to calibrate the transformation(s), i.e., to determine the constant c needed.

threshold

the constant c that determines the transformation. If this argument is NULL, the inverse function LogStInv looks for an attribute named "threshold" on x.

mult

a tuning constant affecting the transformation of small values, see Details.

Details

To avoid log(x) = -\infty for x = 0 in log transformations, a constant is often added to the variable before taking the log. This is not always a satisfactory strategy. The function LogSt handles this problem based on the following ideas:

These criteria are implemented here as follows: the shape is determined by a threshold c at which, coming from above, the log function switches to a linear function with the same slope at that point.

This is obtained by (with log denoting the natural logarithm)

g(x) = \left\{\begin{array}{ll} \log_{10}(x) & \text{for } x \ge c \\ \log_{10}(c) - \frac{c - x}{c \cdot \log(10)} & \text{for } x < c \end{array}\right.
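The piecewise definition can be sketched in a few lines of base R. This is a hypothetical re-implementation (the name log_st_sketch is illustrative, not part of DescTools), and it assumes the formula generalizes to an arbitrary base by replacing 10 with base; it does not reproduce the package's calibration logic, so the threshold must be given explicitly.

```r
# Hypothetical sketch of g(x) above, generalized to any base;
# not the DescTools source code.
log_st_sketch <- function(x, threshold, base = 10) {
  stopifnot(threshold > 0)
  # linear branch: tangent line of log_base(x) at x = threshold
  lin <- log(threshold, base) - (threshold - x) / (threshold * log(base))
  # pmax() keeps log() away from zeros in the branch that ifelse() discards
  y <- ifelse(x < threshold, lin, log(pmax(x, threshold), base))
  # store what an inverse transformation needs, as described under Value
  attr(y, "threshold") <- threshold
  attr(y, "base") <- base
  y
}

y <- log_st_sketch(c(0, 0.05, 0.1, 1, 100), threshold = 0.1)
round(as.vector(y), 4)   # approx. -1.4343 -1.2171 -1 0 2
```

Both branches meet at x = threshold with the common slope 1 / (threshold * log(base)), which is what makes the result continuously differentiable.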

Small values are those below the threshold c. If c is not supplied via the argument threshold, it is determined from the quartiles q_1 and q_3 of the non-zero data as c = q_1^{1+r} / q_3^{r}, where r can be set by the argument mult. The rationale is that, for lognormal data, this constant identifies 2 percent of the data as small.
Below this threshold, the transformation continues linearly with the slope of the log curve at c.
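The default threshold rule, and the "2 percent" rationale behind it, can be checked with a short sketch. The helper st_threshold_sketch is a hypothetical name, and it assumes R's default quantile() type; the check uses exact lognormal quartiles (meanlog = 0, sdlog = 1) rather than a random sample.

```r
# Hypothetical sketch of the default rule c = q1^(1 + r) / q3^r,
# computed from the quartiles of the non-zero data (r = mult).
st_threshold_sketch <- function(x, mult = 1) {
  q <- quantile(x[x != 0], c(0.25, 0.75), names = FALSE, na.rm = TRUE)
  q[1]^(1 + mult) / q[2]^mult
}

# With exact lognormal quartiles and r = 1, log(c) lies about
# 2.02 standard deviations below the mean of log(x).
q1 <- qlnorm(0.25)
q3 <- qlnorm(0.75)
c_exact <- q1^2 / q3
plnorm(c_exact)   # about 0.0215, i.e. roughly 2 percent flagged as small
```

Larger values of mult push c further into the lower tail, so fewer observations fall on the linear branch.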

Another idea for choosing the threshold c was: median(x) / (median(x)/quantile(x, 0.25))^2.9
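For comparison, the alternative rule can be evaluated on the same exact lognormal quantiles (meanlog = 0, sdlog = 1, an assumption made only for this illustration). It places c about 1.96 standard deviations below the mean on the log scale, so it flags a similar share of the data as small.

```r
# Alternative threshold rule from the text, evaluated on exact
# lognormal quantiles instead of sample quantiles.
med <- qlnorm(0.5)
q1  <- qlnorm(0.25)
c_alt <- med / (med / q1)^2.9
plnorm(c_alt)   # about 0.025
```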

The function uses log_10 rather than natural logarithms by default because log_10 values are easier to back-transform mentally.

To stabilize the variance, a generalized log (see Rocke and Durbin 2003) can be calculated as:

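glog <- function(x, a) {
  # 'glog' is an illustrative name, not a DescTools function;
  # finite at x = 0 and close to log(x) for x much larger than a
  log((x + sqrt(x^2 + a^2)) / 2)
}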
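The behaviour of the generalized log at both ends can be checked directly. The sketch below assumes the illustrative name glog and the parameter a = 1. Unlike the started log, this function is smooth everywhere instead of being piecewise, and it is defined even for negative x.

```r
# Generalized log as given above ('glog' is an illustrative name).
glog <- function(x, a) log((x + sqrt(x^2 + a^2)) / 2)

glog(0, a = 1)                  # log(1/2): finite at zero
glog(1e6, a = 1) - log(1e6)     # essentially 0 for x much larger than a
glog(c(-2, -1, 0, 1, 2), a = 1) # increasing, also for negative x
```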

Value

The transformed data. The value c used for the transformation, which is needed for the inverse transformation, is returned as attr(., "threshold"), and the base used is returned as attr(., "base").
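The returned attributes are what make the inverse possible without extra arguments. Below is a hypothetical sketch of such an inverse (log_st_inv_sketch is an illustrative name, not LogStInv itself); it assumes the piecewise form from Details with 10 replaced by the base, and falls back to the "threshold" and "base" attributes when the arguments are NULL.

```r
# Hypothetical inverse of the started log; reads the attributes
# described under Value when arguments are NULL. Not the DescTools source.
log_st_inv_sketch <- function(y, base = NULL, threshold = NULL) {
  if (is.null(threshold)) threshold <- attr(y, "threshold")
  if (is.null(base)) base <- attr(y, "base")
  y0 <- log(threshold, base)  # value at the threshold, where the branches meet
  x <- ifelse(y < y0,
              threshold + (y - y0) * threshold * log(base),  # undo linear branch
              base^y)                                         # undo log branch
  # ifelse() copies attributes from its test; drop the ones inherited from y
  attr(x, "threshold") <- NULL
  attr(x, "base") <- NULL
  x
}

# transformed values for x = 0, 0.1, 100 with threshold 0.1 and base 10
y <- structure(c(-1 - 1 / log(10), -1, 2), threshold = 0.1, base = 10)
log_st_inv_sketch(y)   # approx. 0, 0.1, 100
```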

Author(s)

Werner A. Stahel, ETH Zurich
with slight modifications by Andri Signorell <andri@signorell.net>

References

Rocke, D. M., Durbin, B. (2003): Approximate variance-stabilizing transformations for gene-expression microarray data. Bioinformatics, 19(8), 966-972.

See Also

log, log10

Examples

library(DescTools)

dd <- c(seq(0,1,0.1), 5 * 10^rnorm(100, 0, 0.2))
dd <- sort(dd)
r.dl <- LogSt(dd)
plot(dd, r.dl, type="l")
abline(v=attr(r.dl, "threshold"), lty=2)

x <- rchisq(df=3, n=100)
# should give 0 (or at least something small):
LogStInv(LogSt(x)) - x

[Package DescTools version 0.99.55 Index]