R: Gaussianize matrix-like objects

Gaussianize {LambertW}

R Documentation

Gaussianize matrix-like objects

Description

Gaussianize is probably the most useful function in this package. It works like scale, but instead of only centering and scaling the data, it Gaussianizes them (this works well for unimodal data). See Goerg (2011, 2016) and Examples.

Important: For multivariate input X, Gaussianize works column-wise (by simply calling apply(X, 2, Gaussianize)), so it only Gaussianizes each marginal distribution. The transformed data are in general not jointly Gaussian.
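
The caveat about marginal versus joint Gaussianity can be illustrated without any Lambert W machinery. The following is a minimal base-R sketch (the names x, w, y, and frac.zero are purely illustrative and not part of the package): both columns are standard Gaussian marginally, yet their sum is exactly zero about half of the time, which cannot happen for a jointly Gaussian pair.

```r
set.seed(1)
n <- 1000

x <- rnorm(n)                              # marginally N(0, 1)
w <- sample(c(-1, 1), n, replace = TRUE)   # random sign, independent of x
y <- w * x                                 # also marginally N(0, 1), by symmetry

# Whenever w == -1 the sum x + y is exactly 0, so the sum has a point
# mass at zero; a jointly Gaussian pair would give a Gaussian sum instead.
frac.zero <- mean(x + y == 0)
frac.zero
```

Applying Gaussianize to cbind(x, y) would leave this dependence structure in place, since each column is transformed on its own.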

By default Gaussianize returns the X \sim N(\mu_x, \sigma_x^2) input, not the zero-mean, unit-variance U \sim N(0, 1) input. Use return.u = TRUE to obtain U.
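
As a rough sketch of the difference between the two outputs, the code below Gaussianizes the same heavy-tailed sample twice, once with the default and once with return.u = TRUE. It assumes that the two results are related by x = mu_x + sigma_x * u, with mu_x and sigma_x read from the 'Gaussianized:mu' and 'Gaussianized:sigma' attributes described under Value; the t-distributed sample and the variable names are illustrative only.

```r
library(LambertW)

set.seed(1)
y <- rt(500, df = 3)  # heavy-tailed sample, chosen only for illustration

# Default: Gaussianized data on the estimated location and scale of X.
x <- Gaussianize(y, type = "h")
# return.u = TRUE: the standardized (zero-mean, unit-variance) version U.
u <- Gaussianize(y, type = "h", return.u = TRUE)

mu <- attr(x, "Gaussianized:mu")
sigma <- attr(x, "Gaussianized:sigma")

# Under the assumption x = mu + sigma * u this should be TRUE
# up to numerical tolerance.
rescaled.ok <- isTRUE(all.equal(as.numeric(x), as.numeric(mu + sigma * u)))
rescaled.ok
```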

Usage

Gaussianize(
  data = NULL,
  type = c("h", "hh", "s"),
  method = c("IGMM", "MLE"),
  return.tau.mat = FALSE,
  inverse = FALSE,
  tau.mat = NULL,
  verbose = FALSE,
  return.u = FALSE,
  input.u = NULL
)

Arguments

`data`	a numeric matrix-like object; either the data that should be Gaussianized, or the data that should be “DeGaussianized” (`inverse = TRUE`), i.e., converted back to the original space.
`type`	what type of non-normality: symmetric heavy-tails `"h"` (default), skewed heavy-tails `"hh"`, or just skewed `"s"`.
`method`	what estimator should be used: `"MLE"` or `"IGMM"`. `"IGMM"` gives exactly Gaussian characteristics (kurtosis `\equiv` 3 for `"h"` or skewness `\equiv` 0 for `"s"`), `"MLE"` comes close to this. Default: `"IGMM"` since it is much faster than `"MLE"`.
`return.tau.mat`	logical; if `TRUE` it also returns the estimated `\tau` parameters as a matrix (same number of columns as `data`). This matrix can then be used to `Gaussianize` new data with pre-estimated `\tau`. It can also be used to “DeGaussianize” data by passing it as the `tau.mat` argument to `Gaussianize()` and setting `inverse = TRUE`.
`inverse`	logical; if `TRUE` it performs the inverse transformation using `tau.mat` to "DeGaussianize" the data back to the original space again.
`tau.mat`	instead of estimating `\tau` from the data you can pass it as a matrix (usually obtained via `Gaussianize(..., return.tau.mat = TRUE)`). If `inverse = TRUE` it uses this `tau` matrix to “DeGaussianize” the data again. This is useful to back-transform new data in the Gaussianized space, e.g., predictions or fits, back to the original space.
`verbose`	logical; if `TRUE`, it prints out progress information in the console. Default: `FALSE`.
`return.u`	logical; if `TRUE` it returns the zero-mean, unit variance Gaussian input. If `FALSE` (default) it returns the input `X`.
`input.u`	optional; if you used `return.u = TRUE` in a previous step, and now you want to convert the data back to original space, then you have to pass it as `input.u`. If you pass numeric data as `data`, `Gaussianize` assumes that `data` is the input corresponding to `X`, not `U`.
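
The interplay of `return.tau.mat`, `tau.mat`, and `inverse` described above might be used as follows. This is a sketch under stated assumptions, not a recommended workflow: the train/test split, the t-distributed columns, and all variable names are hypothetical, and `type = "h"` is passed in every call only to keep the calls consistent with the fitted parameters.

```r
library(LambertW)

set.seed(42)
# Hypothetical "training" and "new" data with heavy tails.
train <- cbind(a = rt(500, df = 3), b = rt(500, df = 4))
new   <- cbind(a = rt(100, df = 3), b = rt(100, df = 4))

# Estimate tau on the training data only.
fit <- Gaussianize(train, type = "h", return.tau.mat = TRUE)

# Gaussianize new data with the pre-estimated tau (no re-estimation).
new.x <- Gaussianize(new, type = "h", tau.mat = fit$tau.mat)

# Map the Gaussianized values back to the original space.
new.back <- Gaussianize(new.x, type = "h", tau.mat = fit$tau.mat,
                        inverse = TRUE)

# The round trip should recover the new data up to numerical error.
round.trip.ok <- isTRUE(all.equal(as.numeric(new.back), as.numeric(new),
                                  tolerance = 1e-6))
round.trip.ok
```

Estimating `\tau` on one data set and reusing it on another keeps the transformation fixed, which is what makes the back-transformation of predictions or fits well defined.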

Value

numeric matrix-like object with the same dimension/size as the input data. If inverse = FALSE it is the Gaussianized matrix / vector; if TRUE it is the “DeGaussianized” matrix / vector.

The mean, scale, and skewness/heavy-tail parameters used in the Gaussianizing transformation are returned as attributes of the output matrix: 'Gaussianized:mu', 'Gaussianized:sigma', and for

`type = "h":`	`'Gaussianized:delta'` & `'Gaussianized:alpha'`,
`type = "hh":`	`'Gaussianized:delta_l'` and `'Gaussianized:delta_r'` & `'Gaussianized:alpha_l'` and `'Gaussianized:alpha_r'`,
`type = "s":`	`'Gaussianized:gamma'`.

They can also be returned as a separate matrix using return.tau.mat = TRUE. In this case Gaussianize returns a list with elements:

`input`	Gaussianized input data `\boldsymbol x` (or `\boldsymbol u` if `return.u = TRUE`),
`tau.mat`	matrix with `\tau` estimates that we used to get `x`; has same number of columns as `x`, and 3, 5, or 6 rows (depending on `type='s'`, `'h'`, or `'hh'`).
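
A short sketch of how these return values could be inspected, assuming the attribute names listed above; the sample and the variable names are illustrative only.

```r
library(LambertW)

set.seed(7)
y <- rt(300, df = 3)  # illustrative heavy-tailed sample

# Parameters attached as attributes of the Gaussianized output.
x <- Gaussianize(y, type = "h")
attr(x, "Gaussianized:mu")
attr(x, "Gaussianized:sigma")
attr(x, "Gaussianized:delta")

# The same information as a separate matrix, one column per series.
out <- Gaussianize(y, type = "h", return.tau.mat = TRUE)
names(out)
out$tau.mat
```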

Examples


# Univariate example
set.seed(20)
y1 <- rcauchy(n = 100)
out <- Gaussianize(y1, return.tau.mat = TRUE)
x1 <- get_input(y1, c(out$tau.mat[, 1]))  # same as out$input
test_normality(out$input) # Gaussianized a Cauchy!

kStartFrom <- 20
y.cum.avg <- (cumsum(y1)/seq_along(y1))[-seq_len(kStartFrom)]
x.cum.avg <- (cumsum(x1)/seq_along(x1))[-seq_len(kStartFrom)]

plot((kStartFrom + 1):length(y1), y.cum.avg, type = "l", lwd = 2,
     main = "CLT in practice", xlab = "n",
     ylab = "Cumulative sample average",
     ylim = range(y.cum.avg, x.cum.avg))
lines((kStartFrom + 1):length(y1), x.cum.avg, col = 2, lwd = 2)
abline(h = 0)
grid()
legend("bottomright", c("Cauchy", "Gaussianize"), col = c(1, 2), 
       box.lty = 0, lwd = 2, lty = 1)

plot(x1, y1, xlab="Gaussian-like input", ylab = "Cauchy - output")
grid()
## Not run: 
# multivariate example
y2 <- 0.5 * y1 + rnorm(length(y1))
YY <- cbind(y1, y2)
plot(YY)

XX <- Gaussianize(YY, type = "hh")
plot(XX)

out <- Gaussianize(YY, type = "h", return.tau.mat = TRUE, 
                   verbose = TRUE, method = "IGMM")
                   
plot(out$input)
out$tau.mat

YY.hat <- Gaussianize(data = out$input, tau.mat = out$tau.mat,
                      inverse = TRUE)
plot(YY.hat[, 1], YY[, 1])

## End(Not run)

[Package LambertW version 0.6.9-1 Index]