transfo {cellWise} | R Documentation |
Robustly fit the Box-Cox or Yeo-Johnson transformation
Description
This function uses reweighted maximum likelihood to robustly fit the
Box-Cox or Yeo-Johnson transformation to each variable in a dataset.
Note that this function first calls checkDataSet
to ensure that the variables to be transformed are not too discrete.
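For orientation, the two transformation families can be written in a few lines of base R. This is a minimal sketch of the standard textbook definitions, not the code used inside cellWise; the function names box_cox and yeo_johnson are illustrative and are not part of the package.

```r
# Box-Cox transformation; defined for strictly positive x only.
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

# Yeo-Johnson transformation; defined for any real x.
yeo_johnson <- function(x, lambda) {
  y <- numeric(length(x))
  pos <- x >= 0
  # Non-negative part: a Box-Cox transform of x + 1.
  y[pos] <- if (lambda == 0) {
    log1p(x[pos])
  } else {
    ((x[pos] + 1)^lambda - 1) / lambda
  }
  # Negative part: a mirrored Box-Cox transform of 1 - x with parameter 2 - lambda.
  y[!pos] <- if (lambda == 2) {
    -log1p(-x[!pos])
  } else {
    -((1 - x[!pos])^(2 - lambda) - 1) / (2 - lambda)
  }
  y
}
```

With lambda = 1 the Yeo-Johnson transformation leaves the data unchanged, and with lambda = 0 the Box-Cox transformation reduces to the logarithm.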
Usage
transfo(X, type = "YJ", robust = TRUE,
standardize = TRUE,
quant = 0.99, nbsteps = 2, checkPars = list())
Arguments
X |
A data matrix of dimensions n x d. Its columns are the variables to be transformed. |
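type |
The type of transformation to be fit. Options include "BC" (Box-Cox) and "YJ" (Yeo-Johnson, the default). |
robust |
if TRUE (the default), the transformation is fit robustly by reweighted maximum likelihood. |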
standardize |
whether to standardize the variables before and after the power transformation. See Details below. |
quant |
quantile for determining the weights in the reweighting step (ignored when robust = FALSE). |
nbsteps |
number of reweighting steps (ignored when robust = FALSE). |
checkPars |
Optional list of parameters used in the call to checkDataSet. |
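As a point of comparison for the robust argument, the sketch below fits the Box-Cox parameter by classical (non-robust) maximum likelihood in base R, by maximizing the profile log-likelihood under a normal model for the transformed data. It illustrates the general technique only and is not the implementation used by transfo; the names box_cox, bc_loglik and lambda_ml, and the search interval, are assumptions made for this example.

```r
# Box-Cox transformation for strictly positive x (illustrative helper).
box_cox <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Profile log-likelihood of lambda (up to an additive constant),
# assuming the transformed data are normally distributed.
bc_loglik <- function(lambda, x) {
  y <- box_cox(x, lambda)
  n <- length(x)
  sigma2 <- mean((y - mean(y))^2)  # ML estimate of the variance
  -n / 2 * log(sigma2) + (lambda - 1) * sum(log(x))
}

set.seed(123)
x <- exp(rnorm(1000))  # lognormal data, so lambda near 0 is expected

# One-dimensional search over an assumed interval for lambda.
fit <- optimize(bc_loglik, interval = c(-2, 2), x = x, maximum = TRUE)
lambda_ml <- fit$maximum
lambda_ml
```

Because every observation gets equal weight here, a few outliers can pull the classical estimate far from the value that suits the bulk of the data, which is the situation the reweighted fit is meant to handle.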
Details
If standardize = TRUE, each variable is standardized before and after the transformation.
For BC, the variable is divided by its median before transformation.
For YJ with robust = TRUE, the median is subtracted and the result is divided by the mad (median absolute deviation) before transformation.
For YJ with robust = FALSE, the mean is subtracted and the result is divided by the standard deviation before transformation.
For the standardization after the transformation, the classical mean and standard deviation are used if robust = FALSE. If robust = TRUE, the mean and standard deviation are calculated robustly on a subset of inliers.
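The pre-transformation standardization rules above can be sketched in base R as follows. The helper prestandardize is hypothetical and not part of cellWise. It also assumes that the mad is computed as by stats::mad with its default consistency constant, which the text above does not state.

```r
# Hypothetical helper mirroring the pre-standardization rules described above.
prestandardize <- function(x, type = c("YJ", "BC"), robust = TRUE) {
  type <- match.arg(type)
  if (type == "BC") {
    # Box-Cox: divide by the median (keeps the data strictly positive).
    x / median(x)
  } else if (robust) {
    # Yeo-Johnson, robust: center by the median, scale by the mad.
    (x - median(x)) / mad(x)
  } else {
    # Yeo-Johnson, classical: center by the mean, scale by the standard deviation.
    (x - mean(x)) / sd(x)
  }
}

x <- c(1, 2, 3, 4, 100)
prestandardize(x, type = "BC")
prestandardize(x, type = "YJ", robust = TRUE)
prestandardize(x, type = "YJ", robust = FALSE)
```

In this small example the outlying value 100 barely affects the median and the mad, whereas it dominates the mean and the standard deviation used in the classical case.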
Value
A list with components:
lambdahats
The estimated transformation parameter for each column of X.
Y
A matrix in which each column is the transformed version of the corresponding column of X. The transformed version includes pre- and post-standardization if standardize = TRUE.
muhat
The estimated location of each column of Y.
sigmahat
The estimated scale of each column of Y.
weights
The final weights from the reweighting.
ttypes
The type of transform used in each column.
objective
Value of the (reweighted) maximum likelihood objective function.
The list also contains the values of checkDataSet, unless coreOnly is TRUE.
Author(s)
J. Raymaekers and P.J. Rousseeuw
References
J. Raymaekers and P.J. Rousseeuw (2021). Transforming variables to central normality. Machine Learning. doi:10.1007/s10994-021-05960-5
See Also
transfo_newdata, transfo_transformback
Examples
# find Box-Cox transformation parameter for lognormal data:
set.seed(123)
x <- exp(rnorm(1000))
transfo.out <- transfo(x, type = "BC")
# estimated parameter:
transfo.out$lambdahats
# value of the objective function:
transfo.out$objective
# the transformed variable:
transfo.out$Y
# the type of transformation used:
transfo.out$ttypes
# qqplot of the transformed variable:
qqnorm(transfo.out$Y); abline(0,1)
# For more examples, we refer to the vignette:
## Not run:
vignette("transfo_examples")
## End(Not run)