transfo {cellWise}    R Documentation

Robustly fit the Box-Cox or Yeo-Johnson transformation


This function uses reweighted maximum likelihood to robustly fit the Box-Cox or Yeo-Johnson transformation to each variable in a dataset. Note that this function first calls checkDataSet to ensure that the variables to be transformed are not too discrete.


transfo(X, type = "YJ", robust = TRUE, lambdarange = NULL,
        prestandardize = TRUE, prescaleBC = FALSE, scalefac = 1,
        quant = 0.99, nbsteps = 2, checkPars = list())



X
A data matrix of dimensions n x d. Its columns are the variables to be transformed.


type
The type of transformation to be fit. Should be one of:

  • "BC": Box-Cox power transformation. Only works for strictly positive variables. If this type is given but a variable is not strictly positive, the function stops with a message about that variable.

  • "YJ": Yeo-Johnson power transformation. The data may have positive as well as negative values.

  • "bestObj": For strictly positive variables both BC and YJ are run, and the solution with the lowest objective is kept. On the other variables YJ is run.
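
For reference, the two power transformations named above can be written out in base R. This is a minimal sketch of the textbook formulas, not the package's internal code; the helper names bc_transform and yj_transform are hypothetical and do not exist in cellWise.

```r
# Textbook Box-Cox transformation (requires strictly positive x).
bc_transform <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (abs(lambda) < 1e-12) log(x) else (x^lambda - 1) / lambda
}

# Textbook Yeo-Johnson transformation (x may be negative).
yj_transform <- function(x, lambda) {
  out <- numeric(length(x))
  pos <- x >= 0
  # Nonnegative part: Box-Cox applied to 1 + x.
  if (abs(lambda) < 1e-12) {
    out[pos] <- log1p(x[pos])
  } else {
    out[pos] <- ((1 + x[pos])^lambda - 1) / lambda
  }
  # Negative part: mirrored Box-Cox with parameter 2 - lambda.
  if (abs(lambda - 2) < 1e-12) {
    out[!pos] <- -log1p(-x[!pos])
  } else {
    out[!pos] <- -((1 - x[!pos])^(2 - lambda) - 1) / (2 - lambda)
  }
  out
}

x <- c(-2, -0.5, 0, 1, 3)
yj_transform(x, lambda = 1)            # lambda = 1 leaves the data unchanged
bc_transform(c(1, 2, 4), lambda = 0)   # lambda = 0 gives the logarithm
```

Both transformations reduce to the identity (up to a shift for BC) at lambda = 1, which is why a fitted lambda close to 1 indicates that little transformation was needed.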


robust
If TRUE, the Reweighted Maximum Likelihood method is used, which first computes a robust initial estimate of the transformation parameter lambda. If FALSE, the classical ML method is used.


lambdarange
Range of lambda values over which to optimize. If NULL, the range goes from -4 to 6.
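
To illustrate what optimizing lambda over a range means, here is a minimal base R sketch of the classical (non-robust) ML fit for Box-Cox, using the textbook profile log-likelihood. It is an assumption that this matches the package's objective in form; the function name bc_profile_loglik is hypothetical.

```r
# Textbook Box-Cox profile log-likelihood (up to an additive constant).
bc_profile_loglik <- function(lambda, x) {
  n <- length(x)
  y <- if (abs(lambda) < 1e-12) log(x) else (x^lambda - 1) / lambda
  sigma2 <- mean((y - mean(y))^2)   # ML estimate of the variance
  -n / 2 * log(sigma2) + (lambda - 1) * sum(log(x))
}

set.seed(1)
x <- exp(rnorm(1000))       # lognormal data, so lambda should be near 0
lambdarange <- c(-4, 6)     # the default range stated above

# One-dimensional maximization over the given interval.
fit <- optimize(bc_profile_loglik, interval = lambdarange, x = x,
                maximum = TRUE)
fit$maximum                 # estimated lambda
```

A narrower lambdarange restricts the search interval in the same way, which can help when extreme lambda values are implausible for the data at hand.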


prestandardize
Whether to standardize the variables before the power transformation. For BC the variable is divided by its median. For YJ with robust = TRUE the median is subtracted and the result is divided by the mad (median absolute deviation). For YJ with robust = FALSE the mean is subtracted and the result is divided by the standard deviation.
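
The three prestandardization variants described above can be sketched in base R as follows. This is an illustration only; it assumes R's default mad(), which includes the 1.4826 consistency factor, and the package may compute its scale differently.

```r
set.seed(1)
x_pos <- exp(rnorm(200))        # strictly positive, as required for BC
x_any <- rnorm(200, mean = 3)   # arbitrary sign, suitable for YJ

# BC: divide by the median.
bc_pre <- x_pos / median(x_pos)

# YJ with robust = TRUE: subtract the median, divide by the mad.
# Assumption: default mad() with its 1.4826 consistency factor.
yj_pre_robust <- (x_any - median(x_any)) / mad(x_any)

# YJ with robust = FALSE: subtract the mean, divide by the standard deviation.
yj_pre_classic <- (x_any - mean(x_any)) / sd(x_any)
```

Dividing by the median keeps a BC variable strictly positive, whereas subtracting a location estimate would not.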


prescaleBC
For BC only. This standardizes the logarithm of the original variable by subtracting its median and dividing by its mad, after which the exponential function turns the result into a positive variable again.


scalefac
When YJ is fit and prestandardize = TRUE, the standardized data are multiplied by scalefac. When BC is fit and prescaleBC = TRUE, the standardized logarithm of the original variable is multiplied by scalefac.
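
As a rough base R illustration of the BC prescaling step and the role of scalefac, consider the sketch below. It again assumes R's default mad() and is not taken from the package source.

```r
set.seed(1)
x <- exp(rnorm(200, mean = 2, sd = 0.5))   # strictly positive data
scalefac <- 1                              # default value of scalefac

logx <- log(x)
# Robustly standardize the logarithm (assumption: default mad()).
z <- (logx - median(logx)) / mad(logx)
# Multiply by scalefac, then exponentiate to get a positive variable again.
x_prescaled <- exp(scalefac * z)
```

After this step the logarithm of the prescaled variable has median 0 and mad equal to scalefac, whatever the original units of x were.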


quant
Quantile for determining the weights in the reweighting step (ignored when robust = FALSE).


nbsteps
Number of reweighting steps (ignored when robust = FALSE).


checkPars
Optional list of parameters used in the call to checkDataSet. The options are:

  • coreOnly
    If TRUE, skip the execution of checkDataSet. Defaults to FALSE.

  • numDiscrete
    A column that takes on numDiscrete or fewer values will be considered discrete and not retained in the cleaned data. Defaults to 5.

  • precScale
    Only consider columns whose scale is larger than precScale. Here scale is measured by the median absolute deviation. Defaults to 1e-12.

  • silent
    If TRUE, the function's progress messages are not printed. Defaults to FALSE.
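
A call that passes some of these options might look like the sketch below. The data and option values are made up for illustration, and the block is skipped when cellWise is not installed.

```r
# Requires the cellWise package; skipped if it is not installed.
if (requireNamespace("cellWise", quietly = TRUE)) {
  set.seed(1)
  X <- cbind(a = rnorm(100), b = rexp(100))   # illustrative data only
  # Treat columns with 3 or fewer distinct values as discrete and
  # suppress the progress messages of checkDataSet.
  out <- cellWise::transfo(X, type = "YJ",
                           checkPars = list(numDiscrete = 3, silent = TRUE))
}
```

Options that are left out of the list are expected to keep the defaults given above.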


A list with components:


J. Raymaekers and P.J. Rousseeuw


J. Raymaekers and P.J. Rousseeuw (2020). Transforming variables to central normality. arXiv:2005.07946.


# find Box-Cox transformation parameter for lognormal data:
x <- exp(rnorm(1000))
transfo.out <- transfo(x, type = "BC")
# The returned list holds, among other things, the estimated parameter,
# the value of the objective function, the transformed variable, the
# poststandardized transformed variable and the type of transformation used:
str(transfo.out)
# qqplot of the poststandardized transformed variable:
qqnorm(transfo.out$Zt); abline(0,1)

# For more examples, we refer to the package vignette.

[Package cellWise version 2.2.5 Index]