step_orderNorm {bestNormalize}    R Documentation

ORQ normalization (orderNorm) implementation for recipes

Description

'step_orderNorm' creates a specification of a recipe step (see the 'recipes' package) that will transform data using the ORQ (orderNorm) transformation, which approximates the "true" normalizing transformation if one exists. Because it fits a single transformation rather than comparing several candidate transformations, it is considerably faster than 'step_bestNormalize'.
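The core ordered quantile idea can be sketched in a few lines of base R: rank the observations, turn the ranks into probabilities in (0, 1), and map those probabilities through the standard normal quantile function. This is a minimal illustration using a hypothetical helper 'orq_sketch', not the package's implementation; 'orderNorm' also deals with ties and with new values outside the training range, which this sketch does not attempt.

```r
# Hypothetical helper: minimal ORQ-style sketch, applied to training data only.
orq_sketch <- function(x) {
  n <- sum(!is.na(x))
  # ranks -> probabilities (rank - 0.5) / n -> standard normal quantiles
  stats::qnorm((rank(x, na.last = "keep") - 0.5) / n)
}

set.seed(42)
x <- rexp(500)        # strongly right-skewed input
z <- orq_sketch(x)

# z keeps the ordering of x and is centered at zero
summary(z)
```

Because the mapping only depends on ranks, any monotone distortion of the input (such as a log or square root) yields the same output.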

Usage

step_orderNorm(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  transform_info = NULL,
  transform_options = list(),
  num_unique = 5,
  skip = FALSE,
  id = rand_id("orderNorm")
)

## S3 method for class 'step_orderNorm'
tidy(x, ...)

## S3 method for class 'step_orderNorm'
axe_env(x, ...)

Arguments

recipe

A formula or recipe

...

One or more selector functions to choose which variables are affected by the step. See [selections()] for more details. For the 'tidy' method, these are not currently used.

role

Not used by this step since no new variables are created.

trained

For recipes functionality

transform_info

A numeric vector of transformation values. This is 'NULL' until computed by [prep.recipe()].

transform_options

Options to be passed to 'orderNorm'.

num_unique

An integer; variables with fewer unique values than this will not be evaluated for a transformation.

skip

For recipes functionality

id

For recipes functionality

x

A 'step_orderNorm' object.
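As a hedged illustration of 'num_unique' and 'transform_options', the sketch below builds a toy data set (the column names 'skewed' and 'grade' are made up) with one continuous column and one column taking only three values. It assumes that entries of 'transform_options' are forwarded as named arguments to 'orderNorm' ('n_logit_fit' is an 'orderNorm' argument). Per the 'num_unique' description, 'grade' should not be evaluated for a transformation; inspect the baked output to confirm how the installed version treats such a column.

```r
library(recipes)
library(bestNormalize)

set.seed(1)
# Toy data (hypothetical): one skewed continuous column, one with only 3 values
toy <- data.frame(
  skewed = rexp(200),
  grade  = sample(1:3, 200, replace = TRUE)
)

orq_step <- step_orderNorm(
  recipe(~ ., data = toy),
  all_numeric(),
  num_unique = 5,                              # 'grade' has fewer than 5 unique values
  transform_options = list(n_logit_fit = 100)  # assumed to be passed on to orderNorm()
)

orq_prepped <- prep(orq_step, training = toy)
toy_baked <- bake(orq_prepped, new_data = toy)

summary(toy_baked)
```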

Details

The orderNorm transformation can be used to rescale a variable to be more similar to a normal distribution. See '?orderNorm' for more information; 'step_orderNorm' is the implementation of 'orderNorm' in the 'recipes' context.
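To see this correspondence, the sketch below fits 'orderNorm' directly to one iris variable, fits the same variable through 'step_orderNorm', and compares the two on the training data. It assumes the step uses the default 'orderNorm' settings when 'transform_options' is empty. 'Sepal.Width' contains ties, so 'orderNorm' may warn that normality is not guaranteed.

```r
library(recipes)
library(bestNormalize)

train <- as.data.frame(iris)

# Direct fit with orderNorm() on a single variable
orq_fit <- orderNorm(train$Sepal.Width)
direct <- predict(orq_fit, newdata = train$Sepal.Width)

# The same variable transformed through a recipe step
rec <- recipe(~ Sepal.Width, data = train)
orq_prepped <- prep(step_orderNorm(rec, Sepal.Width), training = train)
baked <- bake(orq_prepped, new_data = train)

# Largest absolute difference between the two approaches
max(abs(direct - baked$Sepal.Width))
```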

As of version 1.7, the 'butcher' package can be used with this step (see the 'axe_env' method in Usage), which may improve its scalability on larger data sets.
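A minimal sketch of using 'butcher' with a prepped recipe, assuming the 'butcher' package is installed: 'axe_env()' is butcher's generic for removing stored environments, and it is assumed here to dispatch to the recipe and step methods (including the 'axe_env' method listed in Usage). How much the object shrinks depends on the data and package versions, so the sizes are printed rather than assumed.

```r
library(recipes)
library(bestNormalize)
library(butcher)

train <- as.data.frame(iris)
orq_prepped <- prep(
  step_orderNorm(recipe(~ ., data = train), all_numeric()),
  training = train
)

# Remove stored environments from the prepped recipe
orq_small <- axe_env(orq_prepped)

# Compare approximate in-memory sizes
print(object.size(orq_prepped), units = "Kb")
print(object.size(orq_small), units = "Kb")

# The trimmed recipe can still be used to bake data
baked_small <- bake(orq_small, new_data = train)
```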

Value

An updated version of 'recipe' with the new step added to the sequence of existing steps (if any). For the 'tidy' method, a tibble with columns 'terms' (the selectors or variables selected) and 'value' (the estimated transformation; ORQ has no lambda parameter).

References

Ryan A. Peterson (2019). Ordered quantile normalization: a semiparametric transformation built for the cross-validation era. Journal of Applied Statistics, 1-16.

See Also

'orderNorm', 'bestNormalize', [recipe()], [prep.recipe()], [bake.recipe()]

Examples

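library(recipes)
library(bestNormalize)

# Build a recipe and add an orderNorm step for all numeric variables
rec <- recipe(~ ., data = as.data.frame(iris))
orq_trans <- step_orderNorm(rec, all_numeric())

# Estimate the transformation on the training data
orq_estimates <- prep(orq_trans, training = as.data.frame(iris))

# Apply the fitted transformation
orq_data <- bake(orq_estimates, as.data.frame(iris))

# Compare the distribution of Petal.Length before and after
plot(density(iris[, "Petal.Length"]), main = "before")
plot(density(orq_data$Petal.Length), main = "after")

# Summaries of the untrained and trained step
tidy(orq_trans, number = 1)
tidy(orq_estimates, number = 1)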



[Package bestNormalize version 1.9.1 Index]