transform_vector_fp {mfp2}    R Documentation
Functions to transform a variable using fractional polynomial powers or acd
Description
These functions generate fractional polynomials for a variable, similar to fracgen in Stata. transform_vector_acd generates the acd transformation for a variable.
Usage
transform_vector_fp(
x,
power = 1,
scale = 1,
shift = 0,
name = NULL,
check_binary = TRUE
)
transform_vector_acd(
x,
power = c(1, 1),
shift = 0,
powers = NULL,
scale = 1,
acd_parameter = NULL,
name = NULL
)
Arguments
x | a vector of a predictor variable.
power | a numeric vector indicating the FP power. Default is 1 (linear). Must be a vector of length 2 for acd transformation. Ignores
scale | scaling factor for x of interest. Must be a positive integer or
shift | shift required for shifting x to positive values. Default is 0, meaning no shift is applied. If
name | character used to define names for the output matrix. Default is
check_binary | a logical indicating whether or not input
powers | passed to
acd_parameter | a list usually returned by
Details
The fp transformation generally transforms x as follows. For each p_i in power = (p_1, p_2, ..., p_n), it creates a variable x^(p_i) and returns the collection of variables as a matrix. By the usual FP convention, a power of 0 denotes log(x). The data may optionally be shifted and scaled before transformation. Centering, if desired, has to be done after the data are transformed using these functions.
A special case is repeated powers, i.e. when p_i = p_j for some i ≠ j. In this case, the fp transformations are x^(p_i) and x^(p_i) * log(x). If a power is repeated more than twice, each further repetition adds another log(x) factor. For example, p_i = p_j = p_k leads to x^(p_i), x^(p_i) * log(x) and x^(p_i) * log(x)^2.
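For reference, the construction described above can be written as a formula. This uses the standard FP definition with the convention x^(0) = log(x), and assumes the powers are sorted so that p_1 ≤ p_2 ≤ ... ≤ p_n:

```latex
x^{(p)} =
\begin{cases}
x^{p}, & p \neq 0,\\
\log x, & p = 0,
\end{cases}
\qquad
H_1(x) = x^{(p_1)}, \qquad
H_j(x) =
\begin{cases}
x^{(p_j)}, & p_j \neq p_{j-1},\\
H_{j-1}(x)\,\log x, & p_j = p_{j-1},
\end{cases}
\quad j = 2, \dots, n.
```

Applied to p_i = p_j = p_k, this gives x^(p_i), x^(p_i) log(x) and x^(p_i) (log x)^2, matching the text.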
Note that the powers p_i do not need to be supplied in sorted order, because this function sorts them before computing the transformation. For example, the output is the same for power = c(1, 1, 2) and power = c(1, 2, 1). Sorting is needed to make sense of repeated powers and to define FPs uniquely. If an ACD transformation is used, the powers are processed in a specific order. This order is always the same, but it is not necessarily sorted.
Thus, throughout the package, powers are always given and processed in either sorted or ACD-specific order. The columns of the matrix returned by this function therefore always align with the powers used elsewhere in the package.
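The following base R sketch mirrors the FP rules above: power 0 means log(x), repeated powers add log(x) factors, and powers are sorted first. fp_basis_sketch() is a hypothetical helper for illustration, not the mfp2 implementation. It assumes that x is shifted first and then scaled, i.e. (x + shift) / scale, and that the result is positive.

```r
# Hypothetical helper (not the mfp2 code): FP basis for a positive vector.
fp_basis_sketch <- function(x, power = 1, shift = 0, scale = 1) {
  # Assumed order of pre-processing: shift first, then scale.
  x <- (x + shift) / scale
  stopifnot(all(x > 0))
  # Drop NA powers and sort the rest, so c(1, 2, 1) behaves like c(1, 1, 2).
  power <- sort(power[!is.na(power)])
  if (length(power) == 0) return(NULL)
  out <- matrix(NA_real_, nrow = length(x), ncol = length(power))
  for (j in seq_along(power)) {
    p <- power[j]
    if (j > 1 && p == power[j - 1]) {
      # Repeated power: previous column multiplied by log(x).
      out[, j] <- out[, j - 1] * log(x)
    } else {
      # FP convention: power 0 denotes log(x).
      out[, j] <- if (p == 0) log(x) else x^p
    }
  }
  colnames(out) <- paste0("fp", seq_along(power))
  out
}

z <- 1:10
a <- fp_basis_sketch(z, power = c(1, 1, 2))  # columns x, x * log(x), x^2
b <- fp_basis_sketch(z, power = c(1, 2, 1))
identical(a, b)  # TRUE: powers are sorted before use
```

Returning NULL when every power is NA copies the behaviour documented under Value. The column order follows the sorted powers.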
Binary variables are not transformed unless check_binary is set to FALSE. Changing this is usually not necessary. The only case in which it should be set to FALSE is when a single value is to be transformed during prediction, e.g. to transform a reference value. In that case, binary variables are still returned unchanged, but a single value of a continuous variable is transformed according to the fitted transformation. For model fitting, check_binary should always be left at its default value.
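A simple distinct-value rule shows why a single value may need special treatment. looks_binary() below is a hypothetical check, assumed for illustration only; the check mfp2 actually performs may differ. Under such a rule, one value of a continuous variable cannot be told apart from a binary variable. It would therefore be left untransformed unless the check is switched off.

```r
# Hypothetical rule, assumed for illustration (not necessarily mfp2's check):
# a variable counts as binary if it has at most two distinct values.
looks_binary <- function(x) length(unique(x)) <= 2

looks_binary(c(0, 1, 1, 0))  # TRUE: a genuine binary variable
looks_binary(1:10)           # FALSE: a continuous variable
looks_binary(42)             # TRUE: a single reference value also passes
```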
Value
Returns a matrix of transformed variable(s). The number of columns depends on the number of powers provided, and the number of rows equals the length of x. The columns are sorted by increasing power. If all powers are NA, this function returns NULL.
If an acd transformation is applied, the output is a list with two entries.

The first entry, acd, is the matrix of transformed variables. The acd term is returned as the last column of this matrix, so if the power for the untransformed data x is NA, it is the only column.

The second entry, acd_parameter, is a list of estimated parameters for the ACD transformation. If the input acd_parameter was not NULL, it is simply returned unchanged.
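The sketch below illustrates where the acd term comes from. It follows the usual description of the ACD (approximate cumulative distribution) transformation:

- the normal scores of the ranks of x are regressed on the best-fitting FP1 function of x;
- the FP1 power is chosen from the conventional FP power set {-2, -1, -0.5, 0, 0.5, 1, 2, 3};
- the acd term is the standard normal CDF of the fitted values.

acd_sketch() is hypothetical. The package's own estimation may differ, for example in how ties, shifting and scaling are handled.

```r
# Hypothetical sketch of an ACD-style term (not the mfp2 implementation).
acd_sketch <- function(x, powers = c(-2, -1, -0.5, 0, 0.5, 1, 2, 3)) {
  stopifnot(all(x > 0))
  n <- length(x)
  # Normal scores of the ranks of x.
  z <- qnorm((rank(x) - 0.5) / n)
  best <- NULL
  for (p in powers) {
    xp <- if (p == 0) log(x) else x^p
    fit <- lm(z ~ xp)
    rss <- sum(residuals(fit)^2)
    # Keep the FP1 power with the smallest residual sum of squares.
    if (is.null(best) || rss < best$rss) {
      best <- list(power = p, coef = unname(coef(fit)), rss = rss)
    }
  }
  xp <- if (best$power == 0) log(x) else x^best$power
  # acd term: standard normal CDF of the fitted FP1 values, lies in (0, 1).
  list(acd = pnorm(best$coef[1] + best$coef[2] * xp), power = best$power)
}

res <- acd_sketch(1:10)
range(res$acd)  # strictly between 0 and 1
```

Because the normal scores increase with x, the fitted acd term is monotonically increasing in x for any selected power.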
Functions
- transform_vector_acd(): Function to generate the acd transformation.
Data processing
An important note on data processing: variables are shifted and scaled before being transformed by any powers. This ensures positive values and reasonable scales. Note that scaling does not change the estimated powers; see also find_scale_factor().
Variables may, however, be centered after transformation, although these functions do not do this. Centering is applied after, not before, transformation so that the correlations between variables stay intact; centering before transformation would alter them. This is described in Sauerbrei et al. (2006), as well as in the Stata manual for mfp. Also, centering is not recommended and should only be done for the final model, if desired.
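The two claims in this section can be checked numerically in base R, without calling any mfp2 function.

- **Scaling.** Rescaling x multiplies x^p by a constant, and log(cx) differs from log(x) only by a constant. An intercept and a slope absorb both changes, so every FP1 power fits equally well and the selected power is unchanged.
- **Centering.** Centering after transformation only shifts a column by a constant, so its correlations are unchanged. Centering before transformation produces a different function of x.

The simulated data and the helper rss_fp1() are hypothetical, used for illustration only.

```r
# Part 1: scaling x does not change which FP1 power fits best.
set.seed(1)
x <- runif(50, min = 1, max = 100)
y <- log(x) + rnorm(50, sd = 0.1)   # simulated data, true power is 0 (log)
powers <- c(-2, -1, -0.5, 0, 0.5, 1, 2, 3)

# Residual sum of squares of y ~ x^(p) for each candidate power p.
rss_fp1 <- function(x, y) {
  sapply(powers, function(p) {
    xp <- if (p == 0) log(x) else x^p
    sum(residuals(lm(y ~ xp))^2)
  })
}

r1 <- rss_fp1(x, y)
r2 <- rss_fp1(x / 10, y)   # same data, x scaled by a factor of 10
all.equal(r1, r2)          # TRUE: identical fit for every power
powers[which.min(r1)]      # 0, and likewise for r2

# Part 2: centering after vs. before transformation.
v  <- c(2, 4, 6, 8, 10)
v2 <- v^2
after  <- v2 - mean(v2)     # transform, then center: a constant shift only
before <- (v - mean(v))^2   # center, then transform: a different function of v
cor(v2, after)              # 1
cor(v2, before)             # about 0.19
```

Centering first also produces values that are zero or negative, so log(x) or negative powers would no longer be defined.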
References
Sauerbrei, W., Meier-Hirmer, C., Benner, A. and Royston, P., 2006. Multivariable regression model building by using fractional polynomials: Description of SAS, STATA and R programs. Comput Stat Data Anal, 50(12): 3464-85.
Examples
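z = 1:10
# FP transformation with the default power = 1 (linear)
transform_vector_fp(z)
# acd transformation with the default power = c(1, 1)
transform_vector_acd(z)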