quantize {qgcomp} | R Documentation |
Quantizing exposure data
Description
Create variables representing indicator functions with cutpoints defined
by quantiles. Output a list that includes: 1) a dataset that is a copy of data,
except that the variables whose names are included in the expnms
variable are
transformed to their quantized version and 2) an unnamed list of the quantile cutpoints
that are used for each of the variables that were quantized
Usage
quantize(data, expnms, q = 4, breaks = NULL)
Arguments
data |
a data frame |
expnms |
a character vector with the names of the columns to be quantized |
q |
integer, number of quantiles used in creating quantized variables |
breaks |
(optional) list of (equal length) numeric vectors that characterize the minimum value of each category for which to break up the variables named in expnms. This is an alternative to using 'q' to define cutpoints. |
Details
This function creates categorical variables in place of the exposure variables named in 'expnms.' For example, a continuous exposure 'x1' will be replaced in the output data by another 'x1' that takes on values 0:(q-1), where, for example, the value 1 indicates that the original x1 value falls between the first and the second quantile.
Value
A list containing the following fields
- data
a quantized version of the original dataframe
- breaks
a list of the quantile cutpoints used to create the quantized variables which includes a very small number for the minimum and a very large number for the maximum to avoid causing issues when using these breaks to quantize new data.
Examples
set.seed(1232)
dat = data.frame(y=runif(100), x1=runif(100), x2=runif(100), z=runif(100))
qdata = quantize(data=dat, expnms=c("x1", "x2"), q=4)
table(qdata$data$x1)
table(qdata$data$x2)
summary(dat[c("y", "z")]);summary(qdata$data[c("y", "z")]) # not touched
dat = data.frame(y=runif(100), x1=runif(100), x2=runif(100), z=runif(100))
# using 'breaks' requires specifying min and max (the qth quantile)
# example with theoretical quartiles (could be other relevant values)
qdata2 = quantize(data=dat, expnms=c("x1", "x2"),
breaks=list(c(-1e64, .25, .5, .75, 1e64),
c(-1e64, .25, .5, .75, 1e64)
))
table(qdata2$data$x1)
table(qdata2$data$x2)