quantize {qgcomp}R Documentation

Quantizing exposure data

Description

Create variables representing indicator functions with cutpoints defined by quantiles. Output a list that includes: 1) a dataset that is a copy of data, except that the variables whose names are included in the expnms variable are transformed to their quantized version and 2) an unnamed list of the quantile cutpoints that are used for each of the variables that were quantized

Usage

quantize(data, expnms, q = 4, breaks = NULL)

Arguments

data

a data frame

expnms

a character vector with the names of the columns to be quantized

q

integer, number of quantiles used in creating quantized variables

breaks

(optional) list of (equal length) numeric vectors that characterize the minimum value of each category for which to break up the variables named in expnms. This is an alternative to using 'q' to define cutpoints.

Details

This function creates categorical variables in place of the exposure variables named in 'expnms.' For example, a continuous exposure 'x1' will be replaced in the output data by another 'x1' that takes on values 0:(q-1), where, for example, the value 1 indicates that the original x1 value falls between the first and the second quantile.

Value

A list containing the following fields

data

a quantized version of the original dataframe

breaks

a list of the quantile cutpoints used to create the quantized variables which includes a very small number for the minimum and a very large number for the maximum to avoid causing issues when using these breaks to quantize new data.

Examples

set.seed(1232)
dat = data.frame(y=runif(100), x1=runif(100), x2=runif(100), z=runif(100))
qdata = quantize(data=dat, expnms=c("x1", "x2"), q=4)
table(qdata$data$x1)
table(qdata$data$x2)
summary(dat[c("y", "z")]);summary(qdata$data[c("y", "z")]) # not touched
dat = data.frame(y=runif(100), x1=runif(100), x2=runif(100), z=runif(100))
# using 'breaks' requires specifying min and max (the qth quantile)
# example with theoretical quartiles (could be other relevant values)
qdata2 = quantize(data=dat, expnms=c("x1", "x2"),
   breaks=list(c(-1e64, .25, .5, .75, 1e64),
               c(-1e64, .25, .5, .75, 1e64)
               ))
table(qdata2$data$x1)
table(qdata2$data$x2)

[Package qgcomp version 2.15.2 Index]