cutFancy {rockchalk} | R Documentation |
Create an ordinal variable by grouping numeric data input.
Description
This is a convenience wrapper around R's cut function. Users can
specify cutpoints, category labels, or desired group proportions in
various ways, which gives it a more flexible interface than cut. It
also tries to notice and correct some common user errors, such as
omitting the outer boundaries from the probs argument. The returned
values are labeled by their midpoints, rather than cut's usual
boundaries.
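To make the difference between boundary labels and midpoint labels concrete, here is a minimal base R sketch. It does not call cutFancy and is not its implementation; the data, the hand-picked inner breaks, and the rounding to two digits are all assumptions made for illustration.

```r
# Base R sketch: label each interval by its midpoint instead of "(a,b]".
set.seed(1)
y <- rnorm(200, mean = 35, sd = 14)

# Outer boundaries come from the data range; inner ones are chosen by hand.
breaks <- c(min(y), 30, 40, 50, max(y))

# Midpoint of each adjacent pair of boundaries.
mids <- (head(breaks, -1) + tail(breaks, -1)) / 2

# Same grouping twice: once with cut's default labels, once with midpoints.
default_labels  <- cut(y, breaks = breaks, include.lowest = TRUE)
midpoint_labels <- cut(y, breaks = breaks, include.lowest = TRUE,
                       labels = as.character(round(mids, 2)),
                       ordered_result = TRUE)

levels(default_labels)   # interval notation
levels(midpoint_labels)  # midpoints as labels
```

Both factors assign every observation to the same group; only the level names differ.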
Usage
cutFancy(y, cutpoints = "quantile", probs, categories)
Arguments
y |
The input data from which the categorized variable will be created. |
cutpoints |
Optional parameter, a vector of thresholds at
which to cut the data. If it is not supplied, the default
value |
probs |
This is an optional parameter, relevant only when the
R function |
categories |
Can be a number to designate the number of
sub-groups created, or it can be a vector of names used. If
|
Details
The dividing points, thought of as "thresholds" or "cutpoints",
can be specified in several ways. cutFancy will automatically create
equally-sized sets of observations for a given number of categories
if neither probs nor cutpoints is specified. The bare minimum input
needed is categories=5, for example, to ask for 5 equally sized
groups. More user control can be had by specifying either cutpoints
or probs. If cutpoints is not specified at all, or if
cutpoints="quantile", then probs supplies the quantile probabilities
(for example c(0, .1, .3, .7, .9, 1)) that determine the proportions
of the data points falling within each range.
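The quantile route described above can be sketched in base R. This is an illustration under stated assumptions, not cutFancy's actual code: the padding step with `sort(unique(c(0, probs, 1)))` is one plausible way to restore omitted outer boundaries, and the simulated data are made up.

```r
# Base R sketch: turn quantile probabilities into cutpoints and groups.
set.seed(2)
y <- rnorm(500)

# Inner probabilities only; the outer 0 and 1 are deliberately left out.
probs <- c(0.3, 0.6)

# Pad with 0 and 1 so the lowest and highest observations fall in a group.
# (Assumed repair step, shown here only for illustration.)
probs_full <- sort(unique(c(0, probs, 1)))

# Translate cumulative probabilities into thresholds on the scale of y.
cutpoints <- quantile(y, probs = probs_full, names = FALSE)

groups <- cut(y, breaks = cutpoints, include.lowest = TRUE,
              ordered_result = TRUE)

# Observed share of observations in each group.
prop.table(table(groups))
```

The group shares are the differences between successive probabilities, so probs of 0.3 and 0.6 request groups holding roughly 30, 30, and 40 percent of the data.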
If categories is not specified, the category names will be created
automatically. Names for ordinal categories will be the numerical
midpoints for the outcomes. This may deviate from your expectation,
which might be ordinal categories named "0", "1", "2", and so forth.
The numerically labeled values we provide can be used in various ways
during the analysis process. Read "?factor" to learn ways to convert
the ordinal output to other formats. Examples include various ways of
converting the ordinal output to numeric.
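Two common conversions behave differently when the level names are numbers. The sketch below uses a small hand-built ordered factor with made-up midpoint labels, standing in for cutFancy output, to show the distinction.

```r
# A small ordered factor whose labels are numeric midpoints (made-up values).
f <- factor(c("22.5", "35", "35", "47.5", "22.5"),
            levels = c("22.5", "35", "47.5"), ordered = TRUE)

# Integer codes: the position of each observation's level (1, 2, 3, ...).
codes <- as.integer(f)

# Numeric labels: recover the midpoint values stored in the level names.
values <- as.numeric(levels(f))[f]

codes
values
```

The integer codes only preserve order, while the second conversion recovers values on the original scale of the data.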
The categories parameter works together with cutpoints. cutpoints
allows a character string "quantile". If cutpoints is not specified,
or if the user specifies the character string cutpoints="quantile",
then probs is used to determine the cutpoints. However, if probs is
not specified, then the categories argument can be used. If
cutpoints="quantile", then categories can take one of two forms. If
it is a single integer, it is interpreted as the number of "equally
sized" categories to be created. If it is a vector of names, the
length of the vector determines the number of categories, and the
values are used as factor labels.
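The two readings of categories can be sketched with a small helper. The function `categories_to_probs` is hypothetical and not part of rockchalk; it only shows how a count or a vector of names could be mapped to equally spaced quantile probabilities.

```r
# Hypothetical helper, not part of rockchalk: derive the group count, labels,
# and equally spaced probabilities from a `categories` argument.
categories_to_probs <- function(categories) {
  if (is.numeric(categories) && length(categories) == 1) {
    # A single number is read as the number of groups.
    k <- as.integer(categories)
    labels <- NULL  # no names supplied; labels would be generated later
  } else {
    # A vector of names: its length gives the number of groups.
    k <- length(categories)
    labels <- as.character(categories)
  }
  list(k = k, labels = labels, probs = seq(0, 1, length.out = k + 1))
}

categories_to_probs(4L)$probs
categories_to_probs(c("A", "B", "C", "D", "E"))$labels
```

With k groups, the sketch produces k + 1 probabilities from 0 to 1, which is what "equally sized" groups amount to on the quantile scale.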
Value
an ordinal vector with attributes "cutpoints" and "props" (proportions)
Examples
set.seed(234234)
y <- rnorm(1000, mean = 35, sd = 14)
yord <- cutFancy(y, cutpoints = c(30, 40, 50))
table(yord)
attr(yord, "props")
attr(yord, "cutpoints")
yord <- cutFancy(y, categories = 4L)
table(yord, exclude = NULL)
attr(yord, "props")
attr(yord, "cutpoints")
yord <- cutFancy(y, probs = c(0, .1, .3, .7, .9, 1.0),
                 categories = c("A", "B", "C", "D", "E"))
table(yord, exclude = NULL)
attr(yord, "props")
attr(yord, "cutpoints")
yord <- cutFancy(y, probs = c(0, .1, .3, .7, .9, 1.0))
table(yord, exclude = NULL)
attr(yord, "props")
attr(yord, "cutpoints")
yasinteger <- as.integer(yord)
table(yasinteger, yord)
yasnumeric <- as.numeric(levels(yord))[yord]
table(yasnumeric, yord)
barplot(attr(yord, "props"))
hist(yasnumeric)
X1a <-
    genCorrelatedData3("y ~ 1.1 + 2.1 * x1 + 3 * x2 + 3.5 * x3 + 1.1 * x1:x3",
                       N = 10000, means = c(x1 = 1, x2 = -1, x3 = 3),
                       sds = 1, rho = 0.4)
## Create cutpoints from quantiles (outer boundaries 0 and 1 omitted from probs)
probs <- c(.3, .6)
X1a$yord <- cutFancy(X1a$y, probs = probs)
attributes(X1a$yord)
table(X1a$yord, exclude = NULL)