| cutFancy {rockchalk} | R Documentation |
Create an ordinal variable by grouping numeric data input.
Description
This is a convenience function for usage of R's cut
function. Users can specify cutpoints or category labels or
desired proportions of groups in various ways. In that way, it has
a more flexible interface than cut. It also tries to notice
and correct some common user errors, such as omitting the outer
boundaries from the probs argument. The returned values are
labeled by their midpoints, rather than cut's usual boundaries.
Usage
cutFancy(y, cutpoints = "quantile", probs, categories)
Arguments
y |
The input data from which the categorized variable will be created. |
cutpoints |
Optional paramter, a vector of thresholds at
which to cut the data. If it is not supplied, the default
value |
probs |
This is an optional parameter, relevant only when the
R function |
categories |
Can be a number to designate the number of
sub-groups created, or it can be a vector of names used. If
|
Details
The dividing points, thought of as "thresholds" or "cutpoints",
can be specified in several ways. cutFancy will
automatically create equally-sized sets of observations for a
given number of categories if neither probs nor
cutpoints is specified. The bare minimum input needed is
categories=5, for example, to ask for 5 equally sized
groups. More user control can be had by specifying either
cutpoints or probs. If cutpoints is not
specified at all, or if cutpoints="quantile", then
probs can be used to specify the proportions of the data
points that are to fall within each range. On the other hand, one
can specify cutpoints = "quantile" and then probs will
be used to specify the proportions of the data points that are to
fall within each range.
If categories is not specified, the category names will be
created. Names for ordinal categories will be the numerical
midpoints for the outcomes. Perhaps this will deviate from your
expectation, which might be ordinal categories name "0", "1", "2",
and so forth. The numerically labeled values we provide can be
used in various ways during the analysis process. Read "?factor"
to learn ways to convert the ordinal output to other
formats. Examples include various ways of converting the ordinal
output to numeric.
The categories parameter works together with
cutpoints. cutpoints allows a character string
"quantile". If cutpoints is not specified, or if the user
specifies a character string cutpoints="quantile", then the
probs would be used to determine the cutpoints. However,
if probs is not specified, then the categories
argument can be used. If cutpoints="quantile", then
if
categoriesis one integer, then it is interpreted as the number of "equally sized" categories to be created, or-
categoriescan be a vector of names. The length of the vector is used to determine the number of categories, and the values are put to use as factor labels.
Value
an ordinal vector with attributes "cutpoints" and "props" (proportions)
Examples
set.seed(234234)
y <- rnorm(1000, m = 35, sd = 14)
yord <- cutFancy(y, cutpoints = c(30, 40, 50))
table(yord)
attr(yord, "props")
attr(yord, "cutpoints")
yord <- cutFancy(y, categories = 4L)
table(yord, exclude = NULL)
attr(yord, "props")
attr(yord, "cutpoints")
yord <- cutFancy(y, probs = c(0, .1, .3, .7, .9, 1.0),
categories = c("A", "B", "C", "D", "E"))
table(yord, exclude = NULL)
attr(yord, "props")
attr(yord, "cutpoints")
yord <- cutFancy(y, probs = c(0, .1, .3, .7, .9, 1.0))
table(yord, exclude = NULL)
attr(yord, "props")
attr(yord, "cutpoints")
yasinteger <- as.integer(yord)
table(yasinteger, yord)
yasnumeric <- as.numeric(levels(yord))[yord]
table(yasnumeric, yord)
barplot(attr(yord, "props"))
hist(yasnumeric)
X1a <-
genCorrelatedData3("y ~ 1.1 + 2.1 * x1 + 3 * x2 + 3.5 * x3 + 1.1 * x1:x3",
N = 10000, means = c(x1 = 1, x2 = -1, x3 = 3),
sds = 1, rho = 0.4)
## Create cutpoints from quantiles
probs <- c(.3, .6)
X1a$yord <- cutFancy(X1a$y, probs = probs)
attributes(X1a$yord)
table(X1a$yord, exclude = NULL)