combine.levels {Hmisc} | R Documentation |
combine.levels
Description
Combine Infrequent Levels of a Categorical Variable
Usage
combine.levels(
x,
minlev = 0.05,
m,
ord = is.ordered(x),
plevels = FALSE,
sep = ","
)
Arguments
x |
a factor, 'ordered' factor, or numeric or character variable that will be turned into a 'factor' |
minlev |
the minimum proportion of observations in a cell before that cell is combined with one or more other cells. If more than one cell has fewer than minlev*n observations, all such cells are combined into a new cell labeled '"OTHER"'. Otherwise, the lowest frequency cell is combined with the next lowest frequency cell, and the new level name is the combination of the two old level names. When 'ord=TRUE', combinations happen only for consecutive levels. |
m |
an alternative to 'minlev': the minimum number of observations in a cell before it will be combined with others |
ord |
set to 'TRUE' to treat 'x' as if it were an ordered factor, which allows only consecutive levels to be combined |
plevels |
by default 'combine.levels' pools low-frequency levels into a category named 'OTHER' when 'x' is not ordered and 'ord=FALSE'. To instead name this category the concatenation of all the pooled level names, separated by 'sep' (a comma by default), set 'plevels=TRUE'. |
sep |
the separator for concatenating levels when 'plevels=TRUE' |
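The relationship between the proportion threshold 'minlev' and the count threshold 'm' can be illustrated with base R alone. This is a minimal sketch, not the Hmisc implementation: the threshold values and the variable names 'low_by_prop' and 'low_by_count' are hypothetical, and the strict "fewer than" comparison is assumed from the argument descriptions above.

```r
# Toy data: counts are A = 1, B = 3, C = 4, D = 1, E = 1 (n = 10)
x <- factor(c(rep('A', 1), rep('B', 3), rep('C', 4), rep('D', 1), rep('E', 1)))

counts <- table(x)      # observations per level (missing values are excluded)
n      <- sum(counts)   # total number of counted observations

minlev <- 0.25          # hypothetical proportion threshold
m      <- 3             # hypothetical absolute-count threshold

# Levels flagged as infrequent under each rule (assumed: strictly fewer than)
low_by_prop  <- names(counts)[counts < minlev * n]   # fewer than 2.5 observations
low_by_count <- names(counts)[counts < m]            # fewer than 3 observations

print(low_by_prop)
print(low_by_count)
```

With these toy values both rules flag the same levels, because no level has a count between 2.5 and 3.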
Details
After turning 'x' into a 'factor' if it is not one already, combines levels of 'x' whose frequency falls below a specified relative frequency 'minlev' or absolute count 'm'. When 'x' is not treated as ordered, all of the low-frequency levels are combined into '"OTHER"', unless 'plevels=TRUE'. When 'ord=TRUE' or 'x' is an ordered factor, only consecutive levels are combined. New levels are constructed by concatenating the levels with 'sep' as a separator. This function is useful when comparing ordinal regression with polytomous (multinomial) regression and there are too many categories for polytomous regression. 'combine.levels' is also useful when assumptions of ordinal models are being checked empirically by computing exceedance probabilities for various cutoffs of the dependent variable.
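The unordered pooling described above can be mimicked in base R by assigning a named list to 'levels(x)'. The sketch below is an illustration only and is not taken from the Hmisc source; the helper 'pool_levels' is a hypothetical name, and the count threshold of 3 is an arbitrary choice.

```r
# Toy data: counts are A = 1, B = 3, C = 4, D = 1, E = 1
x <- factor(c(rep('A', 1), rep('B', 3), rep('C', 4), rep('D', 1), rep('E', 1)))

counts <- table(x)
low    <- names(counts)[counts < 3]   # infrequent levels: A, D, E

# Hypothetical helper: merge the levels in `low` into one level named `label`
pool_levels <- function(x, low, label) {
  keep <- setdiff(levels(x), low)
  # Named list: each name is a new level, each value the old levels it absorbs
  mapping <- c(setNames(list(low), label), setNames(as.list(keep), keep))
  levels(x) <- mapping
  x
}

pooled_other  <- pool_levels(x, low, "OTHER")                   # like plevels=FALSE
pooled_concat <- pool_levels(x, low, paste(low, collapse = ",")) # like plevels=TRUE

print(table(pooled_other))
print(levels(pooled_concat))
```

The second call names the pooled category by pasting the old level names together, which is the naming idea behind 'plevels=TRUE' and 'sep'.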
Value
a factor variable, or an ordered factor variable if 'ord=TRUE'
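For the ordered case, the following base R sketch shows how merging consecutive levels keeps the result an ordered factor, with new level names built by concatenation. The grouping here is chosen by hand for illustration; it is an assumption and does not claim to reproduce the groups that 'combine.levels' would select.

```r
# Ordered toy data: counts are A = 1, B = 3, C = 4, D = 1, E = 1
x <- factor(c(rep('A', 1), rep('B', 3), rep('C', 4), rep('D', 1), rep('E', 1)),
            levels = c('A', 'B', 'C', 'D', 'E'), ordered = TRUE)

sep <- ","

# Hand-picked groups of consecutive levels (illustrative assumption)
groups <- list(c('A', 'B'), c('C', 'D', 'E'))

# Name each group by concatenating its old level names with `sep`
names(groups) <- vapply(groups, paste, character(1), collapse = sep)

merged <- x
levels(merged) <- groups   # level order follows the order of the list

print(is.ordered(merged))
print(table(merged))
```

Because only neighbouring levels are merged, the ordering of the remaining categories is still meaningful.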
Author(s)
Frank Harrell
Examples
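# Counts: A = 1, B = 3, C = 4, D = 1, E = 1
x <- c(rep('A', 1), rep('B', 3), rep('C', 4), rep('D', 1), rep('E', 1))
# Pool levels with fewer than 3 observations
combine.levels(x, m=3)
# Name the pooled category by concatenating the pooled level names
combine.levels(x, m=3, plevels=TRUE)
# Treat x as ordered so that only consecutive levels are combined
combine.levels(x, ord=TRUE, m=3)
# Add a sixth level, F, with one observation
x <- c(rep('A', 1), rep('B', 3), rep('C', 4), rep('D', 1), rep('E', 1),
       rep('F', 1))
combine.levels(x, ord=TRUE, m=3)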
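The Details section mentions exceedance probabilities for cutoffs of an ordered variable. As a rough base R sketch of that idea (the variable names 'cutoffs' and 'exceed' are hypothetical, and nothing here calls Hmisc), the empirical probability of being at or above each cutoff can be computed as follows:

```r
# Ordered toy data: counts are A = 1, B = 3, C = 4, D = 1, E = 1 (n = 10)
y <- factor(c(rep('A', 1), rep('B', 3), rep('C', 4), rep('D', 1), rep('E', 1)),
            levels = c('A', 'B', 'C', 'D', 'E'), ordered = TRUE)

# Every level except the lowest is a possible cutoff
cutoffs <- levels(y)[-1]

# Empirical Prob(y >= cutoff) for each cutoff
exceed <- vapply(cutoffs, function(k) mean(y >= k), numeric(1))

print(exceed)
```

In this toy data the estimates for the cutoffs D and E rest on only one or two observations, which is the kind of sparsity that pooling consecutive levels is meant to reduce.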