R Factor Utilities {qeML} | R Documentation |
R Factor Utilities
Description
Utilities to manipulate R factors, extending the ones in regtools.
Usage
levelCounts(data)
dataToTopLevels(data,lowCountThresholds)
factorToTopLevels(f,lowCountThresh=0)
cartesianFactor(dataName,factorNames,fNameSep = ".")
qeRareLevels(x, yName, yesYVal = NULL)
Arguments
data |
A data frame or equivalent. |
f |
An R factor. |
lowCountThresh |
Factor levels with counts below this value will not be used for this factor. |
lowCountThresholds |
An R list of column names and their corresponding values of lowCountThresh. |
dataName |
A quoted name of a data frame or equivalent. |
factorNames |
A vector of R factor names. |
fNameSep |
A character to be used as a delimiter in the names of the levels of the output factor. |
x |
A data frame. |
yName |
Quoted name of the response variable. |
yesYVal |
In the case of binary Y, the factor level to be considered positive. |
Details
Often one has an R factor in which one or more levels are rare in the
data. This can cause problems, say in performing cross-validation: a
level in the test set might be "new," not having appeared in the
training set. To address this, factorToTopLevels removes rare levels
from a factor; dataToTopLevels applies this to an entire data frame.
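The idea of removing rare levels can be sketched in base R. This is only an illustration, not the qeML implementation: the helper name dropRareLevels is hypothetical, and the choice to turn observations in rare levels into NA is an assumption, since the text above does not say what happens to those observations.

```r
# Hypothetical helper, not part of qeML: keep only the levels whose
# count is at least `thresh`. Observations in rarer levels become NA
# (an assumption made for this sketch).
dropRareLevels <- function(f, thresh = 0) {
  f <- as.factor(f)
  counts <- table(f)
  keep <- names(counts)[counts >= thresh]
  factor(f, levels = keep)
}

f <- factor(c("a", "a", "a", "b", "b", "c"))
g <- dropRareLevels(f, thresh = 2)
levels(g)                  # "a" "b"
table(g, useNA = "ifany")  # level "c" is gone; its one row is now NA
```

Applying such a helper to each factor column, with a per-column threshold, gives the data-frame version of the same idea.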
To help identify rare levels, the function levelCounts simply applies
table() to each column of data, returning the result as an R list. (If
a column has more than 10 levels, it returns NA.)
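A rough base-R equivalent of this per-column tabulation might look as follows. The helper countLevels and its maxLevels argument are hypothetical names used only for this sketch; it assumes the NA rule applies per column.

```r
# Hypothetical helper, not part of qeML: tabulate every column and
# return NA for any column with more than `maxLevels` distinct values.
countLevels <- function(data, maxLevels = 10) {
  lapply(data, function(col) {
    tbl <- table(col)
    if (length(tbl) > maxLevels) NA else tbl
  })
}

d <- data.frame(g = factor(c("m", "f", "m")), id = 1:3)
res <- countLevels(d, maxLevels = 2)
res$g    # counts for "f" and "m"
res$id   # NA, since id has 3 distinct values, more than maxLevels
```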
The function cartesianFactor generates a "superfactor" from
individual ones; e.g. if factors f1 and f2 have n1 and n2 levels, the
output is a new factor with n1 * n2 levels.
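The n1 * n2 level count can be seen with base R's interaction(), used here only as a stand-in; whether cartesianFactor is built on it, or orders the levels the same way, is not stated in this page.

```r
# Base-R illustration of a "superfactor": every pairing of a level of
# f1 with a level of f2 becomes one level of the combined factor.
f1 <- factor(c("female", "male", "female", "male"))  # 2 levels
f2 <- factor(c("100", "101", "102", "100"))          # 3 levels

z <- interaction(f1, f2, sep = ".")
nlevels(z)          # 6, i.e. 2 * 3
as.character(z[1])  # "female.100"
```

Note that all 6 combinations are levels of z even though only 4 of them occur in the data.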
The function qeRareLevels checks every column of a data frame to
determine whether it is an R factor with rare levels.
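A scan of this general kind can be sketched in base R. This page does not say how qeRareLevels defines "rare," how it uses yName and yesYVal, or what it returns, so the helper below is purely hypothetical: it assumes a simple count threshold and ignores the response variable.

```r
# Hypothetical helper, not part of qeML: for each factor column, list
# the levels whose count falls below `thresh` (an assumed definition
# of "rare"). Columns with no rare levels are omitted from the result.
rareLevelsByColumn <- function(x, thresh = 5) {
  isFac <- vapply(x, is.factor, logical(1))
  out <- lapply(x[isFac], function(col) {
    counts <- table(col)
    names(counts)[counts < thresh]
  })
  out[lengths(out) > 0]
}

d <- data.frame(
  occ    = factor(c("a", "a", "a", "b")),
  gender = factor(c("m", "f", "m", "f")),
  age    = c(30, 41, 52, 28)
)
res <- rareLevelsByColumn(d, thresh = 2)
res   # only occ is reported, with rare level "b"
```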
Author(s)
Norm Matloff
Examples
data(svcensus)
levelCounts(svcensus) # e.g. finds there are 15182 men, 4908 women
f1 <- svcensus$gender # 2 levels
f2 <- svcensus$occ # 6 levels
z <- cartesianFactor('svcensus',c('gender','occ'))
head(z)
# [1] female.102 male.101 female.102 male.100 female.100 male.100
# 12 Levels: female.100 female.101 female.102 female.106 ... male.141