factorsToDummies {regtools} | R Documentation |
Factor Conversion Utilities
Description
Utilities for converting back and forth between factors and dummy variables.
Usage
xyDataframeToMatrix(xy)
dummiesToInt(dms,inclLast=FALSE)
factorToDummies(f,fname,omitLast=FALSE,factorInfo=NULL)
factorsToDummies(dfr,omitLast=FALSE,factorsInfo=NULL,dfOut=FALSE)
dummiesToFactor(dms,inclLast=FALSE)
charsToFactors(dtaf)
factorTo012etc(f,earlierLevels = NULL)
discretize(x,endpts)
getDFclasses(dframe)
hasCharacters(dfr)
hasFactors(x)
toAllNumeric(w,factorsInfo=NULL)
toSubFactor(f,saveLevels,lumpedLevel="zzzOther")
toSuperFactor(inFactor,superLevels)
Arguments
dfOut | If TRUE, return a data frame, otherwise a matrix.
dms | Matrix or data frame of dummy columns.
inclLast | When forming a factor from dummies, include the last dummy as a level if this is TRUE.
xy | A data frame intended for prediction, "Y" in last column.
saveLevels | In collapsing a factor, which levels to retain.
lumpedLevel | Name of new level to be created from levels not retained.
x | A numeric vector, except in
endpts | Vector to be used as
f | A factor.
inFactor | Original factor, to be extended.
superLevels | New levels to be added to the original factor.
earlierLevels | Previous levels found for this factor.
fname | A factor name.
dfr | A data frame.
w | A data frame.
dframe | A data frame, for which we wish to find the column classes.
omitLast | If TRUE, then generate only k-1 dummies from k factor levels.
factorsInfo | Attribute from output of
factorInfo | Attribute from output of
dtaf | A data frame.
Details
Many R users prefer to express categorical data as R factors, or often work with data that is of this type to begin with. On the other hand, many regression packages, e.g. lars, disallow factors. These utilities facilitate conversion from one form to another.
Here is an overview of the roles of the various functions:
- factorToDummies: Convert one factor to dummies, yielding a matrix of dummies corresponding to that factor.
- factorsToDummies: Convert all factors to dummies, yielding a matrix of dummies, corresponding to all factors in the input data frame.
- dummiesToFactor: Convert a set of related dummies to a factor.
- factorTo012etc: Convert a factor to a numeric code, starting at 0.
- dummiesToInt: Convert a related set of dummies to a numeric code, starting at 0.
- charsToFactors: Convert all character columns in a data frame to factors.
- toAllNumeric: Convert all factors in a data frame to dummies, yielding a new version of the data frame, including its original nonfactor columns.
- toSubFactor: Coalesce some levels of a factor, yielding a new factor.
- toSuperFactor: Add levels to a factor. Typically used in prediction contexts, in which a factor in a data point to be predicted does not have all the levels of the same factor in the training set.
- xyDataframeToMatrix: Given a data frame to be used in a training set, with "Y" a factor in the last column, change to all numeric, with dummies in place of all "X" factors and in place of the "Y" factor.
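To make the conversions in the overview concrete, here is a minimal base-R sketch of three of them: a numeric code starting at 0, a matrix of dummies, and the step from dummies back to a factor. It does not call regtools and should not be read as its implementation; the variable names (codes, dms, f2) are illustrative assumptions.

```r
# Base-R illustration only; regtools' own functions are not used here.
f <- factor(c('abc','de','f','de'))

# Factor -> numeric code starting at 0 (the idea behind factorTo012etc).
codes <- as.integer(f) - 1L

# Factor -> dummies, one 0/1 column per level (the idea behind factorToDummies).
dms <- outer(as.character(f), levels(f), "==") + 0L
colnames(dms) <- levels(f)

# Dummies -> factor (the idea behind dummiesToFactor):
# in each row, pick the name of the column holding the 1.
f2 <- factor(colnames(dms)[max.col(dms, ties.method = "first")],
             levels = colnames(dms))

# The round trip recovers the original factor.
stopifnot(identical(f2, f))
```

The round trip works here because every level has its own dummy column; if the last dummy were omitted, the all-zero rows would have to be mapped back to the omitted level separately.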
The optional argument factorsInfo
is intended for use in prediction
contexts. Typically a set of new cases will not have all levels of the
factor in the training set. Without this argument, only an incomplete
set of dummies would be generated for the set of new cases.
A key point about changing factors to dummies is that, for later prediction after fitting a model in our training set, one needs to use the same transformations. Say a factor has levels 'abc', 'de' and 'f' (and omitLast = FALSE). If we later have a set of, say, two new cases to predict, and their values for this factor are 'de' and 'f', we would generate dummies for them but not for 'abc', incompatible with the three dummies used in the training set.
Thus the factor names and levels are saved in attributes, which can later be used as input. The relations are as follows:
- factorsToDummies calls factorToDummies on each factor it finds in its input data frame.
- factorToDummies outputs and later inputs factorInfo.
- factorsToDummies outputs and later inputs factorsInfo.
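The level-mismatch problem described above, and the remedy of saving the training levels in an attribute and reusing them for new cases, can be sketched in base R. The helper toDummiesSketch and its "info" attribute are hypothetical names chosen for this illustration; they are not part of regtools and only mimic the idea of the saved factor information.

```r
# Base-R illustration only; toDummiesSketch is a hypothetical helper.
toDummiesSketch <- function(f, fname, fullLvls = levels(f)) {
  # One 0/1 column per level in fullLvls, whether or not it occurs in f.
  dms <- outer(as.character(f), fullLvls, "==") + 0L
  colnames(dms) <- paste(fname, fullLvls, sep = ".")
  # Save the name and levels so later calls can reproduce the same columns.
  attr(dms, "info") <- list(fname = fname, fullLvls = fullLvls)
  dms
}

train <- factor(c('abc','de','f'))
newx  <- factor(c('de','f'))   # level 'abc' never occurs in the new cases

trainD <- toDummiesSketch(train, 'x')
naiveD <- toDummiesSketch(newx, 'x')   # only x.de and x.f
fixedD <- toDummiesSketch(newx, 'x',
                          fullLvls = attr(trainD, "info")$fullLvls)

colnames(naiveD)   # two columns, incompatible with the training set
colnames(fixedD)   # same three columns as trainD
```

Passing the saved levels forces the new cases into the same column layout as the training data, with an all-zero x.abc column.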
Other functions:
- getDFclasses: Return a vector of the classes of the columns of a data frame.
- discretize: Partition range of a vector into (not necessarily equal-length) intervals, and construct a factor from the labels of the intervals that the input elements fall into.
- hasCharacters, hasFactors: Logical scalars, TRUE if the input data frame has any character or factor columns.
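For orientation, the tasks in this list have close base-R analogues, sketched below. This is an assumption-laden illustration: it uses base R's cut(), sapply() and is.factor()/is.character(), and makes no claim about how the regtools functions are implemented. The data and variable names are made up for the example.

```r
# Base-R illustration only; not regtools code.

# Interval-based factor from a numeric vector (the idea behind discretize).
x <- c(1.2, 3.5, 7.9, 4.4)
endpts <- c(0, 2, 5, 10)          # unequal-length intervals are fine
xf <- cut(x, breaks = endpts)     # factor with levels (0,2], (2,5], (5,10]

# Column classes of a data frame (the idea behind getDFclasses).
dframe <- data.frame(u = c(1.5, 2.5),
                     v = c('a', 'b'),
                     w = factor(c('p', 'q')),
                     stringsAsFactors = FALSE)
classes <- sapply(dframe, class)

# Any character or factor columns? (the idea behind hasCharacters, hasFactors)
anyChar <- any(sapply(dframe, is.character))
anyFac  <- any(sapply(dframe, is.factor))
```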
Value
The function factorToDummies returns a matrix of dummy
variables, while factorsToDummies returns a new version of the
input data frame, in which each factor is replaced by columns of
dummies, as a matrix or, if dfOut is TRUE, as a data frame.
The function charsToFactors is similar, but changes
character columns to factors.
Author(s)
Norm Matloff
Examples
x <- factor(c('abc','de','f','de'))
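# omitLast=TRUE: k-1 = 2 dummies for the 3 levels; 'f' is coded as all zeros
xd <- factorToDummies(x,'x',omitLast=TRUE)
xd
# x.abc x.de
# [1,] 1 0
# [2,] 0 1
# [3,] 0 0
# [4,] 0 1
# attr(,"factorInfo")
# attr(,"factorInfo")$fname
# [1] "x"
#
# attr(,"factorInfo")$omitLast
# [1] TRUE
#
# attr(,"factorInfo")$fullLvls
# [1] "abc" "de" "f"
# new cases lacking level 'f'; reuse the saved factorInfo for matching columns
w <- factor(c('de','abc','abc'))
wd <- factorToDummies(w,'x',omitLast=TRUE,factorInfo=attr(xd,'factorInfo'))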
wd
# x.abc x.de
# [1,] 0 1
# [2,] 1 0
# [3,] 1 0
# attr(,"factorInfo")
# attr(,"factorInfo")$fname
# [1] "x"
#
# attr(,"factorInfo")$omitLast
# [1] TRUE
#
# attr(,"factorInfo")$fullLvls
# [1] "abc" "de" "f"