factorsToDummies {regtools}    R Documentation

Factor Conversion Utilities

Description

Utilities for converting back and forth between factors and dummy variables.

Usage

xyDataframeToMatrix(xy)
dummiesToInt(dms,inclLast=FALSE)
factorToDummies(f,fname,omitLast=FALSE,factorInfo=NULL)
factorsToDummies(dfr,omitLast=FALSE,factorsInfo=NULL,dfOut=FALSE)
dummiesToFactor(dms,inclLast=FALSE) 
charsToFactors(dtaf)
factorTo012etc(f,earlierLevels = NULL)
discretize(x,endpts)
getDFclasses(dframe)
hasCharacters(dfr)
hasFactors(x)
toAllNumeric(w,factorsInfo=NULL)
toSubFactor(f,saveLevels,lumpedLevel="zzzOther")
toSuperFactor(inFactor,superLevels)

Arguments

dfOut

If TRUE, return a data frame, otherwise a matrix.

dms

Matrix or data frame of dummy columns.

inclLast

When forming a factor from dummies, include the last dummy as a level if this is TRUE.

xy

A data frame intended for prediction, with "Y" in the last column.

saveLevels

In collapsing a factor, which levels to retain.

lumpedLevel

Name of new level to be created from levels not retained.

x

A numeric vector, except in hasFactors, where it is a data frame.

endpts

Vector to be used as breaks in the call to cut. To avoid NAs, the range of this vector must cover the range of the input vector.

f

A factor.

inFactor

Original factor, to be extended.

superLevels

New levels to be added to the original factor.

earlierLevels

Previous levels found for this factor.

fname

A factor name.

dfr

A data frame.

w

A data frame.

dframe

A data frame, for which we wish to find the column classes.

omitLast

If TRUE, then generate only k-1 dummies from k factor levels.

factorsInfo

Attribute from output of factorsToDummies.

factorInfo

Attribute from output of factorToDummies.

dtaf

A data frame.
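As an illustration of the note on endpts above, the following base-R sketch calls cut directly (it does not call discretize, and the vectors are made-up values). It shows how values outside the breaks become NA. By default cut uses intervals that are open on the left, so a value equal to the smallest break is also NA unless include.lowest = TRUE is passed; whether discretize sets that option is not stated here.

```r
x <- c(1, 5, 9)

# Breaks span only 2..8, so 1 and 9 fall outside every interval
narrow <- cut(x, breaks = c(2, 4, 8))
narrow

# Breaks span 0..10, so every value lands in some interval
wide <- cut(x, breaks = c(0, 4, 8, 10))
wide
```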

Details

Many R users prefer to express categorical data as R factors, or often work with data that is of this type to begin with. On the other hand, many regression packages, e.g. lars, disallow factors. These utilities facilitate conversion from one form to another.

Here is an overview of the roles of the various functions:

The optional argument factorsInfo is intended for use in prediction contexts. Typically a set of new cases will not have all levels of the factor in the training set. Without this argument, only an incomplete set of dummies would be generated for the set of new cases.

A key point about changing factors to dummies is that, for later prediction after fitting a model on the training set, one needs to apply the same transformations. Say a factor has levels 'abc', 'de' and 'f' (and omitLast = FALSE). If we later have a set of, say, two new cases to predict, and their values for this factor are 'de' and 'f', we would generate dummies for those two levels but not for 'abc'. The result would be incompatible with the three dummies used in the training set.
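The mismatch can be reproduced with base R alone. The sketch below uses model.matrix rather than the regtools functions, so it only illustrates the problem and one generic remedy (re-declaring the new factor with the training levels). The names train, newx and fixedDms are made up for the illustration.

```r
# Base-R illustration only; this is not how regtools implements the conversion.
train <- data.frame(x = factor(c("abc", "de", "f", "de")))
newx  <- data.frame(x = factor(c("de", "f")))

# One dummy column per level present in each data set (no intercept)
trainDms <- model.matrix(~ x - 1, data = train)
newDms   <- model.matrix(~ x - 1, data = newx)
colnames(trainDms)   # "xabc" "xde" "xf"
colnames(newDms)     # "xde" "xf"

# Re-declaring the new factor with the training levels restores the "abc" column
newx$x <- factor(newx$x, levels = levels(train$x))
fixedDms <- model.matrix(~ x - 1, data = newx)
colnames(fixedDms)   # "xabc" "xde" "xf"
```

The restored "xabc" column is all zeros for the new cases, but the column layout now matches the training matrix.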

Thus the factor names and levels are saved in attributes, and can be used as input. The relations are as follows:

Other functions:

Value

The function factorToDummies returns a matrix of dummy variables, while factorsToDummies returns a new version of the input data frame in which each factor is replaced by columns of dummies (as a matrix by default, or as a data frame if dfOut is TRUE). The function charsToFactors is similar, but changes character vectors to factors.

Author(s)

Norm Matloff

Examples

x <- factor(c('abc','de','f','de'))
xd <- factorToDummies(x,'x',omitLast=TRUE)
xd 
#      x.abc x.de
# [1,]     1    0
# [2,]     0    1
# [3,]     0    0
# [4,]     0    1
# attr(,"factorInfo")
# attr(,"factorInfo")$fname
# [1] "x"
# 
# attr(,"factorInfo")$omitLast
# [1] TRUE
# 
# attr(,"factorInfo")$fullLvls
# [1] "abc" "de"  "f"  
w <- factor(c('de','abc','abc'))
wd <- factorToDummies(w,'x',factorInfo=attr(xd,'factorInfo')) 
wd 
#      x.abc x.de
# [1,]     0    1
# [2,]     1    0
# [3,]     1    0
# attr(,"factorInfo")
# attr(,"factorInfo")$fname
# [1] "x"
# 
# attr(,"factorInfo")$omitLast
# [1] TRUE
# 
# attr(,"factorInfo")$fullLvls
# [1] "abc" "de"  "f"  
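The idea behind the saveLevels and lumpedLevel arguments of toSubFactor can be sketched in base R as follows. This is a hand-written approximation for illustration, not the package's code, and the ordering of the resulting levels is an assumption.

```r
f <- factor(c("a", "b", "c", "d", "a"))
saveLevels  <- c("a", "b")      # levels to retain
lumpedLevel <- "zzzOther"       # name for everything else

# Work on the character values, then rebuild the factor
fc <- as.character(f)
fc[!(fc %in% saveLevels)] <- lumpedLevel
fsub <- factor(fc, levels = c(saveLevels, lumpedLevel))
table(fsub)
```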


[Package regtools version 1.7.0 Index]