categorize {divDyn} | R Documentation |
Mapping multiple entries to categories
Description
This basic function replaces groups of values in a vector with single values with the help of a key object.
Usage
categorize(x, key, incbound = "lower")
Arguments
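x | Vector containing the values to be replaced. |
key | A list of vectors. Each element holds the entries of x to be replaced (or, for numeric x, a two-value interval), and the name of the element is the replacement value. As in the examples, an element named default gives the value for entries that match nothing else. |
incbound | Sets which of the two boundary values of an interval counts as part of the interval. Defaults to "lower". |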
Details
Online datasets usually contain overly detailed information, because those entering the data try to preserve as much of it as possible. In analyses, however, several values are often treated as representing the same, less detailed information, which is then used in further procedures. The categorize function allows users to do this type of multiple replacement using a specific object called a 'key'.
A key is an informal class and is essentially a list of vectors. When x is a character vector, each vector in the list corresponds to a set of entries in x. These entries are replaced by the name of that vector in the list, to indicate their assumed identity.
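The character case described above can be sketched in base R. This is only an illustration of the rule, not the divDyn implementation: replace_by_key is a hypothetical helper, and the handling of the default element is an assumption taken from the examples on this page.

```r
# Hypothetical helper (not part of divDyn): mimics the character-key rule.
replace_by_key <- function(x, key) {
  # start from the fallback value stored under 'default' (assumed convention)
  out <- rep(key[["default"]], length(x))
  # each remaining element lists the entries that take the element's name
  for (nm in setdiff(names(key), "default")) {
    out[x %in% key[[nm]]] <- nm
  }
  out
}

x <- c("a", "c", "e", "b", "d")
key <- list(first = c("a", "b"), second = c("c", "d"), default = NA)
res <- replace_by_key(x, key)
res
```

Entries "a" and "b" become "first", "c" and "d" become "second", and "e", which no element of the key lists, keeps the default value.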
When x is a numeric vector and an element of the key is a numeric vector with 2 values, that element is treated as an interval. The same value is assigned to all entries that fall in this interval (Example 2). If x contains values that lie on the boundary of an interval, then only one of the two boundary values is considered to be in the interval (use the incbound argument to set which of the two).
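The boundary rule can be illustrated with a small base R sketch. The helper in_interval and its logical include_lower flag are hypothetical and stand in for the incbound argument; include_lower = TRUE is meant to correspond to incbound = "lower".

```r
# Hypothetical helper (not a divDyn function): half-open interval membership.
in_interval <- function(x, bounds, include_lower = TRUE) {
  lo <- min(bounds)
  hi <- max(bounds)
  if (include_lower) {
    x >= lo & x < hi   # lower boundary included, upper excluded
  } else {
    x > lo & x <= hi   # lower boundary excluded, upper included
  }
}

x <- 4:11
lower_in <- x[in_interval(x, c(5, 10))]
upper_in <- x[in_interval(x, c(5, 10), include_lower = FALSE)]
lower_in
upper_in
```

With the lower boundary included, the interval c(5, 10) covers 5 to 9, which matches the comment in Example 2. With the upper boundary included instead, it covers 6 to 10.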
The elements of key are looped through in sequence. If values of x occur in multiple elements of key, then the last one will be used (Example 3).
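Why the last match wins follows directly from a sequential loop, as this base R trace suggests. It is a sketch of the described behavior under the assumption that each element simply overwrites earlier assignments, not the package's actual code.

```r
x <- c("a", "b", "d", "e")
key <- list(first = c("a", "b"), second = c("a", "d"), default = NA)

# start with the default value, then apply the key elements in order
out <- rep(key[["default"]], length(x))
steps <- list()
for (nm in setdiff(names(key), "default")) {
  out[x %in% key[[nm]]] <- nm   # later elements overwrite earlier ones
  steps[[nm]] <- out            # keep a snapshot after each element
}
steps
```

After the first element, "a" and "b" are both labeled "first". The second element then relabels "a" (and labels "d") as "second", so "a" ends up with the name of the last element that lists it.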
Examples of this data type are included in the package (keys) to help process Paleobiology Database occurrences.
Value
A vector with replacements.
Examples
# Example 1
# x, as character
set.seed(1000)
toReplace <- sample(letters[1:6], 15, replace=TRUE)
# a and b should mean 'first', c and d 'second', all others: NA
key<-list(first=c("a", "b"), second=c("c", "d"), default=NA)
# do the replacement
categorize(toReplace, key)
# Example 2 - numeric entries and mixed types
# basic vector to be grouped
toReplace2<-1:16
# replacement rules: 5,6,7,8,9 should be "more", 11 should be "eleven" the rest: "other"
key2<-list(default="other", more=c(5,10),eleven=11)
categorize(toReplace2, key2)
# Example 3 - multiple occurrences of same values
# a and b should mean 'first', a and d should mean 'second', all others: NA
key3<-list(first=c("a", "b"), second=c("a", "d"), default=NA)
# do the replacement (all "a" entries will be replaced with "second")
categorize(toReplace, key3)