stem_snowball {corpus} | R Documentation
Stem a set of terms using one of the algorithms provided by the Snowball stemming library.
stem_snowball(x, algorithm = "en")
x | character vector of terms to stem.
algorithm | stemming algorithm; see ‘Details’ for the valid choices.
Apply a Snowball stemming algorithm to a vector of input terms, x, returning the result in a character vector of the same length with the same names.
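As a minimal sketch of the length and names guarantee described above, the following checks both properties on a small named vector. The element names "a", "b", and "c" are illustrative only and do not come from the package.

```r
library(corpus)

# Hypothetical named input; the names are arbitrary labels
x <- c(a = "running", b = "runs", c = "ran")

# Stem with the default algorithm ("en")
stems <- stem_snowball(x)

# The result should line up with the input element by element
same_length <- length(stems) == length(x)
same_names <- identical(names(stems), names(x))
print(same_length)
print(same_names)
```

Because the names carry over, the result can be indexed by the same keys as the input, for example stems[["a"]].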
The algorithm argument specifies the stemming algorithm. Valid choices include the following: "ar" ("arabic"), "da" ("danish"), "de" ("german"), "en" ("english"), "es" ("spanish"), "fi" ("finnish"), "fr" ("french"), "hu" ("hungarian"), "it" ("italian"), "nl" ("dutch"), "no" ("norwegian"), "pt" ("portuguese"), "ro" ("romanian"), "ru" ("russian"), "sv" ("swedish"), "ta" ("tamil"), "tr" ("turkish"), and "porter".
Setting algorithm = NULL gives a stemmer that returns its input unchanged.
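The sketch below illustrates how the algorithm argument might be used. It assumes that a two-letter code and the full language name shown next to it in the list select the same stemmer; the input terms are illustrative.

```r
library(corpus)

terms <- c("winning", "winner")

# Select the English stemmer by code and by full name
# (assumed to be equivalent, per the list of valid choices)
by_code <- stem_snowball(terms, algorithm = "en")
by_name <- stem_snowball(terms, algorithm = "english")

# algorithm = NULL is documented to return the input unchanged
unstemmed <- stem_snowball(terms, algorithm = NULL)

print(by_code)
print(by_name)
print(unstemmed)
```

Passing NULL can be convenient when the stemmer is chosen at run time and stemming should be switched off without changing the calling code.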
The function only stems single-word terms of kind "letter"; it leaves other inputs (multi-word terms, and terms of kind "number", "punct", and "symbol") unchanged.
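To make the rule about term kinds concrete, the following sketch passes a mix of inputs to the stemmer. The particular terms are illustrative; according to the description above, the multi-word term and the number should come back unchanged.

```r
library(corpus)

# A single-word letter term, a multi-word term, and a number
x <- c("winning", "winning streak", "2017")

out <- stem_snowball(x)

# Compare each input with its output to see which entries were stemmed
changed <- out != x
print(data.frame(term = x, stem = out, changed = changed))
```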
The Snowball stemming library provides the underlying implementation. The wordStem function from the SnowballC package provides a similar interface, but that function applies the algorithm to all input terms, regardless of the kind of the term.
A character vector with the same length and names as the input, x, with entries containing the corresponding stems.
# apply english stemming algorithm; don't stem non-letter terms
stem_snowball(c("win", "winning", "winner", "#winning"))

# compare with SnowballC, which stems all kinds, not just letter
## Not run:
SnowballC::wordStem(c("win", "winning", "winner", "#winning"), "en")