PerStem {PersianStemmer}    R Documentation
Persian Stemmer for Text Analysis
Description
Stems Persian texts and optionally cleans and transliterates them for text analysis.
Usage
PerStem(dat, NoEnglish = TRUE, NoNumbers = TRUE,
NoStopwords = TRUE, NoPunctuation = TRUE,
StemVerbs = TRUE, NoPreSuffix = TRUE,
Context = TRUE, StemBrokenPlurals = TRUE,
Transliteration = TRUE)
Arguments
dat
The original data: the Persian text to be processed.
NoEnglish
If TRUE, removes English (Latin) characters.
NoNumbers
If TRUE, removes numbers.
NoStopwords
If TRUE, removes stopwords using the default stopword list.
NoPunctuation
If TRUE, removes punctuation. If FALSE, fixes punctuation for text analysis.
StemVerbs
If TRUE, stems verbs, returning the past or present root of each verb.
NoPreSuffix
If TRUE, stems words by removing prefixes and suffixes.
Context
If TRUE, stems a word only if its stem also occurs in the text. If FALSE, stems words without considering the other words in the text.
StemBrokenPlurals
If TRUE, stems Arabic broken plurals, returning their singular forms from the default list of Arabic broken plurals.
Transliteration
If TRUE, transliterates Persian Unicode characters into Latin characters using a transliteration system developed by Roozbeh Safshekan and Rich Nielsen.
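The Context argument can be illustrated with a small base-R sketch. The function context_stem() below is hypothetical and is not part of PersianStemmer: it strips one illustrative plural suffix (ها, "ha") and, when context = TRUE, accepts the stripped form only if that stem also appears as a token in the same text. The package's actual affix lists and rules may differ.

```r
# Hypothetical sketch of context-aware suffix stripping (not PersianStemmer code).
# tokens:   character vector of words from one text
# suffixes: illustrative suffixes to try, in order
context_stem <- function(tokens, suffixes, context = TRUE) {
  vocab <- unique(tokens)
  vapply(tokens, function(tok) {
    n_tok <- nchar(tok)
    for (suf in suffixes) {
      n_suf <- nchar(suf)
      # Consider a suffix only if it matches and leaves a non-empty stem
      if (n_tok > n_suf && substr(tok, n_tok - n_suf + 1, n_tok) == suf) {
        stem <- substr(tok, 1, n_tok - n_suf)
        # With context = TRUE, accept the stem only if it occurs in the text
        if (!context || stem %in% vocab) return(stem)
      }
    }
    tok  # no acceptable suffix: keep the token unchanged
  }, character(1), USE.NAMES = FALSE)
}

tokens <- c("\u06A9\u062A\u0627\u0628",              # ketab, "book"
            "\u06A9\u062A\u0627\u0628\u0647\u0627",  # ketabha, "books"
            "\u062F\u0641\u062A\u0631\u0647\u0627")  # daftarha, "notebooks"
plural <- "\u0647\u0627"                             # plural suffix "ha"
context_stem(tokens, plural, context = TRUE)
context_stem(tokens, plural, context = FALSE)
```

With context = TRUE, ketabha is reduced to ketab because ketab occurs in the text, while daftarha is kept because daftar does not; with context = FALSE, both plural forms lose the suffix.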
Details
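PerStem prepares Persian texts for text analysis by stemming them. Depending on its arguments, it also removes English characters, numbers, stopwords, and punctuation, and transliterates the result into Latin characters.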
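The cleaning steps controlled by NoEnglish, NoNumbers, NoPunctuation, and NoStopwords can be sketched in base R as follows. clean_text() and its stopwords argument are hypothetical and used only for illustration; the character classes it removes (ASCII letters; Western, Persian, and Arabic-Indic digits; ASCII punctuation plus the Arabic comma, semicolon, question mark, and guillemets) are assumptions and need not match what PerStem removes. PerStem itself uses its own default stopword list.

```r
# Hypothetical cleaning function; not the PersianStemmer implementation.
clean_text <- function(txt, stopwords = character(0)) {
  # NoEnglish: replace ASCII Latin letters with spaces
  txt <- gsub("[A-Za-z]", " ", txt, perl = TRUE)
  # NoNumbers: Western (0-9), Persian (U+06F0-U+06F9), Arabic-Indic (U+0660-U+0669) digits
  txt <- gsub("[0-9\u06F0-\u06F9\u0660-\u0669]", " ", txt, perl = TRUE)
  # NoPunctuation: ASCII punctuation plus Arabic comma, semicolon, question mark, guillemets
  txt <- gsub("[[:punct:]\u060C\u061B\u061F\u00AB\u00BB]", " ", txt, perl = TRUE)
  # NoStopwords: split on whitespace, then drop empty tokens and stopwords
  tokens <- strsplit(trimws(txt), "\\s+", perl = TRUE)[[1]]
  tokens <- tokens[nzchar(tokens) & !(tokens %in% stopwords)]
  paste(tokens, collapse = " ")
}

# Persian words mixed with English, digits, punctuation and the stopword "va" ("and")
txt <- "\u062F\u0627\u0646\u0634\u06AF\u0627\u0647 (University) \u0648 \u062A\u0647\u0631\u0627\u0646\u060C 123 \u06F1\u06F2\u06F3!"
clean_text(txt, stopwords = "\u0648")
```

Only the two Persian content words survive; stemming would then be applied to these remaining tokens.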
Value
PerStem returns the stemmed Persian text (in Latin characters when Transliteration = TRUE).
Author(s)
Roozbeh Safshekan, Richard Nielsen
Examples
# Load the package and its sample data
library(PersianStemmer)
data(UniversityofTehran)
# Stem and transliterate the text (all arguments shown at their default values)
PerStem(UniversityofTehran, NoEnglish = TRUE, NoNumbers = TRUE,
        NoStopwords = TRUE, NoPunctuation = TRUE,
        StemVerbs = TRUE, NoPreSuffix = TRUE, Context = TRUE,
        StemBrokenPlurals = TRUE, Transliteration = TRUE)