R: Persian Stemmer for Text Analysis

PerStem {PersianStemmer}

R Documentation

Persian Stemmer for Text Analysis

Description

Stems Persian texts for text analysis.

Usage

PerStem(dat, NoEnglish = TRUE, NoNumbers = TRUE, 
	NoStopwords = TRUE, NoPunctuation = TRUE, 
	StemVerbs = TRUE, NoPreSuffix = TRUE, 
	Context = TRUE, StemBrokenPlurals = TRUE, 
	Transliteration = TRUE)

Arguments

`dat`	The original data, i.e. the Persian text to be stemmed.
`NoEnglish`	If TRUE, removes English characters.
`NoNumbers`	If TRUE, removes numbers.
`NoStopwords`	If TRUE, removes stopwords using the default stopword list.
`NoPunctuation`	If TRUE, the function removes punctuation. If FALSE, it fixes punctuation for text analysis.
`StemVerbs`	If TRUE, stems verbs and returns the past or present root of each verb.
`NoPreSuffix`	If TRUE, stems words by removing prefixes and suffixes.
`Context`	If TRUE, the function stems a word only if its stem exists in the text. If FALSE, the function stems each word without considering the other words in the text.
`StemBrokenPlurals`	If TRUE, stems Arabic broken plurals and returns their singular forms using the default list of Arabic broken plurals.
`Transliteration`	If TRUE, transliterates Persian Unicode characters into Latin characters using a transliteration system developed by Roozbeh Safshekan and Rich Nielsen.
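
The effect of `Context` is easiest to see by running the stemmer twice on the same input. The following is a minimal sketch, assuming the PersianStemmer package is installed and that the `UniversityofTehran` data set from the Examples section is available; the variable names `stem_context` and `stem_free` are illustrative only. Whether the two results differ depends on the input text.

```r
# Sketch only: assumes PersianStemmer is installed and provides the
# UniversityofTehran example data set used in the Examples section.
if (requireNamespace("PersianStemmer", quietly = TRUE)) {
  library(PersianStemmer)
  data(UniversityofTehran)

  # Context = TRUE: a word is stemmed only if its stem occurs in the text
  stem_context <- PerStem(UniversityofTehran, Context = TRUE)

  # Context = FALSE: each word is stemmed without regard to other words
  stem_free <- PerStem(UniversityofTehran, Context = FALSE)

  # Report whether the two settings produce the same result for this input
  print(identical(stem_context, stem_free))
}
```

All other arguments are left at the defaults shown under Usage.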

Details

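PerStem prepares Persian texts for text analysis by stemming them. Depending on the arguments, it also removes English characters, numbers, stopwords, and punctuation, and transliterates the result into Latin characters.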

Value

PerStem returns the stemmed Persian text, transliterated into Latin characters if Transliteration = TRUE.
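
This page does not specify the structure of the returned object, so it can help to store the result and inspect it before passing it to other tools. The sketch below assumes the PersianStemmer package is installed and uses the `UniversityofTehran` data set from the Examples section; `stemmed` is an illustrative variable name. It sets `Transliteration = FALSE` so that the output is not transliterated into Latin characters.

```r
# Sketch only: assumes PersianStemmer is installed and provides the
# UniversityofTehran example data set used in the Examples section.
if (requireNamespace("PersianStemmer", quietly = TRUE)) {
  library(PersianStemmer)
  data(UniversityofTehran)

  # Stem the text but skip the transliteration step
  stemmed <- PerStem(UniversityofTehran, Transliteration = FALSE)

  # Inspect the type and shape of the returned object
  str(stemmed)
}
```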

Author(s)

Roozbeh Safshekan, Richard Nielsen

Examples

# Load the package and the example data
library(PersianStemmer)
data(UniversityofTehran)

# Stem and transliterate the text
PerStem(UniversityofTehran, NoEnglish = TRUE, NoNumbers = TRUE,
        NoStopwords = TRUE, NoPunctuation = TRUE,
        StemVerbs = TRUE, NoPreSuffix = TRUE, Context = TRUE,
        StemBrokenPlurals = TRUE, Transliteration = TRUE)

[Package PersianStemmer version 1.0 Index]