PerStem {PersianStemmer}    R Documentation

Persian Stemmer for Text Analysis

Description

Stems Persian texts for text analysis.

Usage

PerStem(dat, NoEnglish = TRUE, NoNumbers = TRUE, 
	NoStopwords = TRUE, NoPunctuation = TRUE, 
	StemVerbs = TRUE, NoPreSuffix = TRUE, 
	Context = TRUE, StemBrokenPlurals = TRUE, 
	Transliteration = TRUE)

Arguments

dat

The original data: the Persian text to be stemmed.

NoEnglish

Removes English characters.

NoNumbers

Removes numbers.

NoStopwords

Removes stopwords by using the default stopword list.

NoPunctuation

If TRUE, the function removes punctuation. If FALSE, it fixes punctuation for text analysis.

StemVerbs

Performs stemming on verbs and returns the past or present root of the verb.

NoPreSuffix

Performs stemming by removing prefixes and suffixes.

Context

If TRUE, the function stems a word only if its stem exists in the text. If FALSE, the function stems each word without considering the other words in the text.

StemBrokenPlurals

Performs stemming on Arabic broken plurals and returns their singular forms by using the default Arabic broken plurals list.

Transliteration

Transliterates Persian Unicode characters into Latin characters using a transliteration system developed by Roozbeh Safshekan and Richard Nielsen.

Details

PerStem prepares texts in Persian for text analysis by stemming.
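
The effect of the Context argument can be illustrated with a toy sketch in base R. This is not PerStem's implementation: the helper stem_with_context and its two-suffix list are hypothetical, and transliterated tokens are used only for readability. The sketch strips a suffix from a word only when the resulting stem also occurs as a word in the same text.

```r
# Hypothetical helper, not part of PersianStemmer: strips a suffix from a
# token only if the resulting stem also appears as a token in the same text.
stem_with_context <- function(tokens, suffixes = c("ha", "an"), context = TRUE) {
  vocab <- unique(tokens)
  vapply(tokens, function(tok) {
    for (suf in suffixes) {
      if (nchar(tok) > nchar(suf) && endsWith(tok, suf)) {
        stem <- substr(tok, 1, nchar(tok) - nchar(suf))
        # With context = TRUE, accept the stem only if it occurs in the text
        if (!context || stem %in% vocab) return(stem)
      }
    }
    tok  # no acceptable stem found: keep the token unchanged
  }, character(1), USE.NAMES = FALSE)
}

tokens <- c("ketab", "ketabha", "daneshgah", "tehran")

stem_with_context(tokens, context = TRUE)
# "ketab" "ketab" "daneshgah" "tehran"

stem_with_context(tokens, context = FALSE)
# "ketab" "ketab" "daneshgah" "tehr"
```

In this toy example, context-free stemming wrongly truncates "tehran" to "tehr", while the context check leaves it intact because "tehr" never appears in the text.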

Value

PerStem returns the stemmed Persian text.

Author(s)

Roozbeh Safshekan, Richard Nielsen

Examples

# Load data
data(UniversityofTehran)

# Stem and transliterate the text
PerStem(UniversityofTehran, NoEnglish = TRUE, NoNumbers = TRUE,
        NoStopwords = TRUE, NoPunctuation = TRUE,
        StemVerbs = TRUE, NoPreSuffix = TRUE, Context = TRUE,
        StemBrokenPlurals = TRUE, Transliteration = TRUE)
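
As a variation on the call above, the following sketch sets Transliteration and Context to FALSE, which, going by the argument descriptions, should leave the output in Persian script and stem each word without reference to the rest of the text. It assumes the PersianStemmer package is installed; the variable name stemmed is illustrative, and no output is shown because it depends on the data.

```r
# Runs only if the PersianStemmer package is available
if (requireNamespace("PersianStemmer", quietly = TRUE)) {
  library(PersianStemmer)
  data(UniversityofTehran)

  # Skip transliteration and stem each word independently of the other words;
  # all other arguments keep their defaults (TRUE)
  stemmed <- PerStem(UniversityofTehran, Transliteration = FALSE, Context = FALSE)

  # Inspect the structure of the result
  str(stemmed)
}
```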

[Package PersianStemmer version 1.0 Index]