removeStopWords {arabicStemR}    R Documentation

Remove Arabic stopwords.

Description

Defines a list of Arabic-language stopwords and removes them from a string.

Usage

removeStopWords(texts, defaultStopwordList=TRUE, customStopwordList=NULL)

Arguments

texts

A string from which Arabic stopwords should be removed.

defaultStopwordList

If TRUE, remove the words in the package's default stopword list. If FALSE, do not use the default list. Default is TRUE.

customStopwordList

Optional user-specified stopword list of words to be removed, supplied as a vector of strings in either Arabic UTF-8 or Latin characters following the stemmer's transliteration scheme (words without Arabic UTF-8 characters are processed with reverse.transliterate()). Default is NULL.

Value

Returns a list with two elements: text, the input string with Arabic stopwords removed, and arabicStopwordList, the full list of stopwords.

Author(s)

Rich Nielsen

Examples

## Create string with Arabic characters

x <- '\u0627\u0647\u0644\u0627 \u0648\u0633\u0647\u0644\u0627
 \u064a\u0627  \u0635\u062f\u064a\u0642\u064a'

## Remove stop words
removeStopWords(x)$text

## Not run
## To see the full list of stopwords
## removeStopWords(x)$arabicStopwordList
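
The customStopwordList argument described above can be exercised with a short sketch like the following. The choice of custom word is purely illustrative, and combining defaultStopwordList=FALSE with a custom list is an assumption based on the argument descriptions rather than documented behavior. No expected output is shown because it depends on the contents of the default list.

```r
## Sketch only: assumes the arabicStemR package is installed
library(arabicStemR)

## Same sample text as above, written on one line
x <- '\u0627\u0647\u0644\u0627 \u0648\u0633\u0647\u0644\u0627 \u064a\u0627 \u0635\u062f\u064a\u0642\u064a'

## Illustrative custom stopword (the last word of x), given in Arabic UTF-8
custom <- c('\u0635\u062f\u064a\u0642\u064a')

## Default list plus the custom word
out <- removeStopWords(x, defaultStopwordList=TRUE, customStopwordList=custom)
out$text

## Custom word only (assumed to skip the default list entirely)
outCustomOnly <- removeStopWords(x, defaultStopwordList=FALSE, customStopwordList=custom)
outCustomOnly$text
```

Words may also be supplied in Latin characters following the stemmer's transliteration scheme, in which case they are converted with reverse.transliterate() before use.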


[Package arabicStemR version 1.2 Index]