R: Remove Arabic prefixes

removePrefixes {arabicStemR}

R Documentation

Remove Arabic prefixes

Description

Removes selected Arabic prefixes from a Unicode string. The prefixes are "waw", "alif-lam", "waw-alif-lam", "ba-alif-lam", "kaf-alif-lam", "fa-alif-lam", and "lam-lam". A prefix is removed from a word (words are delimited by spaces) only if the word is long enough that the remaining stem would not be too short; the length required for each prefix is set by the arguments x1 through x7.

Usage

removePrefixes(texts, x1 = 4, x2 = 4, x3 = 5, x4 = 5, x5 = 5, x6 = 5, x7 = 4,
               dontstem = c('\u0627\u0644\u0644\u0647','\u0644\u0644\u0647'))

Arguments

`texts`	An Arabic-language string in Unicode.
`x1`	The number of letters a word must have for the function to remove the prefix "waw".
`x2`	The number of letters a word must have for the function to remove the prefix "alif-lam".
`x3`	The number of letters a word must have for the function to remove the prefix "waw-alif-lam".
`x4`	The number of letters a word must have for the function to remove the prefix "ba-alif-lam".
`x5`	The number of letters a word must have for the function to remove the prefix "kaf-alif-lam".
`x6`	The number of letters a word must have for the function to remove the prefix "fa-alif-lam".
`x7`	The number of letters a word must have for the function to remove the prefix "lam-lam".
`dontstem`	Words that should not be stemmed (entered as Unicode escapes).
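
The interaction between a length threshold and the `dontstem` list can be pictured with a small base-R sketch. This is not the package's implementation: the helper name `strip_prefix_sketch`, the use of `>=` for the length comparison, and the restriction to a single prefix ("alif-lam") are all assumptions made for illustration.

```r
# Illustrative sketch only; NOT the arabicStemR source code.
# Hypothetical helper: removes a leading "alif-lam" from each space-delimited
# word, but only when the word has at least `min_len` letters (">=" is an
# assumption) and the word is not listed in `dontstem`.
strip_prefix_sketch <- function(text, min_len = 4,
                                dontstem = c("\u0627\u0644\u0644\u0647")) {
  prefix <- "\u0627\u0644"  # alif-lam
  words <- strsplit(text, " ", fixed = TRUE)[[1]]
  stripped <- vapply(words, function(w) {
    if (w %in% dontstem) return(w)  # protected word: leave unchanged
    has_prefix <- substr(w, 1, nchar(prefix)) == prefix
    if (nchar(w) >= min_len && has_prefix) {
      return(substring(w, nchar(prefix) + 1))  # drop the prefix
    }
    w  # too short, or no prefix: leave unchanged
  }, character(1), USE.NAMES = FALSE)
  paste(stripped, collapse = " ")
}

# The first word (5 letters) loses its prefix; the second is protected
strip_prefix_sketch("\u0627\u0644\u0644\u063a\u0629 \u0627\u0644\u0644\u0647")
```

Raising `min_len` above a word's length leaves that word untouched, which is the role the x1 to x7 arguments play for their respective prefixes.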

Value

Returns a string with Arabic prefixes removed.

Author(s)

Rich Nielsen

Examples

## Create string with Arabic characters

x <- paste('\u0627\u0644\u0644\u063a\u0629 \u0627\u0644\u0639\u0631\u0628\u064a\u0629',
           '\u062c\u0645\u064a\u0644\u0629 \u062c\u062f\u0627')

## Remove prefixes

removePrefixes(x)
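
The thresholds and the exclusion list can also be overridden per call, using the arguments listed under Usage. The following is a sketch that assumes arabicStemR is installed; the threshold value and the extra protected word are arbitrary choices for illustration, and the variable names `res_threshold` and `res_protected` are hypothetical.

```r
res_threshold <- NULL
res_protected <- NULL

# Runs only when the arabicStemR package is available
if (requireNamespace("arabicStemR", quietly = TRUE)) {
  x <- "\u0627\u0644\u0644\u063a\u0629 \u0627\u0644\u0639\u0631\u0628\u064a\u0629"

  # Use a stricter length requirement for the "alif-lam" prefix (arbitrary value)
  res_threshold <- arabicStemR::removePrefixes(x, x2 = 6)

  # Add one more word to the default exclusion list (illustrative choice)
  res_protected <- arabicStemR::removePrefixes(
    x,
    dontstem = c("\u0627\u0644\u0644\u0647", "\u0644\u0644\u0647",
                 "\u0627\u0644\u0644\u063a\u0629")
  )

  print(res_threshold)
  print(res_protected)
}
```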

[Package arabicStemR version 1.3 Index]