R: Stemming Malay words

stem_malay {malaytextr}

R Documentation

Stemming Malay words

Description

Stems Malay words using a root-word dictionary and affix removal.

Usage

stem_malay(word,
  dictionary,
  col_feature1,
  col_dict1,
  col_dict2,
  Word)

Arguments

`word`	A data frame, or a character vector
`dictionary`	A data frame with a column of words to be stemmed and a column of root words
`col_feature1`	Column in `word` that contains the words to be stemmed (needed when `word` is a data frame)
`col_dict1`	Column in `dictionary` that is matched against `col_feature1` from `word`
`col_dict2`	Column in `dictionary` that contains the root words
`Word`	Deprecated. Please use `word` instead

Format

An object of class function of length 1.

Details

stem_malay() first looks up the Malay words in a dictionary, then removes the "extra suffix" described by Khan et al. (2017), then the prefix, and lastly the suffix.
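
The order of these steps can be illustrated with a minimal base R sketch. This is not the implementation used by malaytextr: the function `toy_stem`, the dictionary `toy_dict`, and the three affix patterns are all hypothetical, and the sketch assumes that affix removal is applied only to words with no dictionary match.

```r
# Hypothetical toy dictionary; the column names are illustrative only.
toy_dict <- data.frame(
  word = c("banyaknya", "pengetahuan"),
  root = c("banyak", "tahu"),
  stringsAsFactors = FALSE
)

# Illustrative affix patterns; NOT the lists used by malaytextr.
extra_suffix <- "(nya|lah|kah)$"
prefix       <- "^(ber|ter|me|di|ke|se)"
suffix       <- "(kan|an|i)$"

toy_stem <- function(words, dict) {
  # Step 1: dictionary lookup (NA where the word is not found).
  root <- dict$root[match(words, dict$word)]

  # Steps 2-4 (assumption: only for unmatched words): remove the
  # extra suffix, then the prefix, then the suffix, in that order.
  miss <- is.na(root)
  w <- words[miss]
  w <- sub(extra_suffix, "", w)
  w <- sub(prefix, "", w)
  w <- sub(suffix, "", w)
  root[miss] <- w

  data.frame(`Col Word` = words, `Root Word` = root,
             check.names = FALSE, stringsAsFactors = FALSE)
}

toy_stem(c("banyaknya", "terkedu", "dibacanya", "bacaan"), toy_dict)
```

With these toy patterns, "banyaknya" is resolved by the dictionary, "terkedu" loses its prefix and becomes "kedu", and both "dibacanya" and "bacaan" reduce to "baca". Patterns this naive over-stem easily (for example, a root ending in "kan"), which is one reason a dictionary lookup comes first.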

Value

Returns a data frame with the following properties:

`Col Word`: The input from `word`, renamed
`Root Word`: An additional column that contains the word(s) after stemming
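
Because both column names contain a space, they need `[[ ]]` or backticks when accessed. The sketch below uses a hand-built stand-in data frame with the documented column names; its values are illustrative and are not actual stem_malay() output.

```r
# Hand-built stand-in with the documented column names; the values are
# illustrative, not actual stem_malay() output.
res <- data.frame(
  `Col Word`  = c("banyaknya", "sangat"),
  `Root Word` = c("banyak", "sangat"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Non-syntactic column names: use [[ ]] or backticks.
roots <- res[["Root Word"]]
same  <- res$`Col Word` == res$`Root Word`

# Keep only the rows where stemming changed the word.
changed <- res[!same, , drop = FALSE]
```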

References

Khan, Rehman Ullah, Fitri Suraya Mohamad, Muh Inam UlHaq, Shahren Ahmad Zadi Adruce, Philip Nuli Anding, Sajjad Nawaz Khan, and Abdulrazak Yahya Saleh Al-Hababi. 2017. "Malay Language Stemmer."

Examples


library(malaytextr)

# Stem a character vector using the dictionary
# bundled with the malaytextr package

stem_malay(word = "banyaknya", dictionary = malayrootwords)

# Stem a data frame using the same dictionary.
# With a data frame, you must specify the column to be stemmed

x <- data.frame(text = c("banyaknya", "sangat", "terkedu", "pengetahuan"))

stem_malay(word = x, dictionary = malayrootwords, col_feature1 = "text")

[Package malaytextr version 0.1.3 Index]