stem_malay {malaytextr}R Documentation

Stemming Malay words

Description

Malaytextr function to stem Malay words

Usage

stem_malay(word,
  dictionary,
  col_feature1,
  col_dict1,
  col_dict2,
  Word)

Arguments

word

A data frame, or a character vector

dictionary

A data frame with a column of words to be stemmed and a column of root words

col_feature1

Column that contains words to be stemmed from word

col_dict1

Column that will be used to match with col_feature1 from word

col_dict2

Column that contains the root words from dictionary

Word

Depreciated. Please use word instead

Format

An object of class function of length 1.

Details

stem_malay() is an approach to find the Malay words in a dictionary and then proceed to remove "extra suffix" as explained by Khan et al. (2017), and then "prefix" and lastly, "suffix".

Value

Returns a data frame with the following properties:

References

Khan, Rehman Ullah, Fitri Suraya Mohamad, Muh Inam UlHaq, Shahren Ahmad Zadi Adruce, Philip Nuli Anding, Sajjad Nawaz Khan, and Abdulrazak Yahya Saleh Al-Hababi. 2017. "Malay Language Stemmer."

Examples


#Specifying a character vector &
#use a dictionary from malaytextr package

stem_malay(word = "banyaknya", dictionary = malayrootwords)



#A data frame,
#Use a dictionary from malaytextr package,
#With a dataframe, you will need to specify the column to be stemmed

x <- data.frame(text = c("banyaknya","sangat","terkedu", "pengetahuan"))

stem_malay(word = x, dictionary = malayrootwords, col_feature1 = "text")


[Package malaytextr version 0.1.3 Index]