stem_malay {malaytextr} | R Documentation |
Stemming Malay words
Description
A malaytextr function that stems Malay words to their root words
Usage
stem_malay(word,
dictionary,
col_feature1,
col_dict1,
col_dict2,
Word)
Arguments
word | A data frame or a character vector
dictionary | A data frame with a column of words to be stemmed and a column of root words
col_feature1 | Column in word that contains the words to be stemmed (needed when word is a data frame)
col_dict1 | Column in dictionary that the input words are matched against
col_dict2 | Column in dictionary that contains the root words
Word | Deprecated. Please use
Format
An object of class function of length 1.
Details
stem_malay() first looks up the Malay words in a dictionary. It then removes the "extra suffix", as explained by Khan et al. (2017), followed by the "prefix" and, lastly, the "suffix".
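The order of operations described above can be sketched in base R. This is only an illustration, not the package's implementation: the affix lists, the toy dictionary, the three-character minimum stem length, and the helper names strip_first() and sketch_stem() are all hypothetical. It also assumes that a word found in the dictionary is returned as its root without further stripping.

```r
# Hypothetical affix lists for illustration only; NOT the lists used by malaytextr.
extra_suffixes <- c("lah", "kah", "nya")
prefixes <- c("meng", "mem", "men", "me", "peng", "pem", "pen", "pe",
              "ber", "ter", "di")
suffixes <- c("kan", "an", "i")

# Toy dictionary: one column of words to match, one column of root words.
toy_dict <- data.frame(
  word = c("banyaknya", "makanan"),
  root = c("banyak", "makan"),
  stringsAsFactors = FALSE
)

# Remove the first matching affix from the start or end of x.
# Assumption: leave x unchanged if fewer than 3 characters would remain.
strip_first <- function(x, affixes, where = c("start", "end")) {
  where <- match.arg(where)
  for (a in affixes) {
    if (nchar(x) - nchar(a) < 3) next
    if (where == "start" && startsWith(x, a)) {
      return(substring(x, nchar(a) + 1))
    }
    if (where == "end" && endsWith(x, a)) {
      return(substring(x, 1, nchar(x) - nchar(a)))
    }
  }
  x
}

# Dictionary lookup first, then extra suffix, prefix, and suffix, in that order.
sketch_stem <- function(w, dict) {
  hit <- match(w, dict$word)
  if (!is.na(hit)) return(dict$root[hit])
  w <- strip_first(w, extra_suffixes, "end")
  w <- strip_first(w, prefixes, "start")
  strip_first(w, suffixes, "end")
}

sketch_stem("banyaknya", toy_dict)  # dictionary hit
sketch_stem("rumahnya", toy_dict)   # no hit, "nya" removed as an extra suffix
sketch_stem("terkedu", toy_dict)    # no hit, "ter" removed as a prefix
```

A rule-based pass like this can over-stem or under-stem words, which is one reason the dictionary lookup comes first.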
Value
Returns a data frame with the following properties:
- Col Word: Renamed input from word
- Root Word: An additional column which contains the word(s) after being stemmed.
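Because both returned column names contain a space, they cannot be accessed as plain res$name. The sketch below shows two ways to reach them in base R. The data frame res is a hand-built stand-in that only mimics the documented column names; it is not actual stem_malay() output.

```r
# Hand-built stand-in with the documented column names;
# NOT actual output from stem_malay().
res <- data.frame(
  "Col Word" = c("banyaknya", "sangat"),
  "Root Word" = c("banyak", "sangat"),
  check.names = FALSE,       # keep the spaces in the column names
  stringsAsFactors = FALSE
)

# Option 1: double-bracket indexing with the name as a string.
roots <- res[["Root Word"]]

# Option 2: backticks around the name after `$`.
# Keep only the rows where stemming changed the word.
changed <- res[res$`Col Word` != res$`Root Word`, ]
```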
References
Khan, Rehman Ullah, Fitri Suraya Mohamad, Muh Inam UlHaq, Shahren Ahmad Zadi Adruce, Philip Nuli Anding, Sajjad Nawaz Khan, and Abdulrazak Yahya Saleh Al-Hababi. 2017. "Malay Language Stemmer."
Examples
# Stem a character vector,
# using a dictionary from the malaytextr package
stem_malay(word = "banyaknya", dictionary = malayrootwords)
# Stem a data frame,
# using a dictionary from the malaytextr package.
# With a data frame, you need to specify the column to be stemmed
x <- data.frame(text = c("banyaknya","sangat","terkedu", "pengetahuan"))
stem_malay(word = x, dictionary = malayrootwords, col_feature1 = "text")