filter_segment {jiebaR} | R Documentation |
Filter segmentation result
Description
This function removes the specified words from a segmentation result.
Usage
filter_segment(input, filter_words, unit = 50)
Arguments
input |
a character vector, typically a segmentation result |
filter_words |
a string vector of words to be removed. |
unit |
the number of words to combine into each regular expression; the default is 50. A long list of words forms a large regular expression, which may or may not be accepted: the POSIX standard only requires support for patterns of up to 256 bytes. The words are therefore split into groups of unit words. |
Examples
filter_segment(c("abc","def"," ","."), c("abc"))
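The role of unit can be illustrated with a minimal base R sketch. This is not the jiebaR implementation: filter_in_units is a hypothetical helper written for this example, and it assumes that none of the filter words contains the sequence \E, since each word is quoted with the PCRE \Q...\E construct. It splits the filter words into groups of at most unit words, builds one anchored regular expression per group, and drops every element of input that exactly matches a word in that group.

```r
# Hypothetical helper for illustration only; not part of jiebaR.
# Removes elements of `input` that exactly match any of `filter_words`,
# using one regular expression per group of at most `unit` words.
filter_in_units <- function(input, filter_words, unit = 50) {
  filter_words <- unique(filter_words)
  if (length(filter_words) == 0) {
    return(input)
  }

  # Assign consecutive words to groups 1, 2, ... of at most `unit` words.
  groups <- split(filter_words, ceiling(seq_along(filter_words) / unit))

  for (words in groups) {
    # Quote each word with \Q...\E so regex metacharacters match literally
    # (assumes no word contains "\E"), then anchor the alternation so that
    # only whole elements are matched.
    quoted <- paste0("\\Q", words, "\\E", collapse = "|")
    pattern <- paste0("^(?:", quoted, ")$")
    input <- input[!grepl(pattern, input, perl = TRUE)]
  }

  input
}

# Three filter words with unit = 2 are processed as two patterns.
filter_in_units(c("a", "b", "c", "d"), c("a", "c", "d"), unit = 2)
```

A smaller unit keeps each pattern short at the cost of more passes over input; a larger unit does the opposite.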
[Package jiebaR version 0.11 Index]