filter_segment {jiebaR}    R Documentation

Filter segmentation result

Description

This function removes the specified words from a segmentation result.

Usage

filter_segment(input, filter_words, unit = 50)

Arguments

input

a string vector, such as a segmentation result.

filter_words

a string vector of words to be removed.

unit

the number of words to combine into each regular expression; the default is 50. A long list of words forms one large regular expression, which may or may not be accepted: the POSIX standard only requires support for patterns of up to 256 bytes. unit is therefore used to split the word list into smaller groups.

Examples

filter_segment(c("abc","def"," ","."), c("abc"))
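
The role of unit can be illustrated with a minimal base R sketch. This is not the package's actual implementation: filter_in_units is a hypothetical helper written only for illustration, and it assumes that each group of words is joined into one anchored alternation and applied in turn.

```r
# Hypothetical helper, NOT part of jiebaR: a sketch of filtering in units.
filter_in_units <- function(input, filter_words, unit = 50) {
  filter_words <- unique(filter_words)
  if (length(filter_words) == 0) return(input)

  # Split the word list into groups of at most `unit` words each.
  groups <- split(filter_words, ceiling(seq_along(filter_words) / unit))

  for (words in groups) {
    # Build one anchored alternation per group, e.g. "^(abc|def)$".
    # Assumption: words are used as-is, so regex metacharacters in
    # filter_words (such as ".") keep their special meaning.
    pattern <- paste0("^(", paste(words, collapse = "|"), ")$")
    input <- input[!grepl(pattern, input)]
  }
  input
}

# unit = 1 forces one regular expression per word.
result <- filter_in_units(c("abc", "def", " ", "."), c("abc", "def"), unit = 1)
print(result)
```

With a smaller unit, each pattern stays short at the cost of more passes over input; a larger unit means fewer passes but longer patterns.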

[Package jiebaR version 0.11 Index]