txt_feature {crfsuite}    R Documentation
Extract basic text features which are useful for entity recognition
Description
Extract basic text features which are useful for entity recognition
Usage
txt_feature(
x,
type = c("is_capitalised", "is_url", "is_email", "is_number", "prefix", "suffix",
"shape"),
n = 4
)
Arguments
x | a character vector
type | a character string, which can be one of 'is_capitalised', 'is_url', 'is_email', 'is_number', 'prefix', 'suffix', 'shape'
n | for type 'prefix' or 'suffix', the number of characters of the prefix/suffix
Value
For type 'is_capitalised', 'is_url', 'is_email', 'is_number': a logical vector of the same length as x, indicating whether each element of x is capitalised, a URL, an email address or a number
For type 'prefix', 'suffix': a character vector of the same length as x, containing the prefix or suffix of n characters of each element of x
For type 'shape': a character vector of the same length as x, where lowercase characters are replaced with x and uppercase characters with X
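The 'prefix', 'suffix' and 'shape' values can be approximated in base R, which may help when reading the output. The following is a minimal sketch under that assumption, not the package's implementation. The variable names are illustrative, and txt_feature may treat edge cases (strings shorter than n, digits, punctuation) differently.

```r
# Base-R approximation of three feature types; NOT the crfsuite source code.
x <- c("Red Devils", "red devils")
n <- 3

# Assumed equivalent of type = "prefix": the first n characters of each element
prefix <- substr(x, 1, n)

# Assumed equivalent of type = "suffix": the last n characters of each element
suffix <- substr(x, nchar(x) - n + 1, nchar(x))

# Assumed equivalent of type = "shape": uppercase -> "X", lowercase -> "x"
shape <- gsub("[[:lower:]]", "x", gsub("[[:upper:]]", "X", x))

print(data.frame(x, prefix, suffix, shape, stringsAsFactors = FALSE))
```

Each result has the same length as x, matching the return values described in this section.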
Examples
txt_feature("Red Devils", type = "is_capitalised")
txt_feature("red devils", type = "is_capitalised")
txt_feature("http://www.bnosac.be", type = "is_url")
txt_feature("info@google.com", type = "is_email")
txt_feature("hi there", type = "is_email")
txt_feature("1230000", type = "is_number")
txt_feature("123.15", type = "is_number")
txt_feature("123,15", type = "is_number")
txt_feature("123abc", type = "is_number")
txt_feature("abcdefghijklmnopqrstuvwxyz", type = "prefix", n = 3)
txt_feature("abcdefghijklmnopqrstuvwxyz", type = "suffix", n = 3)
txt_feature("Red Devils", type = "shape")
txt_feature("red devils", type = "shape")
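Because x is a character vector and every type returns a vector of the same length, several features can be computed for a whole token sequence and combined into one table. The following is a sketch of that usage; the token vector and the column names are made up for illustration, and it assumes the crfsuite package is installed.

```r
library(crfsuite)

# Hypothetical tokens, chosen only to exercise the different feature types
tokens <- c("Contact", "info@google.com", "or", "visit",
            "http://www.bnosac.be", "before", "2025")

# One row per token, one column per feature type
features <- data.frame(
  token          = tokens,
  is_capitalised = txt_feature(tokens, type = "is_capitalised"),
  is_url         = txt_feature(tokens, type = "is_url"),
  is_email       = txt_feature(tokens, type = "is_email"),
  is_number      = txt_feature(tokens, type = "is_number"),
  prefix         = txt_feature(tokens, type = "prefix", n = 3),
  suffix         = txt_feature(tokens, type = "suffix", n = 3),
  shape          = txt_feature(tokens, type = "shape"),
  stringsAsFactors = FALSE
)
print(features)
```

Calling txt_feature once per type keeps each column's meaning explicit, since type accepts a single feature name per call.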
[Package crfsuite version 0.4.2 Index]