txt_feature {crfsuite}

R Documentation

Extract basic text features which are useful for entity recognition

Description

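Extract basic features from each element of a character vector which are useful for entity recognition: whether it is capitalised, a URL, an email address or a number, its prefix or suffix, and its shape.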

Usage

txt_feature(
  x,
  type = c("is_capitalised", "is_url", "is_email", "is_number", "prefix", "suffix",
    "shape"),
  n = 4
)

Arguments

`x`	a character vector
`type`	a character string, which can be one of 'is_capitalised', 'is_url', 'is_email', 'is_number', 'prefix', 'suffix', 'shape'
`n`	for type 'prefix' or 'suffix', the number of characters of the prefix/suffix. Defaults to 4.

Value

For type 'is_capitalised', 'is_url', 'is_email', 'is_number': a logical vector of the same length as x, indicating whether each element of x is capitalised, a URL, an email address or a number
For type 'prefix', 'suffix': a character vector of the same length as x, containing the first (prefix) or last (suffix) n characters of each element of x
For type 'shape': a character vector of the same length as x, where lowercase characters are replaced with x and uppercase characters with X
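
To make the 'prefix', 'suffix' and 'shape' return values more concrete, here is a rough base R sketch that mimics the descriptions above. The helper names (sketch_prefix, sketch_suffix, sketch_shape) are hypothetical and this is not the package's implementation, so its handling of edge cases may differ from txt_feature.

```r
# Hypothetical helpers mimicking the documented return values (not crfsuite code).

# First n characters of each element
sketch_prefix <- function(x, n = 4) {
  substr(x, 1, n)
}

# Last n characters of each element (the whole string if it is shorter than n)
sketch_suffix <- function(x, n = 4) {
  len <- nchar(x)
  substr(x, pmax(len - n + 1, 1), len)
}

# Lowercase characters become "x", uppercase characters become "X"
sketch_shape <- function(x) {
  x <- gsub("[[:lower:]]", "x", x)
  gsub("[[:upper:]]", "X", x)
}

tokens <- c("Red", "Devils", "bnosac")
sketch_prefix(tokens, n = 3)
sketch_suffix(tokens, n = 3)
sketch_shape(tokens)
```

Each helper returns a character vector of the same length as its input, matching the shape of the values described above.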

Examples

txt_feature("Red Devils", type = "is_capitalised")
txt_feature("red devils", type = "is_capitalised")
txt_feature("http://www.bnosac.be", type = "is_url")
txt_feature("info@google.com", type = "is_email")
txt_feature("hi there", type = "is_email")
txt_feature("1230000", type = "is_number")
txt_feature("123.15", type = "is_number")
txt_feature("123,15", type = "is_number")
txt_feature("123abc", type = "is_number")
txt_feature("abcdefghijklmnopqrstuvwxyz", type = "prefix", n = 3)
txt_feature("abcdefghijklmnopqrstuvwxyz", type = "suffix", n = 3)
txt_feature("Red Devils", type = "shape")
txt_feature("red devils", type = "shape")
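
Because the result always has the same length as x, txt_feature can be applied to a whole vector of tokens at once. The sketch below collects several feature types into a data.frame, one row per token. The token vector and the column names are illustrative assumptions, not part of the package.

```r
library(crfsuite)

# Illustrative tokens (assumed for this example)
tokens <- c("Contact", "info@google.com", "or", "visit",
            "http://www.bnosac.be", "in", "2019")

# One column per feature type, one row per token
features <- data.frame(
  token          = tokens,
  is_capitalised = txt_feature(tokens, type = "is_capitalised"),
  is_url         = txt_feature(tokens, type = "is_url"),
  is_email       = txt_feature(tokens, type = "is_email"),
  is_number      = txt_feature(tokens, type = "is_number"),
  prefix         = txt_feature(tokens, type = "prefix", n = 3),
  suffix         = txt_feature(tokens, type = "suffix", n = 3),
  shape          = txt_feature(tokens, type = "shape"),
  stringsAsFactors = FALSE
)
features
```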

[Package crfsuite version 0.4.2 Index]