R: Extract text features for authorship analysis

textFeatures {koRpus}

R Documentation

Extract text features for authorship analysis

Description

This function combines several of koRpus' methods to extract the 9-Feature Set for authorship detection (Brennan, Afroz & Greenstadt, 2011; Brennan & Greenstadt, 2009).

Usage

textFeatures(text, hyphen = NULL)

Arguments

`text`	An object of class `kRp.text`. Can also be a list of these objects, if you want to analyze more than one text at once.
`hyphen`	An object of class `kRp.hyphen`, if `text` has already been hyphenated. If `text` is a list and `hyphen` is not `NULL`, it must also be a list with one object for each text, in the same order.
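
The list form of both arguments might be used as in the following sketch. It assumes that `koRpus.lang.en` is installed and that its hyphenation patterns are available to `hyphen()`. For brevity it tokenizes the sample file shipped with koRpus twice; in practice each list entry would be a different text. The names `texts`, `hyphens` and `features` are illustrative only.

```r
# Sketch: analyze several texts at once and reuse precomputed hyphenation.
# Assumption: koRpus.lang.en (and its hyphenation patterns) can be loaded.
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  # placeholder data: the same file twice, standing in for two different texts
  texts <- list(
    tokenize(txt = sample_file, lang = "en"),
    tokenize(txt = sample_file, lang = "en")
  )
  # one kRp.hyphen object per text, in the same order as 'texts'
  hyphens <- lapply(texts, hyphen)
  features <- textFeatures(texts, hyphen = hyphens)
} else {}
```

Hyphenating once and passing the result in avoids repeating that step if the same texts are analyzed again.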

Value

A data.frame with the following columns:

uniqWd: Number of unique words (tokens)
cmplx: Complexity (TTR, i.e. type-token ratio)
sntCt: Sentence count
sntLen: Average sentence length
syllCt: Average syllable count
charCt: Character count (all characters, including spaces)
lttrCt: Letter count (without spaces, punctuation and digits)
FOG: Gunning FOG index
flesch: Flesch Reading Ease index
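
To make the columns more concrete, here is a rough base R sketch of how some of these values can be approximated for a plain character string. It is not koRpus' implementation: the tokenization is a naive regular expression, and `simple_features()`, `flesch_re()` and `gunning_fog()` are hypothetical helper names. The two readability helpers use the commonly cited English formulas and expect precomputed counts, because counting syllables reliably requires hyphenation.

```r
# Illustrative approximations only; koRpus uses its own tokenizer,
# hyphenation and readability methods.
simple_features <- function(txt) {
  # words: runs of letters, lower-cased
  words <- tolower(unlist(regmatches(txt, gregexpr("[[:alpha:]]+", txt))))
  # sentences: split at whitespace that follows ., ! or ?
  sentences <- unlist(strsplit(txt, "(?<=[.!?])\\s+", perl = TRUE))
  sentences <- sentences[nzchar(sentences)]
  data.frame(
    uniqWd = length(unique(words)),                  # distinct word forms
    cmplx  = length(unique(words)) / length(words),  # type-token ratio
    sntCt  = length(sentences),
    sntLen = length(words) / length(sentences),      # words per sentence
    charCt = nchar(txt),                             # all characters
    lttrCt = nchar(gsub("[^[:alpha:]]", "", txt))    # letters only
  )
}

# Flesch Reading Ease (standard English formula), from raw counts
flesch_re <- function(words, sentences, syllables) {
  206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
}

# Gunning FOG; complex_words = words with three or more syllables
gunning_fog <- function(words, sentences, complex_words) {
  0.4 * ((words / sentences) + 100 * (complex_words / words))
}

res <- simple_features("The cat sat. The dog ran away!")
```

Values returned by `textFeatures()` can differ from such approximations, since koRpus tokenizes and hyphenates in a language-aware way.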

References

Brennan, M., Afroz, S., & Greenstadt, R. (2011). Deceiving authorship detection. Presentation at 28th Chaos Communication Congress (28C3), Berlin, Germany.

Brennan, M. & Greenstadt, R. (2009). Practical Attacks Against Authorship Recognition Techniques. In Proceedings of the Twenty-First Conference on Innovative Applications of Artificial Intelligence (IAAI), Pasadena, CA.

Tweedie, F.J., Singh, S., & Holmes, D.I. (1996). Neural Network Applications in Stylometry: The Federalist Papers. Computers and the Humanities, 30, 1–10.

Examples

# code is only run when the English language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  textFeatures(tokenized.obj)
} else {}

[Package koRpus version 0.13-8 Index]