textFeatures {koRpus} | R Documentation |
Extract text features for authorship analysis
Description
This function combines several of koRpus' methods to extract the 9-Feature Set for
authorship detection (Brennan, Afroz & Greenstadt, 2011; Brennan & Greenstadt, 2009).
Usage
textFeatures(text, hyphen = NULL)
Arguments
text |
An object of class |
hyphen |
An object of class |
Value
A data.frame:
- uniqWd
Number of unique words (tokens)
- cmplx
Complexity (TTR)
- sntCt
Sentence count
- sntLen
Average sentence length
- syllCt
Average syllable count
- charCt
Character count (all characters, including spaces)
- lttrCt
Letter count (without spaces, punctuation and digits)
- FOG
Gunning FOG index
- flesch
Flesch Reading Ease index
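To make the count-based columns more concrete, the following base-R sketch computes rough equivalents for a plain character string. It is an illustration only and not how koRpus derives these values: simple_features() is a hypothetical helper, the regular-expression tokenization is a deliberately naive assumption, and the syllable-based features (syllCt, FOG, flesch) are left out because they require hyphenation.

```r
# Illustrative sketch only; simple_features() is a hypothetical helper,
# not part of koRpus, and its naive tokenization is an assumption.
simple_features <- function(txt) {
  lower <- tolower(txt)
  # word tokens: runs of ASCII letters and apostrophes
  words <- unlist(regmatches(lower, gregexpr("[a-z']+", lower)))
  # sentences: runs of ., ! or ? followed by whitespace or end of string
  snt_ct <- lengths(regmatches(txt, gregexpr("[.!?]+(\\s|$)", txt, perl = TRUE)))
  data.frame(
    uniqWd = length(unique(words)),                  # unique word types
    cmplx  = length(unique(words)) / length(words),  # type-token ratio
    sntCt  = snt_ct,                                 # sentence count
    sntLen = length(words) / snt_ct,                 # words per sentence
    charCt = nchar(txt),                             # all characters
    lttrCt = nchar(gsub("[^A-Za-z]", "", txt))       # letters only
  )
}

feats <- simple_features("The cat sat. The dog sat too!")
print(feats)
```

Values from textFeatures() may differ from such a sketch, since koRpus works on a tokenized text object rather than on raw regular-expression matches.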
References
Brennan, M., Afroz, S., & Greenstadt, R. (2011). Deceiving authorship detection. Presentation at 28th Chaos Communication Congress (28C3), Berlin, Germany.
Brennan, M. & Greenstadt, R. (2009). Practical Attacks Against Authorship Recognition Techniques. In Proceedings of the Twenty-First Conference on Innovative Applications of Artificial Intelligence (IAAI), Pasadena, CA.
Tweedie, F.J., Singh, S., & Holmes, D.I. (1996). Neural Network Applications in Stylometry: The Federalist Papers. Computers and the Humanities, 30, 1–10.
Examples
# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  textFeatures(tokenized.obj)
} else {}