readRCV1 {tm}    R Documentation
Read In a Reuters Corpus Volume 1 Document
Description
Read in a Reuters Corpus Volume 1 XML document.
Usage
readRCV1(elem, language, id)
readRCV1asPlain(elem, language, id)
Arguments
elem
a named list with the component content, which holds the document to be read in.
language
a string giving the language.
id
Not used.
Value
An XMLTextDocument for readRCV1, or a PlainTextDocument for readRCV1asPlain, representing the text and metadata extracted from elem$content.
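The Examples below only exercise readRCV1. A minimal sketch of the plain-text variant, assuming the same sample file rcv1_2330.xml that ships with tm, reads the document with readRCV1asPlain and inspects the result; the variable name plain is illustrative only.

```r
library("tm")

## Read the sample RCV1 file shipped with tm as raw bytes
f <- system.file("texts", "rcv1_2330.xml", package = "tm")
f_bin <- readBin(f, raw(), file.size(f))

## readRCV1asPlain returns a PlainTextDocument rather than an XMLTextDocument
plain <- readRCV1asPlain(elem = list(content = f_bin), language = "en", id = "id1")

class(plain)    # class vector of the returned document
content(plain)  # extracted text, stored as a character vector
meta(plain)     # document metadata
```

Use readRCV1 when the XML structure itself is needed, and readRCV1asPlain when only the text matters for later processing.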
References
Lewis, D. D.; Yang, Y.; Rose, T.; and Li, F. (2004). RCV1: A New Benchmark Collection for Text Categorization Research. Journal of Machine Learning Research, 5, 361–397. https://www.jmlr.org/papers/volume5/lewis04a/lewis04a.pdf
See Also
Reader for basic information on the reader infrastructure employed by package tm.
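Readers are usually passed to a corpus constructor rather than called directly. A minimal sketch, assuming that the file pattern below matches only the RCV1 sample in tm's texts directory, builds a VCorpus from a DirSource in binary mode, so that each element's content is a raw vector as in the Examples; the names texts_dir, src and corp are illustrative only.

```r
library("tm")

## Directory holding the sample texts shipped with tm
texts_dir <- system.file("texts", package = "tm")

## Binary mode hands each file to the reader as raw bytes;
## the pattern is assumed to select only the RCV1 sample file
src <- DirSource(texts_dir, pattern = "^rcv1_.*\\.xml$", mode = "binary")

## readerControl tells VCorpus which reader to apply to each element
corp <- VCorpus(src, readerControl = list(reader = readRCV1asPlain, language = "en"))

length(corp)
content(corp[[1]])
```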
Examples
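library("tm")
## Locate the sample RCV1 document shipped with tm and read it as raw bytes
f <- system.file("texts", "rcv1_2330.xml", package = "tm")
f_bin <- readBin(f, raw(), file.size(f))
## elem must be a list whose content component holds the document
rcv1 <- readRCV1(elem = list(content = f_bin), language = "en", id = "id1")
content(rcv1)
meta(rcv1)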
[Package tm version 0.7-13 Index]