| dataset_trec {textdata} | R Documentation | 
TREC dataset
Description
The TREC dataset is a dataset for question classification consisting of open-domain, fact-based questions divided into broad semantic categories. It comes in a six-class (TREC-6) and a fifty-class (TREC-50) version. Both versions have 5,452 training examples and 500 test examples; TREC-50 differs only in having finer-grained labels. Models are evaluated on accuracy.
Usage
dataset_trec(
  dir = NULL,
  split = c("train", "test"),
  version = c("6", "50"),
  delete = FALSE,
  return_path = FALSE,
  clean = FALSE,
  manual_download = FALSE
)
Arguments
dir | 
 Character, path to directory where data will be stored. If
  | 
split | 
 Character. Return training ("train") data or testing ("test") data. Defaults to "train".  | 
version | 
 Character. Version 6 ("6") or version 50 ("50"). Defaults to "6".  | 
delete | 
 Logical, set   | 
return_path | 
 Logical, set   | 
clean | 
 Logical, set   | 
manual_download | 
 Logical, set   | 
Details
The classes in TREC-6 are:
ABBR - Abbreviation
DESC - Description and abstract concepts
ENTY - Entities
HUM - Human beings
LOC - Locations
NUM - Numeric values
The classes in TREC-50 are listed at https://cogcomp.seas.upenn.edu/Data/QA/QC/definition.html.
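A named character vector is one convenient way to attach the TREC-6 descriptions to the class codes. The following is a minimal sketch in base R, not part of textdata: `trec6_labels` and the toy `questions` data frame are illustrative names, the rows are made up, and the sketch assumes the `class` column holds the short upper-case codes (with numeric values coded as `NUM`).

```r
# Lookup from TREC-6 class codes to their descriptions.
trec6_labels <- c(
  ABBR = "Abbreviation",
  DESC = "Description and abstract concepts",
  ENTY = "Entities",
  HUM  = "Human beings",
  LOC  = "Locations",
  NUM  = "Numeric values"
)

# Toy stand-in for the data returned by dataset_trec(); rows are made up.
questions <- data.frame(
  class = c("HUM", "LOC", "NUM"),
  text  = c("Who wrote Hamlet ?", "Where is the Nile ?", "How tall is Everest ?"),
  stringsAsFactors = FALSE
)

# Index the lookup by code to add a human-readable label column.
questions$class_label <- unname(trec6_labels[questions$class])
```

The same indexing should work on the real tibble, provided its `class` values match the names of the lookup vector; any code missing from the lookup would come back as `NA`.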
Value
A tibble with 5,452 or 500 rows for "train" and "test" respectively and 2 variables:
- class
 Character, denoting the class
- text
 Character, question text
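Because models on TREC are evaluated on accuracy, scoring comes down to comparing predicted labels against the `class` column. The following is a minimal sketch in base R with made-up labels; `accuracy()` is a hypothetical helper, not a textdata function.

```r
# Fraction of predictions that exactly match the true class.
accuracy <- function(truth, predicted) {
  stopifnot(length(truth) == length(predicted), length(truth) > 0)
  mean(truth == predicted)
}

# Made-up labels standing in for test$class and a model's predictions.
truth     <- c("HUM", "LOC", "NUM", "DESC")
predicted <- c("HUM", "LOC", "ENTY", "DESC")

accuracy(truth, predicted)  # 3 of 4 match, so 0.75
```

In practice `truth` would be the `class` column of the test split, and `predicted` a character vector of the same length produced by a model.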
Source
https://cogcomp.seas.upenn.edu/Data/QA/QC/
https://trec.nist.gov/data/qa.html
See Also
Other topic: 
dataset_ag_news(),
dataset_dbpedia()
Examples
## Not run: 
dataset_trec()
# Custom directory
dataset_trec(dir = "data/")
# Deleting dataset
dataset_trec(delete = TRUE)
# Returning filepath of data
dataset_trec(return_path = TRUE)
# Access both training and testing dataset
train_6 <- dataset_trec(split = "train")
test_6 <- dataset_trec(split = "test")
train_50 <- dataset_trec(split = "train", version = "50")
test_50 <- dataset_trec(split = "test", version = "50")
## End(Not run)