dataset_ag_news {textdata} | R Documentation |
AG's News Topic Classification Dataset
Description
The AG's news topic classification dataset is constructed by choosing the 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples, for a total of 120,000 training samples and 7,600 testing samples. Version 3, updated 09/09/2015.
Usage
dataset_ag_news(
dir = NULL,
split = c("train", "test"),
delete = FALSE,
return_path = FALSE,
clean = FALSE,
manual_download = FALSE
)
Arguments
dir |
Character, path to directory where data will be stored. If
|
split |
Character. Return training ("train") data or testing ("test") data. Defaults to "train". |
delete |
Logical, set |
return_path |
Logical, set |
clean |
Logical, set |
manual_download |
Logical, set |
Details
The classes in this dataset are
World
Sports
Business
Sci/Tech
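The split sizes follow directly from the four classes listed above and the per-class counts given in the Description. The snippet below is only an arithmetic sketch; the variable names are illustrative and are not part of textdata.

```r
# Class names as listed in Details; per-class counts as stated in the Description.
classes <- c("World", "Sports", "Business", "Sci/Tech")
train_per_class <- 30000
test_per_class <- 1900

# Expected number of rows in each split (illustrative names, not a textdata API)
expected_rows <- c(
  train = length(classes) * train_per_class,
  test  = length(classes) * test_per_class
)
print(expected_rows)
```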
Value
A tibble with 120,000 or 7,600 rows for "train" and "test" respectively and 3 variables:
- class
Character, denoting news class
- title
Character, title of article
- description
Character, description of article
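A minimal sketch of how the returned structure might be inspected. To stay runnable without downloading anything, it uses a small hand-made data frame as a stand-in; the rows are invented for illustration, and it assumes the class column holds the class names listed under Details.

```r
# Hypothetical stand-in with the same three columns as the documented value.
# The rows are made up for illustration and are NOT taken from the dataset.
ag_sample <- data.frame(
  class = c("World", "Sports", "Business", "Sci/Tech", "Sports"),
  title = c("Title A", "Title B", "Title C", "Title D", "Title E"),
  description = c("Text A", "Text B", "Text C", "Text D", "Text E"),
  stringsAsFactors = FALSE
)

# Check that all three variables are character columns
col_types <- vapply(ag_sample, class, character(1))
print(col_types)

# Count the number of articles per class
class_counts <- table(ag_sample$class)
print(class_counts)
```

With the real data, the same calls could be applied to the object returned by dataset_ag_news(split = "train").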
Source
http://groups.di.unipi.it/~gulli/AG_corpus_of_news_articles.html
https://github.com/srhrshr/torchDatasets/raw/master/dbpedia_csv.tar.gz
See Also
Other topic: dataset_dbpedia(), dataset_trec()
Examples
## Not run:
dataset_ag_news()
# Custom directory
dataset_ag_news(dir = "data/")
# Deleting dataset
dataset_ag_news(delete = TRUE)
# Returning filepath of data
dataset_ag_news(return_path = TRUE)
# Access both training and testing dataset
train <- dataset_ag_news(split = "train")
test <- dataset_ag_news(split = "test")
## End(Not run)