dataset_dbpedia {textdata} | R Documentation
DBpedia Ontology Dataset
Description
DBpedia ontology classification dataset. It contains 560,000 training samples and 70,000 testing samples in total, spread across 14 nonoverlapping classes from DBpedia.
Usage
dataset_dbpedia(
dir = NULL,
split = c("train", "test"),
delete = FALSE,
return_path = FALSE,
clean = FALSE,
manual_download = FALSE
)
Arguments
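- dir
Character, path to directory where data will be stored. If NULL (the default), a default cache directory is used.
- split
Character. Return training ("train") data or testing ("test") data. Defaults to "train".
- delete
Logical, set TRUE to delete the dataset. Defaults to FALSE.
- return_path
Logical, set TRUE to return the path of the dataset. Defaults to FALSE.
- clean
Logical, set TRUE to remove intermediate files. Defaults to FALSE.
- manual_download
Logical, set TRUE if the file has been downloaded manually. Defaults to FALSE.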
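The Usage signature lists split = c("train", "test") while the argument defaults to "train". A minimal sketch of how that pattern usually works in R, assuming the function resolves the argument with match.arg(); pick_split() is a hypothetical helper for illustration, not part of textdata:

```r
# Hypothetical helper mirroring the `split` argument of dataset_dbpedia().
# Assumption: the argument is resolved with match.arg(), the usual R idiom
# for "one of these choices, first one is the default".
pick_split <- function(split = c("train", "test")) {
  # With no value supplied, match.arg() returns the first choice.
  # An unknown value such as "validation" raises an error.
  match.arg(split)
}

default_split <- pick_split()
test_split <- pick_split("test")
```

Under this assumption, calling the function without split is equivalent to passing split = "train".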
Details
The classes are:
Company
EducationalInstitution
Artist
Athlete
OfficeHolder
MeanOfTransportation
Building
NaturalPlace
Village
Animal
Plant
Album
Film
WrittenWork
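The class list and the sample totals can be related with a little arithmetic. The sketch below assumes the classes are balanced, as described in the source paper; dbpedia_classes is a name introduced here for illustration only:

```r
# Class names in the order listed above.
dbpedia_classes <- c(
  "Company", "EducationalInstitution", "Artist", "Athlete",
  "OfficeHolder", "MeanOfTransportation", "Building", "NaturalPlace",
  "Village", "Animal", "Plant", "Album", "Film", "WrittenWork"
)

n_classes <- length(dbpedia_classes)

# Assumption: every class contributes the same number of samples.
train_per_class <- 560000 / n_classes
test_per_class <- 70000 / n_classes
```

With 14 classes this gives 40,000 training and 5,000 testing samples per class.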
Value
A tibble with 560,000 or 70,000 rows for "train" and "test" respectively and 3 variables:
- class
Character, denoting the class
- title
Character, title of article
- description
Character, description of article
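A short sketch of working with a result of this shape. To keep it runnable without downloading anything, it uses a small made-up data frame with the same three columns; the rows are invented and dbpedia_sample is a hypothetical stand-in for the returned tibble:

```r
# Made-up stand-in with the same columns as the returned tibble.
dbpedia_sample <- data.frame(
  class = c("Company", "Film", "Film"),
  title = c("Example Corp", "Example Movie", "Another Movie"),
  description = c(
    "A made-up company.",
    "A made-up film.",
    "Another made-up film."
  ),
  stringsAsFactors = FALSE
)

# Number of rows per class.
class_counts <- table(dbpedia_sample$class)

# Keep only the rows of one class.
films <- dbpedia_sample[dbpedia_sample$class == "Film", ]
```

The same base R indexing should work on the real tibble, since a tibble is also a data frame.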
Source
https://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-classification.pdf
https://github.com/srhrshr/torchDatasets/raw/master/dbpedia_csv.tar.gz
See Also
Other topic: dataset_ag_news(), dataset_trec()
Examples
## Not run:
dataset_dbpedia()
# Custom directory
dataset_dbpedia(dir = "data/")
# Deleting dataset
dataset_dbpedia(delete = TRUE)
# Returning filepath of data
dataset_dbpedia(return_path = TRUE)
# Access both the training and testing datasets
train <- dataset_dbpedia(split = "train")
test <- dataset_dbpedia(split = "test")
## End(Not run)