R: AG's News Topic Classification Dataset

dataset_ag_news {textdata}

R Documentation

AG's News Topic Classification Dataset

Description

The AG's news topic classification dataset is constructed by choosing the 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples. In total there are 120,000 training samples and 7,600 testing samples. Version 3, Updated 09/09/2015

Usage

dataset_ag_news(
  dir = NULL,
  split = c("train", "test"),
  delete = FALSE,
  return_path = FALSE,
  clean = FALSE,
  manual_download = FALSE
)

Arguments

`dir`	Character, path to directory where data will be stored. If `NULL`, user_cache_dir will be used to determine path.
`split`	Character. Return training ("train") data or testing ("test") data. Defaults to "train".
`delete`	Logical, set `TRUE` to delete dataset.
`return_path`	Logical, set `TRUE` to return the path of the dataset.
`clean`	Logical, set `TRUE` to remove intermediate files. This can greatly reduce the size. Defaults to FALSE.
`manual_download`	Logical, set `TRUE` if you have manually downloaded the file and placed it in the folder designated by running this function with `return_path = TRUE`.
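
The manual download route described for `manual_download` can be sketched as follows. This is a minimal outline that assumes the textdata package is installed and uses only the arguments documented on this page; the copy step happens by hand outside R, and `target` and `train` are illustrative variable names.

```r
library(textdata)

# Step 1: ask where the dataset is expected to live.
# With return_path = TRUE the path is returned instead of the data.
target <- dataset_ag_news(return_path = TRUE)
print(target)

# Step 2 (by hand, outside R): download the file yourself and place it
# in the folder printed above.

# Step 3: tell the function the file is already in place.
train <- dataset_ag_news(manual_download = TRUE)
```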

Details

The classes in this dataset are:

World
Sports
Business
Sci/Tech
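
As a quick sanity check, the totals quoted in the Description follow from these four classes and the per-class sample counts. The sketch below is plain arithmetic in base R and does not touch the dataset itself; the variable names are illustrative.

```r
# The four classes listed above.
classes <- c("World", "Sports", "Business", "Sci/Tech")

# Per-class sample counts as stated in the Description.
per_class <- c(train = 30000, test = 1900)

# Total rows per split = number of classes * samples per class.
totals <- length(classes) * per_class
print(totals)
```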

Value

A tibble with 120,000 or 7,600 rows for "train" and "test" respectively and 3 variables:

class: Character, denoting news class
title: Character, title of article
description: Character, description of article
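
To illustrate working with a result of this shape, the sketch below uses a small hypothetical stand-in with the same three columns. The rows are invented for illustration and are not taken from the dataset, and a base `data.frame` is used in place of a tibble so the example runs without extra packages.

```r
# Hypothetical stand-in with the same three columns as the returned tibble.
# The rows are invented and are not real dataset entries.
ag_sample <- data.frame(
  class = c("World", "Sports", "Business", "Sci/Tech", "Sports"),
  title = c("Title A", "Title B", "Title C", "Title D", "Title E"),
  description = c("Text A", "Text B", "Text C", "Text D", "Text E"),
  stringsAsFactors = FALSE
)

# Count articles per class.
class_counts <- table(ag_sample$class)
print(class_counts)

# Combine title and description into a single text field,
# a common preparation step before classification.
ag_sample$text <- paste(ag_sample$title, ag_sample$description)
```

The same two steps should apply unchanged to the object returned by `dataset_ag_news()`, assuming its columns are named as listed above.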

Source

http://groups.di.unipi.it/~gulli/AG_corpus_of_news_articles.html

https://github.com/srhrshr/torchDatasets/raw/master/dbpedia_csv.tar.gz

Examples

## Not run: 
dataset_ag_news()

# Custom directory
dataset_ag_news(dir = "data/")

# Deleting dataset
dataset_ag_news(delete = TRUE)

# Returning filepath of data
dataset_ag_news(return_path = TRUE)

# Access both training and testing dataset
train <- dataset_ag_news(split = "train")
test <- dataset_ag_news(split = "test")

## End(Not run)

[Package textdata version 0.4.5 Index]