R: Download one or more works using a Project Gutenberg ID

gutenberg_download {gutenbergr}

R Documentation

Download one or more works using a Project Gutenberg ID

Description

Download one or more works by their Project Gutenberg IDs into a data frame with one row per line per work. This can be used to download a single work of interest or several at a time. You can look up the Gutenberg ID of a work using the gutenberg_works() function or the gutenberg_metadata dataset.

Usage

gutenberg_download(
  gutenberg_id,
  mirror = NULL,
  strip = TRUE,
  meta_fields = NULL,
  verbose = TRUE,
  files = NULL,
  ...
)

Arguments

`gutenberg_id`	A vector of Project Gutenberg IDs, or a data frame containing a `gutenberg_id` column, such as the result of a `gutenberg_works()` call
`mirror`	Optionally, a mirror URL to retrieve the books from. By default, the mirror from `gutenberg_get_mirror` is used
`strip`	Whether to strip suspected headers and footers using the `gutenberg_strip` function
`meta_fields`	Additional fields, such as `title` and `author`, to add from gutenberg_metadata describing each book. This is useful when returning multiple works
`verbose`	Whether to show messages about the Project Gutenberg mirror that was chosen
`files`	A vector of .zip file paths. If given, this reads from the files rather than from the site. This is mostly used for testing when the Project Gutenberg website may not be available.
`...`	Extra arguments passed to `gutenberg_strip`, currently unused

Details

Note that if strip = TRUE, this tries to remove the Gutenberg header and footer using the gutenberg_strip function. This is not an exact process since headers and footers differ between books. Before doing an in-depth analysis you may want to check the start and end of each downloaded book.
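The check suggested above can be sketched as follows. This is only an illustration, not part of the package: the `books` tibble is a hypothetical stand-in for the result of gutenberg_download(), with made-up IDs and text so that the code runs without network access. It assumes dplyr 1.0.0 or later for slice_head() and slice_tail().

```r
library(dplyr)

# Hypothetical stand-in for the output of gutenberg_download();
# the IDs and lines below are invented for illustration only.
books <- tibble::tibble(
  gutenberg_id = rep(c(1L, 2L), each = 6),
  text = c(
    "Produced by a volunteer", "", "CHAPTER I",
    "First line.", "Last line.", "THE END",
    "TITLE PAGE", "", "PART ONE",
    "Opening.", "Closing.", "End of the Project Gutenberg EBook"
  )
)

# First three lines of each work: leftover header text would show up here
heads <- books %>%
  group_by(gutenberg_id) %>%
  slice_head(n = 3) %>%
  ungroup()

# Last three lines of each work: leftover footer text would show up here
tails <- books %>%
  group_by(gutenberg_id) %>%
  slice_tail(n = 3) %>%
  ungroup()

heads
tails
```

With a real download, replace `books` with the returned data frame and increase `n` as needed; any remaining boilerplate can then be filtered out by hand.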

Value

A two-column tbl_df (a type of data frame; see the tibble or dplyr packages) with one row for each line of the text or texts, with columns:

gutenberg_id: Integer column with the Project Gutenberg ID of each text
text: Character column with one line of text per row

Examples



library(gutenbergr)
library(dplyr)

# download The Count of Monte Cristo
gutenberg_download(1184)

# download two books: Wuthering Heights and Jane Eyre
books <- gutenberg_download(c(768, 1260), meta_fields = "title")
books
books %>% count(title)

# download all books from Jane Austen
austen <- gutenberg_works(author == "Austen, Jane") %>%
  gutenberg_download(meta_fields = "title")

austen
austen %>%
  count(title)

[Package gutenbergr version 0.2.4 Index]