gutenberg_strip {gutenbergr} | R Documentation
Strip header and footer content from a Project Gutenberg book
Description
Strip header and footer content from a Project Gutenberg book. The function relies on guesses about how the text is formatted, so the result may not be perfect. It also does not strip tables of contents, prologues, or other text that appears at the start of a book.
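To illustrate what a formatting guess of this kind can look like, here is a minimal base R sketch. It is not the implementation used by gutenbergr. `strip_between_markers` is a hypothetical helper, and it assumes the text contains marker lines beginning with `*** START OF` and `*** END OF`, which many Project Gutenberg files carry. The `toy` vector is made-up sample input.

```r
# Hypothetical helper, not part of gutenbergr: keep only the lines that sit
# strictly between the first "*** START OF" line and the last "*** END OF" line.
strip_between_markers <- function(text) {
  start <- grep("^\\*\\*\\* ?START OF", text)
  end <- grep("^\\*\\*\\* ?END OF", text)

  # If either marker is missing, make no guess and return the input unchanged.
  if (length(start) == 0 || length(end) == 0) {
    return(text)
  }

  first <- start[1] + 1
  last <- end[length(end)] - 1

  # Nothing lies between the markers (or they appear in the wrong order).
  if (first > last) {
    return(character(0))
  }

  text[first:last]
}

# Made-up sample input for illustration only.
toy <- c(
  "The Project Gutenberg EBook of Example",
  "*** START OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***",
  "Chapter 1",
  "It was a dark and stormy night.",
  "*** END OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***",
  "End of the Project Gutenberg EBook"
)

body <- strip_between_markers(toy)
print(body)
```

A guess like this fails silently when a file uses different marker wording, which is one reason the result of any such heuristic should be inspected with head() and tail().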
Usage
gutenberg_strip(text)
Arguments
text | A character vector with lines of a book
Value
A character vector with the Project Gutenberg header and footer removed.
Examples
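library(dplyr)
library(gutenbergr)

# Download the book without stripping, so the raw header and footer are kept
book <- gutenberg_works(title == "Pride and Prejudice") %>%
  gutenberg_download(strip = FALSE)

# Inspect the unstripped start and end of the text
head(book$text, 10)
tail(book$text, 10)

# Strip the Project Gutenberg header and footer, then inspect again
text_stripped <- gutenberg_strip(book$text)
head(text_stripped, 10)
tail(text_stripped, 10)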
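Because front matter such as a title page or table of contents is not stripped, a follow-up step may be needed. The sketch below shows one possible approach in base R. `drop_before` is a hypothetical helper, not a gutenbergr function, and the `front` vector is made-up sample input. It assumes you can name a pattern that first matches where the main text begins.

```r
# Hypothetical helper, not part of gutenbergr: drop every line that comes
# before the first line matching `pattern`.
drop_before <- function(text, pattern) {
  hits <- grep(pattern, text)

  # No match: return the input unchanged rather than guessing.
  if (length(hits) == 0) {
    return(text)
  }

  text[hits[1]:length(text)]
}

# Made-up sample input for illustration only.
front <- c(
  "A TITLE",
  "",
  "by An Author",
  "",
  "CHAPTER I",
  "First line of the story."
)

main_text <- drop_before(front, "^CHAPTER I$")
print(main_text)
```

If the book has a table of contents that lists chapter headings, the first match may fall inside that table, so the pattern usually needs to be checked against the actual text.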
[Package gutenbergr version 0.2.4 Index]