paragraphs_scrap {ralger}

R Documentation

Website text paragraph scraping

Description

Scrapes the text paragraphs of a web page and returns them as a character vector.

Usage

paragraphs_scrap(
  link,
  contain = NULL,
  case_sensitive = FALSE,
  collapse = FALSE,
  askRobot = FALSE
)

Arguments

`link`	the URL of the web page to scrape.
`contain`	character string used to filter the paragraphs; only paragraphs containing it are kept. Defaults to NULL (no filtering).
`case_sensitive`	logical. Should the `contain` argument be case sensitive? Defaults to FALSE.
`collapse`	logical. If TRUE, the paragraphs are collapsed into a single element and the `contain` argument is ignored. Defaults to FALSE.
`askRobot`	logical. Should the function check the site's robots.txt to see whether scraping the web page is allowed? Defaults to FALSE.
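
To make the interaction between `contain`, `case_sensitive` and `collapse` concrete, here is a minimal base R sketch that works on a hard-coded character vector instead of a live page. The helper `filter_paragraphs()` and the sample `paragraphs` are hypothetical and exist only for illustration; this is not ralger's implementation. Literal string matching and the single-space separator used when collapsing are assumptions.

```r
# Hypothetical stand-in for paragraphs already scraped from a page
paragraphs <- c(
  "Health officials issued new guidance.",
  "The market closed higher on Monday.",
  "A study links sleep to heart health."
)

# Hypothetical helper mirroring the documented arguments (not ralger's code)
filter_paragraphs <- function(paragraphs, contain = NULL,
                              case_sensitive = FALSE, collapse = FALSE) {
  if (collapse) {
    # collapse = TRUE: return a single element and ignore `contain`
    # (the separator is an assumption)
    return(paste(paragraphs, collapse = " "))
  }
  if (is.null(contain)) {
    # No filter requested: return everything
    return(paragraphs)
  }
  # fixed = TRUE treats `contain` as a literal string (an assumption)
  if (case_sensitive) {
    keep <- grepl(contain, paragraphs, fixed = TRUE)
  } else {
    keep <- grepl(tolower(contain), tolower(paragraphs), fixed = TRUE)
  }
  paragraphs[keep]
}

# Case-insensitive filter: keeps the first and third paragraphs
filter_paragraphs(paragraphs, contain = "health")

# Case-sensitive filter: keeps only the third paragraph
filter_paragraphs(paragraphs, contain = "health", case_sensitive = TRUE)

# Collapsed output: a character vector of length one
filter_paragraphs(paragraphs, collapse = TRUE)
```

Under these assumptions, the equivalent calls on a real page would pass the same arguments to `paragraphs_scrap()` together with `link`.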

Value

A character vector containing the scraped paragraphs.

Examples


# Extracting the paragraphs displayed on the health page of the New York Times

link <- "https://www.nytimes.com/section/health"

paragraphs_scrap(link)

[Package ralger version 2.2.4 Index]