titles_scrap {ralger} | R Documentation |
Website title scraping
Description
Scrapes titles (the contents of the h1, h2 and h3 HTML tags) from a web page. Useful for collecting the headlines of online daily newspapers.
Usage
titles_scrap(link, contain = NULL, case_sensitive = FALSE, askRobot = FALSE)
Arguments
link |
the URL of the web page to scrape |
contain |
a character string used to filter the titles. Defaults to NULL (no filtering). |
case_sensitive |
logical. Should matching against the contain argument be case sensitive? Defaults to FALSE. |
askRobot |
logical. Should the function check the site's robots.txt to see whether scraping the web page is allowed? Defaults to FALSE. |
Value
a character vector containing the scraped titles
Examples
# Extracting the current titles of the New York Times
link <- "https://www.nytimes.com/"
titles_scrap(link)
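# The contain and case_sensitive arguments can be illustrated offline with a small base R sketch. This is not the package's implementation: the titles vector and the filter_titles() helper below are hypothetical stand-ins, and the sketch assumes that contain is matched as a literal substring of each title.

```r
# Hypothetical stand-in for the character vector returned by titles_scrap()
titles <- c("Election Results Are In", "Markets rally", "Local election guide")

# Hypothetical helper (not part of ralger) that mimics the documented
# `contain` and `case_sensitive` arguments using literal substring matching
filter_titles <- function(titles, contain = NULL, case_sensitive = FALSE) {
  # No filter supplied: return every title unchanged
  if (is.null(contain)) {
    return(titles)
  }
  if (case_sensitive) {
    keep <- grepl(contain, titles, fixed = TRUE)
  } else {
    # Lower-case both sides so the comparison ignores case
    keep <- grepl(tolower(contain), tolower(titles), fixed = TRUE)
  }
  titles[keep]
}

# Case-insensitive match (the default) keeps both "election" titles
filter_titles(titles, contain = "election")

# Case-sensitive match keeps only the title with a lower-case "election"
filter_titles(titles, contain = "election", case_sensitive = TRUE)
```

# With the package itself, the corresponding call would take the form titles_scrap(link, contain = "election", case_sensitive = TRUE); whether its matching rules agree with this sketch in every detail should be checked against the package source.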
[Package ralger version 2.2.4 Index]