
titles_scrap {ralger}

R Documentation

Website title scraping

Description

This function scrapes the titles of a web page, i.e. the text of its h1, h2 and h3 HTML tags. It is useful, for example, for collecting the daily headlines of online newspapers.
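The extraction step can be sketched with the rvest package. This is a minimal illustration, not ralger's actual implementation. It assumes rvest (>= 1.0, which provides `html_elements()` and `minimal_html()`) is installed, and the helper name `scrape_headings` is hypothetical. To stay offline, the sketch parses an inline HTML string; for a live page, `read_html(link)` would fetch the URL instead.

```r
library(rvest)

# Hypothetical helper (not part of ralger): return the text of every
# h1, h2 and h3 element in document order, with whitespace trimmed.
scrape_headings <- function(page) {
  nodes <- html_elements(page, "h1, h2, h3")
  html_text(nodes, trim = TRUE)
}

# Offline demo page; for a live site use read_html("https://www.nytimes.com/")
page <- minimal_html("
  <h1>Front page</h1>
  <p>Body text that is not a title.</p>
  <h2>Markets rally</h2>
  <h3>Weather: sunny</h3>
  <h4>Ignored subheading</h4>
")

titles <- scrape_headings(page)
print(titles)
```

The `<p>` and `<h4>` elements are skipped because only the three heading levels named in the CSS selector are matched.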

Usage

titles_scrap(link, contain = NULL, case_sensitive = FALSE, askRobot = FALSE)

Arguments

`link`	the URL of the web page to scrape.
`contain`	a character string used to filter the titles; only titles containing it are returned. Defaults to NULL (no filtering).
`case_sensitive`	logical. Should matching on `contain` be case sensitive? Defaults to FALSE.
`askRobot`	logical. Should the function consult the site's robots.txt file to check whether scraping the web page is allowed? Defaults to FALSE.
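The effect of `contain` and `case_sensitive` can be illustrated with a base-R sketch. The helper `filter_titles` and the sample titles are hypothetical, and the sketch assumes `contain` is matched as a literal substring; ralger may instead treat it as a regular expression.

```r
# Hypothetical helper mimicking the `contain` / `case_sensitive` arguments.
# Assumption: `contain` is matched as a literal substring, not a regex.
filter_titles <- function(titles, contain = NULL, case_sensitive = FALSE) {
  if (is.null(contain)) {
    return(titles)  # no filter requested: keep every title
  }
  if (case_sensitive) {
    keep <- grepl(contain, titles, fixed = TRUE)
  } else {
    # Lower-case both sides so that "election" also matches "ELECTION"
    keep <- grepl(tolower(contain), tolower(titles), fixed = TRUE)
  }
  titles[keep]
}

titles <- c("Election results tonight", "ELECTION: live updates", "Markets rally")

filter_titles(titles)                                               # all three titles
filter_titles(titles, contain = "election")                         # first two titles
filter_titles(titles, contain = "Election", case_sensitive = TRUE)  # first title only
```

Lower-casing both strings is used here instead of `ignore.case = TRUE` because `fixed = TRUE` overrides conflicting arguments such as `ignore.case` in `grepl()`.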

Value

A character vector of the scraped titles.


Examples

# Extracting the current titles of the New York Times

link     <- "https://www.nytimes.com/"

titles_scrap(link)

[Package ralger version 2.2.4 Index]