tidy_scrap {ralger} | R Documentation
Website Tidy scraping
Description
Scrapes the elements matched by a set of HTML or CSS selectors from a web page and returns them as a tibble.
Usage
tidy_scrap(link, nodes, colnames, clean = FALSE, askRobot = FALSE)
Arguments
link | the URL of the web page to scrape.
nodes | a vector of HTML or CSS selectors identifying the elements to extract. The SelectorGadget tool is highly recommended for finding them.
colnames | the names of the expected columns.
clean | logical. Should the function clean the extracted tibble? Default is FALSE.
askRobot | logical. Should the function check the site's robots.txt to see whether scraping the web page is allowed? Default is FALSE.
Value
a tidy data frame (tibble).
Examples
# Extracting IMDb movie titles and ratings
library(ralger)
link <- "https://www.imdb.com/chart/top/"
my_nodes <- c(".titleColumn a", "strong")
# Named col_names rather than names to avoid masking base::names()
col_names <- c("title", "rating")
tidy_scrap(link, my_nodes, col_names)
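The optional arguments can be combined with the call above. The following is a minimal sketch, not a definitive recipe: it reuses the link and selectors from the example, passes every argument by name, and turns on both clean and askRobot. The variable names col_names and movies are illustrative, the length check reflects an assumption that one column name is expected per selector, and the selectors may no longer match the live page.

```r
# Link and selectors reused from the example above.
link      <- "https://www.imdb.com/chart/top/"
my_nodes  <- c(".titleColumn a", "strong")
col_names <- c("title", "rating")  # illustrative variable name

# Assumption: one column name is expected per selector.
stopifnot(length(my_nodes) == length(col_names))

# The call needs the ralger package and network access, so it is guarded
# and only runs in an interactive session.
if (interactive() && requireNamespace("ralger", quietly = TRUE)) {
  movies <- ralger::tidy_scrap(
    link     = link,
    nodes    = my_nodes,
    colnames = col_names,
    clean    = TRUE,  # ask the function to clean the extracted tibble
    askRobot = TRUE   # ask robots.txt whether scraping is allowed
  )
  print(movies)
}
```

Passing the arguments by name keeps the two logical flags readable at the call site, since both default to FALSE and are easy to confuse when given positionally.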
[Package ralger version 2.2.4 Index]