R: Website Tidy scraping

tidy_scrap {ralger}

R Documentation

Website Tidy scraping

Description

Scrapes the content matched by a set of HTML or CSS selectors on a web page and returns it as a tibble.

Usage

tidy_scrap(link, nodes, colnames, clean = FALSE, askRobot = FALSE)

Arguments

`link`	the link of the web page to scrape
`nodes`	a vector of HTML or CSS selectors to consider. The SelectorGadget tool is highly recommended for finding them.
`colnames`	the names of the expected columns.
`clean`	logical. Should the function clean the extracted tibble? Default is FALSE.
`askRobot`	logical. Should the function check the site's robots.txt to see whether scraping the web page is allowed? Default is FALSE.

Value

A tidy data frame (tibble).
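
To illustrate the shape of the returned value, the sketch below builds a comparable tibble from an inline HTML snippet using rvest directly. It assumes that each selector in `nodes` fills one column, named by the matching entry of `colnames`. The snippet, the selectors, and the variable names are hypothetical, and this is not the package's actual implementation.

```r
library(rvest)
library(tibble)

# Hypothetical inline page standing in for a real URL
page <- minimal_html('
  <div class="movie"><span class="title">Movie A</span><span class="rating">9.2</span></div>
  <div class="movie"><span class="title">Movie B</span><span class="rating">9.0</span></div>
')

selectors <- c(".title", ".rating")   # plays the role of `nodes`
col_names <- c("title", "rating")     # plays the role of `colnames`

# Extract one character vector per selector
columns <- lapply(selectors, function(sel) html_text(html_elements(page, sel)))
names(columns) <- col_names

# Bind the vectors into a tibble, one column per selector
result <- as_tibble(columns)
```

The selectors must match the same number of elements for the columns to line up. Values are extracted as text, so numeric columns such as `rating` still need conversion.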

Examples


# Extracting IMDb movie titles and ratings

link     <- "https://www.imdb.com/chart/top/"
my_nodes <- c(".titleColumn a", "strong")
names    <- c("title", "rating")

tidy_scrap(link, my_nodes, names)
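
The optional arguments from the Usage section can be switched on in the same call. The following is a minimal sketch that reuses the link and selectors from the example above. It needs network access, and the selectors may no longer match if the page layout has changed. The names `col_names` and `movies` are introduced here for illustration only.

```r
library(ralger)

link      <- "https://www.imdb.com/chart/top/"
my_nodes  <- c(".titleColumn a", "strong")
col_names <- c("title", "rating")

# Same call as above, with both optional flags enabled:
# clean = TRUE asks the function to clean the extracted tibble,
# askRobot = TRUE asks it to consult robots.txt first
movies <- tidy_scrap(link, my_nodes, col_names, clean = TRUE, askRobot = TRUE)
```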

[Package ralger version 2.2.4 Index]