html_getLinks {MazamaCoreUtils}    R Documentation
Find all links in an html page
Description
Parses an html page to extract all <a href="...">...</a> links, returning them in a dataframe where linkName is the human-readable link text and linkUrl is the href portion. By default this function returns relative URLs.
This is especially useful for extracting data from an index page that shows the contents of a web accessible directory.
Wrapper functions html_getLinkNames() and html_getLinkUrls() return the appropriate columns as vectors.
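When only one of the columns is needed, the wrappers can be called directly. A minimal sketch, assuming the illustrative Census directory page below is reachable:

library(MazamaCoreUtils)

# Directory listing used purely for illustration
url <- "https://www2.census.gov/geo/tiger/GENZ2019/shp/"

# Human-readable link text as a character vector
linkNames <- html_getLinkNames(url)

# href values as a character vector (relative by default)
linkUrls <- html_getLinkUrls(url)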
Usage
html_getLinks(url = NULL, relative = TRUE)
html_getLinkNames(url = NULL)
html_getLinkUrls(url = NULL, relative = TRUE)
Arguments
url: URL or file path of an html page.
relative: Logical instruction to return relative URLs.
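A brief sketch of how the relative argument might be used; the assumption here (not stated above) is that relative = FALSE resolves hrefs into fully qualified URLs:

url <- "https://www2.census.gov/geo/tiger/GENZ2019/shp/"  # illustrative page

# Default: hrefs as they appear in the page source
relativeUrls <- html_getLinkUrls(url, relative = TRUE)

# Assumption: relative = FALSE returns fully qualified URLs instead
absoluteUrls <- html_getLinkUrls(url, relative = FALSE)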
Value
A dataframe with linkName and/or linkUrl columns.
Examples
library(MazamaCoreUtils)

# Fail gracefully if the resource is not available
try({

  # US Census 2019 shapefiles
  url <- "https://www2.census.gov/geo/tiger/GENZ2019/shp/"

  # Extract links
  dataLinks <- html_getLinks(url)

  dataLinks <- dataLinks %>%
    dplyr::filter(stringr::str_detect(linkName, "us_county"))

  head(dataLinks, 10)

}, silent = FALSE)
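Building on the objects created in the example above, one hedged sketch of turning the relative linkUrl values into download URLs by simple concatenation (this assumes the directory listing's hrefs are bare file names relative to url):

# Combine the base url with the relative hrefs found above
downloadUrls <- paste0(url, dataLinks$linkUrl)
head(downloadUrls, 3)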