retrieve_links {archiveRetriever} | R Documentation |
retrieve_links: Retrieving links of lower-level web pages of mementos from the Internet Archive
Description
retrieve_links retrieves the links of lower-level web pages from mementos stored in the Internet Archive.
Usage
retrieve_links(
  ArchiveUrls,
  encoding = "UTF-8",
  ignoreErrors = FALSE,
  filter = TRUE,
  pattern = NULL,
  nonArchive = FALSE
)
Arguments
ArchiveUrls | A string containing the URL of a memento stored in the Internet Archive. |
encoding | Specify an encoding for the homepage. Default is 'UTF-8'. |
ignoreErrors | Ignore errors for some URLs and proceed with scraping. Default is FALSE. |
filter | Filter links by top-level domain. Only sub-domains of the top-level domain will be returned. Default is TRUE. |
pattern | Filter links by a custom pattern instead of the top-level domain. Default is NULL. |
nonArchive | Logical input. Can be set to TRUE if you want to use archiveRetriever to scrape web pages outside the Internet Archive. Default is FALSE. |
Value
This function retrieves the links of all lower-level web pages of mementos of a homepage available from the Internet Archive. It returns a tibble including the baseUrl and all links of lower-level web pages. However, the fact that a memento is stored in the Internet Archive does not guarantee that the information from the homepage can actually be scraped. Because the Internet Archive is an internet resource, a request can always fail due to connectivity problems. A simple remedy is to call the function again.
Examples
## Not run:
retrieve_links("http://web.archive.org/web/20190801001228/https://www.spiegel.de/")
## End(Not run)
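
The advice in the Value section to call the function again after a failed request could be automated roughly as follows. This is only a sketch: retry() is a hypothetical helper written for this illustration and is not part of archiveRetriever, and the number of attempts and the waiting time are arbitrary assumptions.

```r
# Hypothetical helper (not part of archiveRetriever): call `f` up to
# `max_tries` times and pause `wait` seconds between failed attempts.
retry <- function(f, max_tries = 3, wait = 1) {
  stopifnot(max_tries >= 1)
  for (attempt in seq_len(max_tries)) {
    # Capture an error as a condition object instead of aborting
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) Sys.sleep(wait)
  }
  # Every attempt failed: re-raise the last error message
  stop("All ", max_tries, " attempts failed: ", conditionMessage(result))
}

## Not run:
# library(archiveRetriever)
# links <- retry(function() {
#   retrieve_links("http://web.archive.org/web/20190801001228/https://www.spiegel.de/")
# })
## End(Not run)
```

Wrapping the call in a function lets the helper repeat it unchanged. If every attempt fails, the last error is raised again, so a persistent problem is still reported.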