page_vector_functions {wikkitidy} | R Documentation
Get data about pages from their titles
Description
get_latest_revision() returns metadata about the latest revision of each page.
get_page_html() returns the rendered HTML for each page.
get_page_summary() returns metadata about the latest revision, along with the page description and a summary extracted from the opening paragraph.
get_page_related() returns summaries of 20 related pages for each page passed.
get_page_talk() returns structured talk page content for each title. Pass the title of the talk page itself, e.g. "Talk:Earth" rather than "Earth".
get_page_langlinks() returns interwiki links for each title.
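Because get_page_talk() expects the talk page's own title, it can help to add the prefix before making the call. The following is a minimal sketch: as_talk_title() is a hypothetical helper, not part of wikkitidy, and it assumes the "Talk:" prefix that English Wikipedia uses for article talk pages.

```r
# Hypothetical helper (not part of wikkitidy): prefix article titles with
# "Talk:" unless they already carry that prefix.
as_talk_title <- function(title) {
  has_prefix <- startsWith(title, "Talk:")
  ifelse(has_prefix, title, paste0("Talk:", title))
}

talk_titles <- as_talk_title(c("Earth", "Talk:Moon"))
talk_titles

# The prefixed titles could then be passed on, e.g.:
# get_page_talk(talk_titles)
```

Other namespaces and other language editions use different prefixes, so the prefix would need adjusting there.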
Usage
get_latest_revision(title, language = "en")
get_page_html(title, language = "en")
get_page_summary(title, language = "en")
get_page_related(title, language = "en")
get_page_talk(title, language = "en")
get_page_langlinks(title, language = "en")
Arguments
title | A character vector of page titles.
language | A character vector of two-letter language codes, either of length 1 or the same length as title.
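The length rule for language can be illustrated without calling the API. The sketch below assumes that a length-1 language is recycled across all titles, as is usual for vectorised R functions. pair_titles() is a hypothetical helper written for this illustration, not a wikkitidy function, and wikkitidy's own validation may differ.

```r
# Hypothetical illustration (not part of wikkitidy): pair each title with a
# language code, recycling a single code across all titles.
pair_titles <- function(title, language = "en") {
  if (!length(language) %in% c(1L, length(title))) {
    stop("`language` must have length 1 or the same length as `title`")
  }
  data.frame(
    title = title,
    language = rep_len(language, length(title)),
    stringsAsFactors = FALSE
  )
}

# One language for every title
same_lang <- pair_titles(c("Berlin", "Darmstadt"))

# One language per title
mixed_lang <- pair_titles(c("Berlin", "Darmstadt"), c("en", "de"))
```

Under these assumptions, same_lang pairs both titles with "en", while mixed_lang pairs "Berlin" with "en" and "Darmstadt" with "de".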
Value
A list, vector or tibble, the same length as title, with the desired data.
Examples
# Get language links for a known page on English Wikipedia
get_page_langlinks("Charles Harpur")
# Many of these functions return a list of data frames. tidyr can be useful
# for unpacking them.
# Get 20 related pages for each of two German cities
cities <- tibble::tribble(
~city,
"Berlin",
"Darmstadt",
) %>%
dplyr::mutate(related = get_page_related(city))
cities
# Unnest to get one row per related page:
tidyr::unnest(cities, "related")
# The functions are vectorised over title and language
# Find the articles about Joanna Baillie in other languages, and retrieve
# summary data for the first two.
baillie <- get_page_langlinks("Joanna Baillie") %>%
dplyr::slice(1:2) %>%
dplyr::mutate(get_page_summary(title = title, language = code))
baillie