R: Get data about pages from their titles

page_vector_functions {wikkitidy}

R Documentation

Get data about pages from their titles

Description

get_latest_revision() returns metadata about the latest revision of each page.

get_page_html() returns the rendered HTML for each page.

get_page_summary() returns metadata about the latest revision, along with the page description and a summary extracted from the opening paragraph.

get_page_related() returns summaries of 20 related pages for each title passed.

get_page_talk() returns structured talk page content for each title. You must pass the title of the Talk page itself, e.g. "Talk:Earth" rather than "Earth".

get_page_langlinks() returns interwiki links for each title.
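Because get_page_talk() expects the Talk page title, it can help to build those titles from article titles first. The sketch below is one way to do that in base R. as_talk_title() is a hypothetical helper, not part of wikkitidy, and it assumes English-language article titles in the main namespace (other namespaces and other language editions use different talk prefixes).

```r
# Hypothetical helper, not part of wikkitidy: prepend "Talk:" to article
# titles that do not already carry the prefix.
# Assumes main-namespace titles on an English-language wiki.
as_talk_title <- function(title) {
  stopifnot(is.character(title))
  has_prefix <- startsWith(title, "Talk:")
  ifelse(has_prefix, title, paste0("Talk:", title))
}

talk_titles <- as_talk_title(c("Earth", "Talk:Moon"))
```

The resulting vector could then be passed as the title argument of get_page_talk().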

Usage

get_latest_revision(title, language = "en")

get_page_html(title, language = "en")

get_page_summary(title, language = "en")

get_page_related(title, language = "en")

get_page_talk(title, language = "en")

get_page_langlinks(title, language = "en")

Arguments

`title`	A character vector of page titles.
`language`	A character vector of two-letter language codes, either of length 1 or the same length as `title`.
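The length rule for `language` can be illustrated without calling the API. The following base R sketch pairs each title with a language code under the rule stated above. pair_titles() is a hypothetical helper written for illustration only. It is not how wikkitidy is implemented internally, and the package's own error handling may differ.

```r
# Hypothetical helper, not part of wikkitidy: pair each title with a
# language code, accepting a `language` vector of length 1 or of the
# same length as `title`.
pair_titles <- function(title, language = "en") {
  stopifnot(is.character(title), is.character(language))
  if (length(language) != 1L && length(language) != length(title)) {
    stop("`language` must have length 1 or the same length as `title`")
  }
  data.frame(
    title = title,
    # A single code is repeated for every title; a full-length vector
    # is used as given.
    language = rep_len(language, length(title)),
    stringsAsFactors = FALSE
  )
}

# One language for all titles
pairs_single <- pair_titles(c("Berlin", "Darmstadt"))

# One language per title
pairs_multi <- pair_titles(c("Berlin", "Darmstadt"), language = c("de", "en"))
```

In the first call both titles are paired with "en"; in the second, "Berlin" is paired with "de" and "Darmstadt" with "en". A `language` vector of any other length raises an error in this sketch.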

Value

A list, vector or tibble, the same length as `title`, with the desired data.

Examples

# Get language links for a known page on English Wikipedia
get_page_langlinks("Charles Harpur")

# Many of these functions return a list of data frames, so tidyr can be useful.
# Get 20 related pages for each of two German cities
cities <- tibble::tribble(
  ~city,
  "Berlin",
  "Darmstadt",
) %>%
  dplyr::mutate(related = get_page_related(city))
cities

# Unnest to get one row per related page:
tidyr::unnest(cities, "related")

# The functions are vectorised over title and language
# Find all articles about Joanna Baillie, and retrieve summary data for
# the first two.
baillie <- get_page_langlinks("Joanna Baillie") %>%
  dplyr::slice(1:2) %>%
  dplyr::mutate(get_page_summary(title = title, language = code))
baillie

[Package wikkitidy version 0.1.12 Index]