m_reqMediaWiki {wikiTools} | R Documentation |
Retrieve responses using the MediaWiki API.
Description
Uses the MediaWiki API to check Wikipedia page titles, get the redirects of Wikipedia pages, get the URL of the primary image of Wikipedia pages, or get the URLs of the files included in Wikipedia pages.
Usage
m_reqMediaWiki(
titles,
mode = c("wikidataEntity", "redirects", "pagePrimaryImage", "pageFiles"),
project = "en.wikipedia.org",
redirects = TRUE,
exclude_ext = "svg|webp|xcf"
)
Arguments
titles |
A vector of page titles to search for. |
mode |
Select an action to perform: 'wikidataEntity' -> Uses reqMediaWiki to check whether the page titles exist in a Wikimedia project and returns the Wikidata entity for each of them. Redirects are resolved automatically if the parameter redirects = TRUE (default). If a page title exists in the Wikimedia project, the status column of the returned data frame is set to 'OK'; if the page is a disambiguation page, that column is set to 'disambiguation'; and if the title is not in the Wikimedia project, it is set to 'missing' and no Wikidata entity is returned; 'redirects' -> Obtains the redirects to the given article titles in the Wikimedia project, restricted to namespace 0. Returns a vector for each title: the first element is the destination page and the remaining elements are all the pages that redirect to it. If a title is not in the Wikimedia project, its entry is NA; 'pagePrimaryImage' -> Returns the URL of the image associated with the Wikipedia page of each title, if the page has one. Redirects are resolved automatically; the "normalized" column of the returned data frame contains the destination page of the redirect. See https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bpageimages; 'pageFiles' -> Retrieves the URLs of the files included in Wikipedia pages. Note that the query API calls this search 'images', but all files used in the page are returned. The function only returns URLs that do not end with one of the extensions in the exclude_ext parameter (case insensitive). Redirects are resolved automatically; the "normalized" column of the returned data frame contains the destination page of the redirect. See https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bimages |
project |
Wikimedia project. Defaults to "en.wikipedia.org". |
redirects |
Whether page redirects must be resolved. If redirects = TRUE (default), the "normalized" column of the returned data frame contains the destination page title of the redirect. Only used when mode = 'wikidataEntity'. |
exclude_ext |
File extensions excluded from the results. Only used when mode = 'pageFiles'. Defaults to 'svg|webp|xcf'. |
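The effect of exclude_ext can be illustrated offline with base R. This is a minimal sketch, not the package's actual implementation: it assumes the value is used as a regular-expression alternation anchored at the end of the URL, and the URLs and variable names are hypothetical.

```r
# Hypothetical file URLs, for illustration only.
urls <- c(
  "https://example.org/files/Portrait.JPG",
  "https://example.org/files/Logo.SVG",
  "https://example.org/files/diagram.webp",
  "https://example.org/files/scan.png"
)

exclude_ext <- "svg|webp|xcf"

# Assumed matching rule: a dot, one of the listed extensions, end of string.
pattern <- paste0("\\.(", exclude_ext, ")$")

# ignore.case = TRUE mirrors the documented case-insensitive comparison.
kept <- urls[!grepl(pattern, urls, ignore.case = TRUE)]
print(kept)
```

Under these assumptions, "Logo.SVG" is dropped even though its extension is upper case, while the JPG and PNG files are kept.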
Value
Depends on the mode selected: 'wikidataEntity': NULL if there is any error in the response; otherwise a data frame with four columns: first, the original page title string; second, the normalized title; third, a logical error column, FALSE if a Wikidata entity exists for the page and TRUE if it does not; last, the Wikidata entity itself or a clarification of the error; 'redirects': A vector for each title, whose first element is the destination page and whose remaining elements are all the pages that redirect to it; 'pagePrimaryImage': A data frame with the original titles, the normalized ones, the status of each page and the URL of the primary image of the page, or NA if it does not exist; 'pageFiles': A data frame with the original titles, the normalized ones, the status of each page and the URLs of the files in the Wikipedia page (separated by "|"), or NA if no files exist or all of them are excluded.
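Because the 'pageFiles' result packs several URLs into one "|"-separated string, it usually has to be split before use. The sketch below works on a hand-made data frame; the column names (original, files) and the URLs are assumptions made for illustration and may differ from what the function actually returns.

```r
# Hypothetical result shaped like the 'pageFiles' description above.
# Column names are assumptions, not taken from the package.
f <- data.frame(
  original = c("Max Planck", "Noexiste"),
  files = c("https://example.org/a.jpg|https://example.org/b.png", NA),
  stringsAsFactors = FALSE
)

# Split on a literal "|"; fixed = TRUE is needed because "|" is a
# regular-expression metacharacter.
file_list <- strsplit(f$files, "|", fixed = TRUE)
names(file_list) <- f$original

# Rows without files stay as a single NA after splitting.
print(file_list)
```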
Author(s)
Angel Zazo, Department of Computer Science and Automatics, University of Salamanca
Examples
# Note that URLdecode("a%CC%8C") is
# the letter "a" with the combining caron
df <- m_reqMediaWiki(c('Max Planck', URLdecode("a%CC%8C"), 'Max', 'Cervante', 'humanist'),
                     mode = 'wikidataEntity', project = 'en.wikipedia.org')
a <- m_reqMediaWiki(c('Cervantes', 'Planck', 'Noexiste'), mode = 'redirects',
                    project = 'es.wikipedia.org')
i <- m_reqMediaWiki(c('Max Planck', URLdecode("a%CC%8C"), 'Max', 'Cervante', 'humanist'),
                    mode = 'pagePrimaryImage')
f <- m_reqMediaWiki(c('Max Planck', URLdecode("a%CC%8C"), 'Max', 'Cervante', 'humanist'),
                    mode = 'pageFiles', exclude_ext = "svg|webp|xcf")
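The note about URLdecode("a%CC%8C") in the examples can be checked without any network access. This is a small base R sketch; the variable names are illustrative, and it assumes the decoded bytes are interpreted as UTF-8.

```r
# Decode the percent-encoded title used in the examples.
s <- utils::URLdecode("a%CC%8C")

# Mark the bytes as UTF-8 so they are interpreted consistently across locales.
Encoding(s) <- "UTF-8"

# Unicode code points of the decoded string:
# 97 is "a" (U+0061) and 780 is the combining caron (U+030C).
codes <- utf8ToInt(s)
print(codes)
```

The string is therefore two code points long, which is why it makes a useful test title for the "normalized" column.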