w_Wikipedias {wikiTools}    R Documentation

Get Wikipedia pages of Wikidata entities

Description

Get from Wikidata the Wikipedia page titles and URLs of the Wikidata entities in entity_list. If wikilangs="", all Wikipedia page titles are returned; otherwise only those in the languages listed in wikilangs. The returned data-frame also includes the Wikidata entity classes of which each searched entity is an instance. If the parameter instanceof is set, only pages for Wikidata entities that are instances of the indicated Wikidata class are returned. The data-frame does not include labels or descriptions of the entities: the function w_LabelDesc can be used for this. Duplicated entities are removed before the search. The index of the returned data-frame is also set to entity_list.

Usage

w_Wikipedias(
  entity_list,
  wikilangs = "",
  instanceof = "",
  nlimit = 1500,
  debug = FALSE
)

Arguments

entity_list

A vector of Wikidata entities.

wikilangs

List of languages to limit the search, using "|" as separator. Wikipedia page titles are returned in the same order as the languages in this parameter. If wikilangs="", the function returns Wikipedia page titles in any language, unsorted.
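
As a small illustration of the "|"-separated format, the value for wikilangs can be built from a character vector with base R. The variable names below are only illustrative.

```r
# Build the "|"-separated language string expected by wikilangs.
# The order of the vector is the order in which titles are requested.
langs <- c("es", "en", "fr")
wikilangs <- paste(langs, collapse = "|")
wikilangs
```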

instanceof

Wikidata entity class used to limit the result to the instances of that class. For example, if instanceof='Q5', the results are limited to instances of "human".

nlimit

If the number of entities exceeds this number, chunked queries are done. This is the number of entities requested in each chunk. Reduce the default value if an error is raised.
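
The idea of chunked queries can be pictured with a minimal base-R sketch. It only shows how a vector might be split into groups of at most nlimit elements; it is not the package's internal code, and the helper name chunk_entities is hypothetical.

```r
# Hypothetical helper (not part of wikiTools): split a vector of entities
# into consecutive chunks of at most `nlimit` elements.
chunk_entities <- function(entity_list, nlimit = 1500) {
  entity_list <- unique(entity_list)            # duplicates are dropped first
  groups <- ceiling(seq_along(entity_list) / nlimit)
  unname(split(entity_list, groups))
}

entities <- paste0("Q", 1:3600)                 # placeholder identifiers
chunks <- chunk_entities(entities, nlimit = 1500)
lengths(chunks)                                 # 1500 1500 600
```

With a smaller nlimit the same vector is split into more, smaller chunks, which is the effect of reducing the default value.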

debug

For debugging purposes (default FALSE). If debug='info', information about chunked queries is shown. If debug='query', the query launched is also shown.

Value

A data-frame with five columns: entities, instanceof, npages, page titles and page URLs. The last three use "|" as separator. The index of the data-frame is also set to entity_list.
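
A "|"-separated column can be split into its individual values with base R. The sketch below uses a cut-down mock data-frame rather than a real query result: only the column name instanceof appears in the examples of this page, so the names entity and titles, and all the values, are placeholders.

```r
# Mock result with assumed column names and placeholder values.
w_mock <- data.frame(
  entity     = c("Q1", "Q2"),
  instanceof = c("Q5", "Q5|Q100"),
  titles     = c("Title A|Title B", "Title C"),
  stringsAsFactors = FALSE
)
rownames(w_mock) <- w_mock$entity   # index set to the entities

# fixed = TRUE treats "|" literally instead of as regex alternation.
titles_list <- strsplit(w_mock$titles, "|", fixed = TRUE)
names(titles_list) <- rownames(w_mock)
titles_list[["Q1"]]
```

Passing fixed = TRUE matters here: without it, "|" is read as a regular-expression alternation and the strings are split character by character.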

Author(s)

Angel Zazo, Department of Computer Science and Automatics, University of Salamanca

Examples

## Not run: 
# aux: get a vector of entities (l).
df <- w_SearchByLabel(string='Napoleon', langsorder='en', mode='inlabel')
l <- df$entity  # approx. 3600

w <- w_Wikipedias(entity_list=l, debug='info')
w <- w_Wikipedias(entity_list=l, wikilangs='es|en|fr', debug='info')
# Filter instanceof=Q5 (human):
w_Q5 <- w[grepl("\\bQ5\\b", w$instanceof), ]
w_Q5b <- w_Wikipedias(entity_list=l, wikilangs='es|en|fr', instanceof='Q5', debug='info')

## End(Not run)

[Package wikiTools version 1.2.7 Index]