w_SearchByLabel {wikiTools}    R Documentation

Search Wikidata entities by string (usually labels)

Description

Search Wikidata entities by label and altLabel ("Also known as"), or in any part of the entity, using one of several search modes.

Usage

w_SearchByLabel(
  string,
  mode = "inlabel",
  langs = "",
  langsorder = "",
  instanceof = "",
  Pproperty = "",
  debug = FALSE
)

Arguments

string

String (label or altLabel) to search for. Note that a single quotation mark must be escaped (string="O\'Donell"); otherwise an error is raised.

mode

The search mode. Defaults to 'inlabel'.

  • 'exact' for an exact search in label or altLabel that is case-sensitive and distinguishes diacritics. The search uses the languages in the langs parameter, so langs is mandatory in this mode.

  • 'startswith' for entities whose label or altLabel starts with the string, similar to a wildcard search "string*". The string is searched in the label in the languages of the langs parameter, but in the altLabel in any language, so langs is also mandatory in this mode. Diacritics and case are ignored in this mode.

  • 'cirrus' searches for words in any order in any part of the entity (which must be a string), not only in label or altLabel. Diacritics and case are ignored. It is a full-text search using the ElasticSearch engine. A phrase search is possible by enclosing the phrase in double quotation marks, for example, string='"Antonio Saura"'. Fuzzy search is also possible, for example, string="algermon~1" or string="algernon~2". A REGEX search, with very limited functionality, can be run using the format string="insource:/regex/i" (the trailing i is optional and means ignore case). In this mode, the langs parameter is ignored.

  • 'inlabel' is a special case of 'cirrus' search that matches whole words (in any order) in any position in label or altLabel. Fuzzy search cannot be used in this mode, but the search can be restricted to the languages set in the langs parameter. Modes 'inlabel' and 'cirrus' use the CirrusSearch of the Wikidata API. For more examples, see https://www.mediawiki.org/wiki/Help:CirrusSearch and https://www.mediawiki.org/wiki/Help:Extension:WikibaseCirrusSearch
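
The query formats described for the 'cirrus' mode can be assembled with plain string functions. The following is a minimal sketch: the helpers cirrus_phrase, cirrus_fuzzy and cirrus_regex are hypothetical names that are not part of wikiTools, and limiting the fuzzy distance to 1 or 2 is an assumption taken from the examples above.

```r
# Hypothetical helpers (not part of wikiTools): they only build query strings.

# Phrase search: wrap the text in double quotation marks.
cirrus_phrase <- function(words) {
  paste0('"', words, '"')
}

# Fuzzy search: append "~N" to the word.
# Assumption: only the distances shown in the documentation (1 and 2) are accepted.
cirrus_fuzzy <- function(word, distance = 1) {
  stopifnot(distance %in% c(1, 2))
  paste0(word, "~", distance)
}

# Limited REGEX search: insource:/regex/ with an optional trailing "i".
cirrus_regex <- function(pattern, ignore_case = TRUE) {
  paste0("insource:/", pattern, "/", if (ignore_case) "i" else "")
}

phrase <- cirrus_phrase("Antonio Saura")
fuzzy  <- cirrus_fuzzy("algernon", 2)
regex  <- cirrus_regex("Iranzo")

# The results could then be passed as the string argument, for example:
# df <- w_SearchByLabel(string = phrase, mode = "cirrus", langsorder = "es|en")
```

The call to w_SearchByLabel is left commented out because it needs network access to Wikidata.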

langs

Languages in which the information will be searched, using "|" as separator. In 'exact' or 'startswith' modes this parameter is mandatory: at least one language is required. In 'inlabel' mode, if langs is set, the search is restricted to the languages in this parameter; otherwise any language is searched. In 'cirrus' mode this parameter is ignored.
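
When the language codes are held in a character vector, the "|"-separated value can be built with paste(). The sketch below also mirrors the rule that 'exact' and 'startswith' need at least one language. Both join_codes and check_langs are hypothetical helpers, not wikiTools functions, and the check is only a local guard; how the package itself reacts to an empty langs is not shown here.

```r
# Hypothetical helper (not part of wikiTools): join codes with "|",
# dropping surrounding whitespace, empty entries and duplicates.
join_codes <- function(codes) {
  codes <- trimws(codes)
  codes <- unique(codes[nzchar(codes)])
  paste(codes, collapse = "|")
}

# Hypothetical pre-check based on the rule documented above:
# 'exact' and 'startswith' require at least one language.
check_langs <- function(mode, langs) {
  if (mode %in% c("exact", "startswith") && !nzchar(langs)) {
    stop("langs must contain at least one language in mode '", mode, "'")
  }
  invisible(TRUE)
}

langs <- join_codes(c("es", " en", "es"))
check_langs("exact", langs)

# df <- w_SearchByLabel(string = "Iranzo", mode = "exact", langs = langs)
```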

langsorder

Order of languages in which the information will be returned, using "|" as separator. If langsorder="", no labels or descriptions are returned; otherwise they are returned in the order of the languages in this parameter, if any.

instanceof

Wikidata entity (class) of which the searched entities are an example or member. For example, if instanceof='Q5' the search is filtered to Wikidata entities of class Q5 (human). Several entity classes are allowed, separated with '|'.

Pproperty

Wikidata properties to search, separated with '|' (default ""). For example, if Pproperty="P21", the results contain information on the sex of the entities. If Pproperty="P21|P569", the birthdate is also retrieved. If Pproperty='P21|P569|P214', the VIAF identifier is retrieved as well.
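
Because instanceof and Pproperty both take '|'-separated Wikidata identifiers, a small local check can catch malformed values before a query is launched. This is only a sketch: valid_ids is a hypothetical helper, not part of wikiTools, and it assumes identifiers consist of the letter Q or P followed by digits.

```r
# Hypothetical helper (not part of wikiTools).
# Returns TRUE when every "|"-separated part is the prefix followed by digits.
valid_ids <- function(ids, prefix) {
  parts <- strsplit(ids, "|", fixed = TRUE)[[1]]
  length(parts) > 0 && all(grepl(paste0("^", prefix, "[0-9]+$"), parts))
}

ok_props   <- valid_ids("P21|P569|P214", "P")  # well-formed property list
ok_classes <- valid_ids("Q5|Q101352", "Q")     # well-formed class list
bad_props  <- valid_ids("P21|569", "P")        # second part lacks the "P"
```

An empty string yields FALSE here, so the check should be skipped when an argument is left at its default "".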

debug

For debugging purposes (default FALSE). If debug='query', the query launched is shown. If debug='count', the function only returns the number of entities found by the search.

Value

A data frame with the columns 'entity', 'entityLabel' and 'entityDescription' (plus 'instance', 'instanceLabel' and 'altLabel' if mode="startswith"), and additionally the properties in Pproperty.

Author(s)

Angel Zazo, Department of Computer Science and Automatics, University of Salamanca

Examples

## Not run: 
df <- w_SearchByLabel(string='Iranzo', mode="exact", langs='es|en')
df <- w_SearchByLabel(string='Iranzo', mode="exact", langs='es|en',
                      langsorder='es|en', instanceof = 'Q5|Q101352')
## Search entities whose label or altLabel starts with "string"
df <- w_SearchByLabel(string='Iranzo', mode='startswith', langs='en', langsorder='es|en')
## Search in any position in Label or AltLabel (diacritics and case are ignored)
df <- w_SearchByLabel(string='Iranzo', mode='inlabel', langsorder='es|en')
## Search in Chinese (Simplified) (language code: zh) in any part of entity:
df <- w_SearchByLabel(string='\u4F0A\u5170\u4f50', mode='cirrus', langsorder='es|zh|en')

## End(Not run)

[Package wikiTools version 1.2.7 Index]