query_category_members {wikkitidy}    R Documentation
Explore Wikipedia's category system
Description
These functions provide access to the CategoryMembers endpoint of the Action API.
query_category_members() builds a generator query to return the members of a given category.
build_category_tree() finds all the pages and subcategories beneath the passed category, then recursively finds all the pages and subcategories beneath them, until it can find no more subcategories.
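The recursive search that build_category_tree() performs can be illustrated offline. The sketch below is not wikkitidy's implementation: it walks a hypothetical in-memory category system (toy_categories, explored by a made-up helper toy_category_tree()) breadth first, uses titles instead of page IDs, and keeps a `seen` set because real category graphs can contain loops.

```r
# Hypothetical in-memory category system: each category maps to its direct
# members. Names starting with "Category:" are treated as subcategories.
toy_categories <- list(
  "Category:Root" = c("Category:A", "Page 1"),
  "Category:A"    = c("Category:B", "Page 2", "Page 3"),
  "Category:B"    = c("Page 4")
)

# Breadth-first expansion: list the members of each queued category and
# queue any subcategory not seen before, until no new subcategories remain.
toy_category_tree <- function(start, categories) {
  queue <- start
  seen <- start
  edges <- data.frame(source = character(0), target = character(0),
                      stringsAsFactors = FALSE)
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    members <- categories[[current]]
    if (is.null(members)) members <- character(0)
    if (length(members) > 0) {
      edges <- rbind(edges, data.frame(source = current, target = members,
                                       stringsAsFactors = FALSE))
    }
    # Only expand subcategories we have not visited (guards against cycles)
    new_subcats <- setdiff(members[startsWith(members, "Category:")], seen)
    seen <- c(seen, new_subcats)
    queue <- c(queue, new_subcats)
  }
  nodes <- data.frame(name = unique(c(start, edges$target)),
                      stringsAsFactors = FALSE)
  list(nodes = nodes, edges = edges)
}

tree <- toy_category_tree("Category:Root", toy_categories)
```

The result mirrors the nodes/edges shape described under Value, but with titles standing in for page IDs and without timestamps.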
Usage
query_category_members(
.req,
category,
namespace = NULL,
type = c("file", "page", "subcat"),
limit = 10,
sort = c("sortkey", "timestamp"),
dir = c("ascending", "descending", "newer", "older"),
start = NULL,
end = NULL,
language = "en"
)
build_category_tree(category, language = "en")
Arguments
.req: A request object to build on, such as one created by wiki_action_request().
category: The category to start from.
namespace: Only return category members from the provided namespace.
type: Alternative to namespace: the type of category member to return ("file", "page" and/or "subcat").
limit: The number of members to return in each batch (maximum 500).
sort: How to sort the returned category members. "timestamp" sorts them by the date they were added to the category; "sortkey" sorts them by each member's sort key, a hexadecimal string.
dir: The direction in which to sort them.
start: If sort = "timestamp", the timestamp from which to start listing category members.
end: If sort = "timestamp", the timestamp at which to stop listing category members.
language: The language edition of Wikipedia to query, e.g. "en".
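To see how these arguments relate to the underlying CategoryMembers module, the sketch below maps them onto MediaWiki Action API parameters. The helper category_generator_params() is hypothetical and not part of wikkitidy; it assumes the generator form of the module, whose parameters carry a "g" prefix (gcmtitle, gcmlimit and so on), and it assumes a missing "Category:" prefix should be added.

```r
# Hypothetical helper, NOT part of wikkitidy: maps arguments like those of
# query_category_members() onto MediaWiki Action API parameters for the
# categorymembers generator (generator parameters carry a "g" prefix).
category_generator_params <- function(category,
                                      namespace = NULL,
                                      type = c("file", "page", "subcat"),
                                      limit = 10,
                                      sort = c("sortkey", "timestamp"),
                                      dir = c("ascending", "descending",
                                              "newer", "older"),
                                      start = NULL,
                                      end = NULL) {
  type <- match.arg(type, several.ok = TRUE)
  sort <- match.arg(sort)
  dir <- match.arg(dir)
  stopifnot(limit >= 1, limit <= 500)
  # Assumption: add the "Category:" prefix if the caller left it off
  if (!startsWith(category, "Category:")) {
    category <- paste0("Category:", category)
  }
  params <- list(
    action = "query",
    generator = "categorymembers",
    gcmtitle = category,
    gcmtype = paste(type, collapse = "|"),  # multiple values are "|"-separated
    gcmlimit = limit,
    gcmsort = sort,
    gcmdir = dir
  )
  if (!is.null(namespace)) {
    params$gcmnamespace <- paste(namespace, collapse = "|")
  }
  # Start and end bounds only make sense when sorting by timestamp
  if (!is.null(start) || !is.null(end)) stopifnot(sort == "timestamp")
  if (!is.null(start)) params$gcmstart <- start
  if (!is.null(end)) params$gcmend <- end
  params
}

p <- category_generator_params("Physics", type = "subcat", limit = 50)
```

Leaving type at its default requests all three member types in a single "|"-separated value.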
Value
query_category_members(): A request object of type generator/query/action_api/httr2_request, which can be passed to next_batch() or retrieve_all(). You can specify which properties to retrieve for each page using query_page_properties().
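The Action API returns category members in batches and signals, via a continuation value, whether more remain. The sketch below simulates that protocol offline with a hypothetical fake_api_call() that pages through a character vector using an integer offset; the real API returns a continue object instead. One call corresponds roughly to next_batch(), and the loop in the hypothetical fetch_all() to the idea behind retrieve_all().

```r
# Hypothetical stand-in for the Action API: serves `limit` items starting at
# `offset`, and returns a continuation offset while items remain.
fake_api_call <- function(all_members, limit, offset = 0) {
  n <- min(limit, length(all_members) - offset)
  batch <- all_members[seq_len(n) + offset]
  next_offset <- offset + length(batch)
  list(
    members = batch,
    continue = if (next_offset < length(all_members)) next_offset else NULL
  )
}

# Keep requesting batches until the response carries no continuation value
fetch_all <- function(all_members, limit) {
  results <- character(0)
  offset <- 0
  repeat {
    resp <- fake_api_call(all_members, limit, offset)
    results <- c(results, resp$members)
    if (is.null(resp$continue)) break
    offset <- resp$continue
  }
  results
}

members <- paste("Page", 1:23)
first_batch <- fake_api_call(members, limit = 10)$members  # a single batch
everything <- fetch_all(members, limit = 10)                # three batches
```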
build_category_tree(): A list containing two data frames. nodes lists all the subcategories and pages found underneath the passed category. edges records the connections between them: the source column gives the pageid of the parent category, while the target column gives the pageid of any categories, pages or files contained within the source category, and the timestamp column records the moment when the target page or subcategory was included in the source category. The two data frames in the list can be passed to igraph::graph_from_data_frame() for network analysis.
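Before turning to igraph, the edges data frame can be explored with base R. The sketch below builds toy nodes and edges data frames using the source, target and timestamp columns described above; the page IDs, the timestamps and the nodes column name pageid are invented for illustration.

```r
# Toy data frames with the edges layout described above (values invented)
edges <- data.frame(
  source = c(100, 100, 200, 200),
  target = c(200, 300, 400, 500),
  timestamp = as.POSIXct(c("2015-03-01 10:00:00", "2016-07-12 08:30:00",
                           "2018-01-20 12:00:00", "2020-11-05 17:45:00"),
                         tz = "UTC")
)
nodes <- data.frame(pageid = c(100, 200, 300, 400, 500))

# Number of direct members of each category (its out-degree)
members_per_category <- table(edges$source)

# Memberships created on or after 1 January 2017
recent <- edges[edges$timestamp >= as.POSIXct("2017-01-01", tz = "UTC"), ]

# Nodes that never appear as a source contain nothing themselves
leaves <- setdiff(nodes$pageid, edges$source)
```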
Examples
# Get the first 10 members of 'Category:Physics' on English Wikipedia
physics_members <- wiki_action_request() %>%
query_category_members("Physics") %>% next_batch()
physics_members
# Build the tree of all albums for the Melbourne band Custard
tree <- build_category_tree("Category:Custard_(band)_albums")
tree
# For network analysis and visualisation, you can pass the category tree
# to igraph
tree_graph <- igraph::graph_from_data_frame(tree$edges, vertices = tree$nodes)
tree_graph