oa_ngrams {openalexR} | R Documentation |
Get N-grams of works
Description
Some work entities in OpenAlex include N-grams (word sequences and their frequencies) of their full text. The N-grams are obtained from the Internet Archive, which uses the spaCy parser to index scholarly works. See <https://docs.openalex.org/api-entities/works/get-n-grams> for coverage and more technical details.
Usage
oa_ngrams(
  works_identifier,
  ...,
  endpoint = "https://api.openalex.org",
  verbose = FALSE
)
Arguments
works_identifier |
Character. OpenAlex ID(s) of "works" entities as item identifier(s). These IDs start with "W". See more at <https://docs.openalex.org/api-entities/works#id>. |
... |
Unused. |
endpoint |
Character. URL of the OpenAlex Endpoint API server. Defaults to endpoint = "https://api.openalex.org". |
verbose |
Logical. If TRUE, print information on the querying process. Defaults to FALSE. |
Value
A data frame of paper metadata and a list-column of N-grams.
Note
A faster implementation is available for 'curl' >= v5.0.0, and 'oa_ngrams' will issue a one-time message about this. This can be suppressed with 'options("oa_ngrams.message.curlv5" = FALSE)'.
Examples
## Not run:
ngrams_data <- oa_ngrams(c("W1963991285", "W1964141474"))
# 10 most common ngrams in the first work
first_paper_ngrams <- ngrams_data$ngrams[[1]]
first_paper_ngrams[
  order(first_paper_ngrams$ngram_count, decreasing = TRUE),
][
  1:10,
]
# Missing N-grams are `NULL` in the `ngrams` list-column
oa_ngrams("https://openalex.org/W2284876136")
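# A minimal offline sketch of filtering out works with missing N-grams,
# assuming the result has the shape described under Value.
# `mock_result` is a hand-made stand-in, not real API output, and the
# `id` and `ngram` column names are assumptions made for illustration.
mock_result <- data.frame(id = c("W1", "W2"))
mock_result$ngrams <- list(
  data.frame(ngram = c("open access", "citation"), ngram_count = c(3L, 7L)),
  NULL
)
# Flag works whose `ngrams` entry is not `NULL`, then keep only those rows
has_ngrams <- !vapply(mock_result$ngrams, is.null, logical(1))
works_with_ngrams <- mock_result[has_ngrams, ]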
## End(Not run)