R: Pull data from the Web of Science

pull_wos {wosr}

R Documentation

Pull data from the Web of Science

Description

pull_wos wraps the process of querying, downloading, parsing, and processing Web of Science data.

Usage

pull_wos(query, editions = c("SCI", "SSCI", "AHCI", "ISTP", "ISSHP",
  "BSCI", "BHCI", "IC", "CCR", "ESCI"),
  sid = auth(Sys.getenv("WOS_USERNAME"), Sys.getenv("WOS_PASSWORD")),
  ...)

Arguments

`query`	Query string. See the WoS query documentation page for details on how to write a query, along with its list of example queries.
`editions`	Web of Science editions to query, given as a character vector of edition codes. The default value under Usage shows the codes.
`sid`	Session identifier (SID). The default setting is to get a fresh SID each time you query WoS via a call to `auth`. However, you should try to reuse SIDs across queries so that you don't run into the throttling limits placed on new sessions.
`...`	Additional arguments passed along to `POST` (from the httr package).

Value

A list of the following data frames:

publication: A data frame where each row corresponds to a different publication. Note that each publication has a distinct ut. There is a one-to-one relationship between a ut and each of the columns in this table.
author: A data frame where each row corresponds to a different publication/author pair (i.e., a ut/author_no pair). In other words, each row corresponds to a different author on a publication. You can link the authors in this table to the address and author_address tables to get their addresses (if they exist). See the example in the FAQs for details.
address: A data frame where each row corresponds to a different publication/address pair (i.e., a ut/addr_no pair). In other words, each row corresponds to a different address on a publication. You can link the addresses in this table to the author and author_address tables to see which authors correspond to which addresses. See the example in the FAQs for details.
author_address: A data frame that specifies which authors correspond to which addresses on a given publication. This data frame is meant to be used to link the author and address tables together.
jsc: A data frame where each row corresponds to a different publication/jsc (journal subject category) pair. There is a many-to-many relationship between ut's and jsc's.
keyword: A data frame where each row corresponds to a different publication/keyword pair. These are the author-assigned keywords.
keywords_plus: A data frame where each row corresponds to a different publication/keywords_plus pair. These keywords are assigned by Clarivate Analytics through an automated process.
grant: A data frame where each row corresponds to a different publication/grant agency/grant ID triplet. Not all publications acknowledge a specific grant number in the funding acknowledgement section, hence the grant_id field can be NA.
doc_type: A data frame where each row corresponds to a different publication/document type pair.
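
The link between the author and address tables described above can be sketched with base R `merge()`. The toy data frames below stand in for the `author`, `address`, and `author_address` elements of a `pull_wos` result. The key columns `ut`, `author_no`, and `addr_no` come from the descriptions above; the `display_name` and `org` columns and every value are illustrative assumptions, not real Web of Science output.

```r
# Toy stand-ins for three of the data frames returned by pull_wos().
# Key columns (ut, author_no, addr_no) follow the descriptions above;
# display_name, org, and all values are made up for illustration.
author <- data.frame(
  ut = c("WOS:0001", "WOS:0001", "WOS:0002"),
  author_no = c(1L, 2L, 1L),
  display_name = c("Smith, A", "Jones, B", "Lee, C"),
  stringsAsFactors = FALSE
)
address <- data.frame(
  ut = c("WOS:0001", "WOS:0001"),
  addr_no = c(1L, 2L),
  org = c("Univ A", "Univ B"),
  stringsAsFactors = FALSE
)
author_address <- data.frame(
  ut = c("WOS:0001", "WOS:0001", "WOS:0001"),
  author_no = c(1L, 2L, 2L),
  addr_no = c(1L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Left joins keep authors that have no address (their org becomes NA).
links <- merge(author, author_address, by = c("ut", "author_no"), all.x = TRUE)
author_with_address <- merge(links, address, by = c("ut", "addr_no"), all.x = TRUE)

# Sort for readability: one row per author/address combination.
author_with_address <- author_with_address[
  order(author_with_address$ut,
        author_with_address$author_no,
        author_with_address$addr_no),
]
print(author_with_address)
```

With real output, the same two joins would be applied to the elements of the returned list, for example `data$author`, `data$author_address`, and `data$address`, where `data` is a hypothetical name for the result of `pull_wos`. The left joins are a design choice: an inner join would silently drop authors without a listed address.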

Examples

## Not run: 

sid <- auth("your_username", password = "your_password")
pull_wos("TS = (dog welfare) AND PY = 2010", sid = sid)

# Re-use session ID. This is best practice to avoid throttling limits:
pull_wos("TI = \"dog welfare\"", sid = sid)

# Get fresh session ID:
pull_wos("TI = \"pet welfare\"", sid = auth("your_username", "your_password"))

# It's best to see how many records your query matches before actually
# downloading the data. To do this, call query_wos before running pull_wos:
query <- "TS = ((cadmium AND gill*) NOT Pisces)"
query_wos(query, sid = sid) # shows that there are 1,611 matching publications
pull_wos(query, sid = sid)

## End(Not run)

[Package wosr version 0.3.0 Index]