make_req {repoRter.nih} | R Documentation |
make_req
Description
Easily generate a json request with correct schema to be passed to NIH RePORTER Project API
Usage
make_req(
criteria = list(fiscal_years = lubridate::year(Sys.Date())),
include_fields = NULL,
exclude_fields = NULL,
offset = 0,
limit = 500,
sort_field = NULL,
sort_order = NULL,
message = TRUE
)
Arguments
criteria |
list(); the RePORTER Project API query criteria used to filter results (projects). See Details for schema and other spec rules. |
include_fields |
character(); optional; use to return only the specified fields from the result. See Details for valid return field names |
exclude_fields |
character(); optional; use to exclude specified fields from the result. |
offset |
integer(1); optional; default: 0; usually not explicitly passed by user. Used to set the start index of the results to be retrieved (indexed from 0). See Details. |
limit |
integer(1); optional; default: 500; restrict the number of project records returned per page/request inside the calling function. Defaulted to the maximum allowed value of 500. Reducing this may help with bandwidth/timeout issues. |
sort_field |
character(1); optional; use to sort the result by the specified field. May be useful in retrieving complete result sets above the API maximum of 10K (but below 2x the max = 20K) |
sort_order |
character(1): optional; one of "asc" or "desc"; |
message |
logical(1); default: TRUE; print a message with the JSON to console/stdout. You may want to suppress this at times. |
Details
The maximum number of records that can be returned from any result set is 10,000. Also, the maximum record index
in the result set that can be returned is 9,999 - corresponding to the 10,000'th record in the set. These
constraints from the NIH API defy any intuition that the offset
argument would be useful to return records
beyond this 10K limit. If you need to do this, you have two options:
You can break your request into several smaller requests to be retrieved individually. For example, requesting records for one fiscal year (see:
fiscal_years
) at a time. This should be your first pathIf you have a result set between 10,001 and 20,000 records, you might try passing essentially the same request twice, but varying them by the sort order on some field (and taking care to avoid or remove overlapping results). See the
sort_field
andsort_order
arguments.
criteria
must be specified as a list and may include any of the following (all optional) top level elements:
use_relevance
: logical(1); if TRUE (default), it will sort the most closely matching records per the search criteria to the top (i.e. the NHI sorts descending according to a calculated match score)fiscal_years
: numeric(); one or more fiscal years to retrieve projects that correspond to (or started in) one of the fiscal years enteredinclude_active_projects
: logical(1); if TRUE (default), adds in active projects without regard forpolicy_years
pi_names
: list(); API will return records with Project Investigators (PIs) wildcard-matching any of the strings requested.
If provided, the list must contain three named character vector elements:first_name
,last_name
,any_name
. Each vector must contain at least one element - use a length-1 vector with an empty string (= ""
or= character(1)
) for any name field you do not wish to search on.multi_pi_only: logical(1)
; default: FALSE; when multiplepi_names
are matched, setting this value to TRUE changes the logic from returning project records associated with ANY matched PI name to those associated with ALL names.po_names
: list(); Project Officers (POs), otherwise same comments as forpi_names
org_names
: character(); one or more strings to filter organization names. The provided string is implicitly taken to include wildcards at head and tail ends; "JOHN" and "HOP" will both match "JOHNS HOPKINS UNIVERSITY", etc.org_names_exact_match
: character(); one or more strings to exactly match organization namespi_profile_ids
: numeric(); one or more project investigator profile IDs; results will match projects associated with any of the IDsorg_cities
: character(); one or more cities in which associated organizations may be based.org_states
: character(); one or more US States or Territories (note: requires the abbreviation codes: "NY", "PR", etc.) in which a project organization may be based.project_nums
: character(); one or more project numbers (note: the alphanumeric variety of numbers); results will match any of the specified strings. You may include explicit wildcard operators ("*") in the strings, e.g. "5UG1HD078437-\*"project_num_split
: list(6); theproject_nums
can be broken down to meaningful components which can be searched individually using this argument. These component codes are defined here Your list must contain all of the following named elements:appl_type_code
: character();activity_code
: character();ic_code
: character();serial_num
: character();support_year
: character();suffix_code
: character();
Provide a length-1 vector containing an empty string (
=""
or=character(1)
) for any element you do not want to search onspending_categories
: list(2); a list containing the following named elements:values
: numeric(): the NIH spending category code. These are congressionally defined and are available herematch_all
: logical(1); TRUE to return projects found in all categories; FALSE to return projects matching any one of the categories.
funding_mechanism
: character(); one or more NIH funding mechanism codes used in the president's budget. Available hereorg_countries
: character(); one or more country names; e.g. "United States"appl_ids
: numeric(); one or more application IDs (note: appl. IDs are natural numbers, unlikeproject_nums
)agencies
: character(); one or more of the abbreviated NIH agency/institute/center names, available hereis_agency_admin: logical(1)
; when specifying associatedagencies
, set this value toTRUE
to further specify that these agencies are administering the grant/project.is_agency_funding: logical(1)
; when specifying associatedagencies
, set this value toTRUE
to further specify that these agencies are funding the grant/project.activity_codes: character()
; a 3-character code identifying the grant, contract, or intramural activity through which a project is supported. This is a more detailed description within each funding mechanism. Codes are available hereaward_types: character()
; (aka Type of Application) one or more grant/application type codes numbered 1-9. See types heredept_types: character()
; one or more of NIH standardized department type names (e.g. "PEDIATRICS"). Valid names are provided herecong_dists: character()
; one or more US congressional districts (e.g. "NY-20") which the project can be associated with. See herefoa: character()
; one or more FOA (Funding Opportunity Announcements). Multiple projects may be tied to a single FOA. See hereproject_start_date
: list(2); provide a range for the project start date. Must pass as list containing the following named elements:from_date
: character(1); string date in %Y-%m-%d format. See?base::format
for converting from date class.to_date
: character(1); string date in %Y-%m-%d format.
project_end_date: list(2)
; provide a range for the project end date - similar toproject_start_date
.from_date
: character(1); string date in %Y-%m-%d format. See?base::format
for converting from date class.to_date
: character(1); string date in %Y-%m-%d format.
organization_type: character()
; one or more types of applicant organizations (e.g. "SCHOOLS OF MEDICINE"). There does not appear to be a documented list of valid values, but you can obtain one by pulling all records in a recent year and extracting unique values.award_notice_date: list(2)
; the award notice date as a range, or you can provide just one of the min/max date, but if you do you must provide the other as an empty string.from_date
: character(1); string date in %Y-%m-%d format. See?base::format
for converting from date class.to_date
: character(1); string date in %Y-%m-%d format.
award_amount_range: list(2)
; a numeric range - if you don't want to filter by this sub-criteria (but are filtering on some other award criteria), enter 0 for min and 1e9 for maxmin_amount: numeric(1)
; a real number between 0 and something very largemax_amount: numeric(1)
; a real number between 0 and something very large
exclude_subprojects: logical(1)
; default: FALSE; related to multiproject research awards, TRUE will limit results to just the parent project.sub_project_only: logical(1)
; default: FALSE; similar toexclude_subprojects
, this field will limit results to just the subprojects, excluding the parent.newly_added_projects_only: logical(1)
; default: FALSE; return only those projects "newly added" (this is left undefined in the official documentation) to the system.covid_response: character();
one or more special selector codes used to return projects awarded to study COVID-19 and related topics as funded and classified according to the below valid values/funding sources:All
: all COVID-19 projectsReg-CV
: those funded by regular NIH Appropriated fundsCV
: those funded by the Coronavirus Preparedness and Response Supplemental Appropriations Act, 2020C3
: those funded by the CARES ActC4
: those funded by the Paycheck Protection Program and Health Care Enhancement ActC5
: those funded by the Coronavirus Response and Relief Supplemental Appropriations Act, 2021C6
: those funded by the American Rescue Plan Act, 2021
full_study_sections: list(6)
; (not documented in API notes) Review activities of the Center for Scientific Review (CSR) are organized into Integrated Review Groups (IRGs). Each IRG represents a cluster of study sections around a general scientific area. Applications generally are assigned first to an IRG, and then to a specific study section within that IRG for evaluation of scientific merit.
This gets a bit complicated so we provide this resource for further reading. If providing this criteria, you must include each of the below named elements as character vectors:irg_code
: character(); Integrated Review Groupsra_designator_code
: character(); Scientific Review Administratorsra_flex_code
: character();group_code
: character();name
: character();url
: character();
advanced_text_search
: list(3); used to perform string search in the Project Title ("projecttitle"), Abstract ("abstract"), and/or Project Terms ("terms") fields. If providing this criteria, you must include each of the below named elements:operator: character(1)
; one of "and", "or", "advanced". "and", "or" will be the logical operator between all provided search terms. "advanced" allows the user to pass a boolean search string directly.search_field: character()
; can be one or multiple of "abstract", "terms", "projecttitle" passed as a vector of length 1 to 3. To search all fields, the user can alternatively pass a length 1 character vector containing the string "all" or "".search_text: character(1)
; pass one or multiple search terms separated by spaces, without any quotations. If searching in "advanced" mode, provide a boolean search string - you may use parentheses, AND, OR, NOT, and *escaped* double quotes (e.g.search_text = "(brain AND damage) OR (\"insane in the membrane\") AND cure"
)
Field Names
Full listing of available field names which can be specified in include_fields
, exclude_fields
, and sort_field
is located here
Value
A standard json (jsonlite
flavor) object containing the valid JSON request string which can
be passed to get_nih_data
or elsewhere
Examples
library(repoRter.nih)
## all projects funded in the current (fiscal) year
req <- make_req()
## projects funded in 2019 through 2021
req <- make_req(criteria = list(fiscal_years = 2019:2021))
## projects funded in 2021 where the principal investigator first name is
## "Michael" or begins with "Jo"
req <- make_req(criteria =
list(fiscal_years = 2021,
pi_names =
list(first_name = c("Michael", "Jo*"),
last_name = c(""), # must specify
any_name = character(1) # same here
)
)
)
## all covid-related projects except those funded by American Rescue Plan
## and specify the fields to return, sorting ascending on ApplId column
req <- make_req(criteria =
list(covid_response = c("Reg-CV", "CV", "C3", "C4", "C5")
),
include_fields =
c("ApplId", "SubprojectId", "FiscalYear", "Organization",
"AwardAmount", "CongDist", "CovidResponse",
"ProjectDetailUrl"),
sort_field = "ApplId",
sort_order = "asc")
## using advanced_text_search with boolean search string
string <- "(head AND trauma) OR \"brain damage\" AND NOT \"psychological\""
req <- make_req(criteria =
list(advanced_text_search =
list(operator = "advanced",
search_field = c("terms", "abstract"),
search_text = string
)
)
)