R: Extract Medication Entities From Clinical Note

medExtractR {medExtractR}

R Documentation

Extract Medication Entities From Clinical Note

Description

This function identifies medication entities of interest and returns found expressions with start and stop positions.

Usage

medExtractR(
  note,
  drug_names,
  window_length,
  unit,
  max_dist = 0,
  drug_list = "rxnorm",
  lastdose = FALSE,
  lastdose_window_ext = 1.5,
  strength_sep = NULL,
  flag_window = 30,
  dosechange_dict = "default",
  ...
)

Arguments

`note`	Text to search.
`drug_names`	Vector of drug names of interest to locate.
`window_length`	Length (in number of characters) of window after drug in which to look.
`unit`	Strength unit to look for (e.g., ‘mg’).
`max_dist`	Numeric - edit distance to use when searching for `drug_names`.
`drug_list`	Vector of known drugs that may end search window. By default calls `rxnorm_druglist`.
`lastdose`	Logical - whether or not last dose time entity should be extracted.
`lastdose_window_ext`	Numeric - multiplicative factor by which `window_length` should be extended when identifying last dose time.
`strength_sep`	Delimiter for contiguous medication strengths (e.g., ‘-’ for “LTG 200-300”).
`flag_window`	How far around the drug (in number of characters) to look for a dose change keyword. Defaults to 30. See the ‘Details’ section below for further explanation.
`dosechange_dict`	List of keywords used to determine if a dose change entity is present.
`...`	Parameter settings used in extracting frequency, intake time, route, and duration. Potentially useful parameters include `freq_dict`, `intaketime_dict`, `route_dict`, and `duration_dict` (see `...` argument in `extract_entities`) to specify the dictionary for each of these entities, as well as `freq_fun`, `intaketime_fun`, `route_fun`, and `duration_fun` for user-specified extraction functions. If no additional arguments are provided, `medExtractR` will use `extract_generic` and the default dictionary for each entity. See `extract_entities` documentation for details.

Details

This function uses a combination of regular expressions, rule-based approaches, and dictionaries to identify various drug entities of interest. Specific medications to be found are specified with drug_names, which is neither case-sensitive nor space-sensitive (e.g., ‘lamotrigine XR’ is treated the same as ‘lamotrigineXR’). Entities to be extracted include drug name, strength, dose amount, dose strength, frequency, intake time, route, duration, dose change, and time of last dose. See extract_entities and extract_lastdose for more details.

When searching for medication names of interest, fuzzy matching may be used. The max_dist argument determines the maximum edit distance allowed for such matches. If using fuzzy matching, any drug name with fewer than 5 characters will only allow an edit distance of 1, regardless of the value of max_dist.
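The length-based cap on edit distance can be sketched in base R. This is an illustration, not the package's internal code: the helper names `effective_max_dist` and `within_dist` are hypothetical, the cap is assumed to be `min(max_dist, 1)`, and `utils::adist` stands in for whatever matcher medExtractR actually uses.

```r
# Hypothetical helper (not part of medExtractR): cap the allowed edit
# distance at 1 for drug names shorter than 5 characters.
effective_max_dist <- function(drug_name, max_dist) {
  if (nchar(drug_name) < 5) min(max_dist, 1) else max_dist
}

# Illustrative check: is a candidate within the allowed distance of a name?
# adist() computes the generalized Levenshtein (edit) distance.
within_dist <- function(candidate, drug_name, max_dist) {
  d <- utils::adist(candidate, drug_name, ignore.case = TRUE)[1, 1]
  d <= effective_max_dist(drug_name, max_dist)
}

within_dist("Progrf", "prograf", max_dist = 2)  # one insertion away
within_dist("ltq", "ltg", max_dist = 2)         # one substitution, within the cap
within_dist("lxq", "ltg", max_dist = 2)         # two substitutions, exceeds the cap
```

Under these assumptions, a short name such as ‘ltg’ tolerates only a single edit even when max_dist = 2, which limits spurious matches on short abbreviations.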

The purpose of the drug_list argument is to reduce false positives by removing information that is likely to be related to a competing drug rather than the drug of interest. By default, this is “rxnorm”, which calls data(rxnorm_druglist). A custom drug list in the form of a character vector can be supplied instead, or can be appended to rxnorm_druglist by specifying drug_list = c("rxnorm", custom_drug_list). medExtractR then uses this list to truncate the search window at the first appearance of an unrelated drug name. This uses publicly available data courtesy of the U.S. National Library of Medicine (NLM), National Institutes of Health, Department of Health and Human Services; NLM is not responsible for the product and does not endorse or recommend this or any other product. See rxnorm_druglist documentation for details.
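The window truncation step can be pictured with a small base R sketch. It is not medExtractR's implementation: the helper `truncate_window` and the sample window text are hypothetical, and plain substring matching is assumed for simplicity.

```r
# Hypothetical helper (not medExtractR's code): cut a search window at the
# first mention of any competing drug name.
truncate_window <- function(window, competing_drugs) {
  # Position of the first match for each competing drug (-1 if absent)
  hits <- vapply(
    competing_drugs,
    function(d) as.integer(regexpr(tolower(d), tolower(window), fixed = TRUE)),
    integer(1)
  )
  hits <- hits[hits > 0]
  if (length(hits) == 0) return(window)
  # Keep only the text that precedes the earliest competing drug
  substr(window, 1, min(hits) - 1)
}

window <- " 200 mg bid and keppra 500 mg daily"
truncate_window(window, c("keppra", "aspirin"))
```

In this sketch the strength ‘500 mg’ that belongs to the competing drug is dropped from the window, so it cannot be attributed to the drug of interest.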

Most medication entities are searched for in a window after the drug. The dose change entity, a keyword indicating a non-current drug regimen, may also occur before the drug name, so the flag_window argument adjusts the width of the pre-drug window. Neither flag_window nor dosechange_dict is a default argument of the extended function medExtractR_tapering, because that extension uses a more flexible search window and extraction procedure in which any entity can be extracted either before or after the drug mention. In the tapering extension, dose change identification therefore works the same way as for all other dictionary-based entities.
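A minimal base R sketch of the pre-drug window follows. It only illustrates the idea of looking back flag_window characters from the drug mention; the helper `find_dosechange`, the sample note, and the two keywords are assumptions, not the package's code or its default dictionary.

```r
# Hypothetical helper (not medExtractR's code): look for a dose change
# keyword in the flag_window characters immediately before the drug name.
find_dosechange <- function(note, drug_start, keywords, flag_window = 30) {
  pre_start <- max(1, drug_start - flag_window)
  pre_window <- tolower(substr(note, pre_start, drug_start - 1))
  # Return the keywords that occur in the pre-drug window
  keywords[vapply(keywords, grepl, logical(1), x = pre_window, fixed = TRUE)]
}

note <- "Plan: decrease Prograf 2 mg 1 capsule by mouth bid"
drug_start <- as.integer(regexpr("Prograf", note, fixed = TRUE))

find_dosechange(note, drug_start, c("increase", "decrease"))
find_dosechange(note, drug_start, c("increase", "decrease"), flag_window = 5)
```

With the default width the keyword ‘decrease’ falls inside the window; shrinking the window to 5 characters leaves only part of the word, so nothing is flagged.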

The strength_sep argument is NULL by default, but can be used to identify shorthand for morning and evening doses. For example, consider the phrase ‘Lamotrigine 300-200’ (meaning 300 mg in the morning and 200 mg in the evening). The argument strength_sep = '-' identifies the full expression 300-200 as dose strength in this phrase.
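The effect of a separator can be sketched with a regular expression in base R. This is an assumed illustration of the idea, not how medExtractR builds its patterns; `strength_pattern` and `extract_strengths` are hypothetical names.

```r
# Hypothetical helper (not medExtractR's code): build a pattern that matches
# a single number, or two numbers joined by strength_sep when one is given.
strength_pattern <- function(strength_sep = NULL) {
  num <- "[0-9]+(\\.[0-9]+)?"
  if (is.null(strength_sep)) return(num)
  # \Q...\E quotes the separator literally (requires perl = TRUE)
  paste0(num, "(\\Q", strength_sep, "\\E", num, ")?")
}

extract_strengths <- function(text, strength_sep = NULL) {
  m <- gregexpr(strength_pattern(strength_sep), text, perl = TRUE)
  regmatches(text, m)[[1]]
}

extract_strengths("Lamotrigine 300-200")                      # two separate numbers
extract_strengths("Lamotrigine 300-200", strength_sep = "-")  # one joined expression
```

Without a separator the two numbers are found independently; with strength_sep = "-" the sketch returns the whole expression as one match.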

Value

data.frame with entity information. Only extractions from found entities are returned. If no dosing information for the drug of interest is found, the following output will be returned:

entity	expr	pos
NA	NA	NA
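When processing many notes, it can help to detect this placeholder result before further handling. The sketch below rebuilds the placeholder by hand and uses a hypothetical helper, `has_extractions`; neither is provided by the package.

```r
# The "nothing found" result described above, rebuilt by hand for illustration
empty_result <- data.frame(entity = NA, expr = NA, pos = NA)

# Hypothetical helper: TRUE when a result holds at least one real extraction
has_extractions <- function(res) any(!is.na(res$entity))

has_extractions(empty_result)
```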

The “entity” column of the output contains the formatted label for that entity, according to the following mapping.
drug name: “DrugName”
strength: “Strength”
dose amount: “DoseAmt”
dose strength: “DoseStrength”
frequency: “Frequency”
intake time: “IntakeTime”
duration: “Duration”
route: “Route”
dose change: “DoseChange”
time of last dose: “LastDose”

Sample output:

entity	expr	pos
DoseChange	decrease	66:74
DrugName	Prograf	78:85
Strength	2 mg	86:90
DoseAmt	1	91:92
Route	by mouth	100:108
Frequency	bid	109:112
LastDose	2100	129:133
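The pos column stores positions as "start:stop" text. A short base R sketch, using the sample output above rebuilt by hand, shows one way to split it into integer columns; the column names `start` and `stop` are chosen here for illustration.

```r
# Sample output from above, rebuilt by hand as a data.frame
res <- data.frame(
  entity = c("DoseChange", "DrugName", "Strength", "DoseAmt",
             "Route", "Frequency", "LastDose"),
  expr = c("decrease", "Prograf", "2 mg", "1", "by mouth", "bid", "2100"),
  pos = c("66:74", "78:85", "86:90", "91:92", "100:108", "109:112", "129:133"),
  stringsAsFactors = FALSE
)

# Split "start:stop" into two integer columns
parts <- strsplit(res$pos, ":", fixed = TRUE)
res$start <- as.integer(vapply(parts, `[`, character(1), 1))
res$stop <- as.integer(vapply(parts, `[`, character(1), 2))

# In this sample, stop - start equals the length of the extracted expression
all(res$stop - res$start == nchar(res$expr))
```

For every row of the sample, the difference between stop and start matches the number of characters in expr, which is a quick consistency check after parsing.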

References

Nelson SJ, Zeng K, Kilbourne J, Powell T, Moore R. Normalized names for clinical drugs: RxNorm at 6 years. J Am Med Inform Assoc. 2011 Jul-Aug;18(4):441-8. doi: 10.1136/amiajnl-2011-000116. Epub 2011 Apr 21. PubMed PMID: 21515544; PubMed Central PMCID: PMC3128404.

Examples


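library(medExtractR)

# "Progrf" is a misspelling of "Prograf"; max_dist = 2 allows a fuzzy match
note1 <- "Progrf Oral Capsule 1 mg 3 capsules by mouth twice a day - last
dose at 10pm"
medExtractR(note1, c("prograf", "tacrolimus"), 60, "mg", 2, lastdose=TRUE)

# strength_sep = "-" lets "150-200" be read as one strength expression
note2 <- "Currently on lamotrigine 150-200, but will increase to lamotrigine 200mg bid"
medExtractR(note2, c("lamotrigine", "ltg"), 130, "mg", 1, strength_sep = "-")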

[Package medExtractR version 0.4.1 Index]