medExtractR {medExtractR} | R Documentation |
Extract Medication Entities From Clinical Note
Description
This function identifies medication entities of interest and returns found expressions with start and stop positions.
Usage
medExtractR(
note,
drug_names,
window_length,
unit,
max_dist = 0,
drug_list = "rxnorm",
lastdose = FALSE,
lastdose_window_ext = 1.5,
strength_sep = NULL,
flag_window = 30,
dosechange_dict = "default",
...
)
Arguments
note |
Text to search. |
drug_names |
Vector of drug names of interest to locate. |
window_length |
Length (in number of characters) of window after drug in which to look. |
unit |
Strength unit to look for (e.g., ‘mg’). |
max_dist |
Numeric - edit distance to use when searching for drug_names. |
drug_list |
Vector of known drugs that may end search window. By default calls rxnorm_druglist (see ‘Details’). |
lastdose |
Logical - whether or not last dose time entity should be extracted. |
lastdose_window_ext |
Numeric - multiplicative factor by which window_length should be extended when looking for the last dose time entity. |
strength_sep |
Delimiter for contiguous medication strengths (e.g., ‘-’ for “LTG 200-300”). |
flag_window |
How far around drug (in number of characters) to look for dose change keyword - default fixed to 30. See ‘Details’ section below for further explanation. |
dosechange_dict |
List of keywords used to determine if a dose change entity is present. |
... |
Parameter settings used in extracting frequency, intake time, route, and duration. Potentially useful
parameters include |
Details
This function uses a combination of regular expressions, rule-based
approaches, and dictionaries to identify various drug entities of interest.
Specific medications to be found are specified with drug_names, which
is not case-sensitive or space-sensitive (e.g., ‘lamotrigine XR’ is treated
the same as ‘lamotrigineXR’). Entities to be extracted include drug name, strength,
dose amount, dose strength, frequency, intake time, route, duration, and time of last dose. See
extract_entities and extract_lastdose for more details.
When searching for medication names of interest, fuzzy matching may be used.
The max_dist argument determines the maximum edit distance allowed for
such matches. If using fuzzy matching, any drug name with fewer than 5
characters will only allow an edit distance of 1, regardless of the value of
max_dist.
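The short-name rule above can be illustrated with a minimal base R sketch. The helpers `effective_max_dist` and `is_fuzzy_match` are hypothetical names introduced here for illustration; they are not part of medExtractR, and the package's own matching may differ in detail. The sketch assumes a plain Levenshtein distance as computed by `utils::adist`.

```r
# Hypothetical helper (not a medExtractR function): cap the allowed edit
# distance at 1 for drug names with fewer than 5 characters.
effective_max_dist <- function(drug_name, max_dist) {
  if (nchar(drug_name) < 5) min(max_dist, 1) else max_dist
}

# Hypothetical helper: TRUE if `token` is within the allowed edit distance
# of `drug_name`, ignoring case. utils::adist() returns a distance matrix.
is_fuzzy_match <- function(token, drug_name, max_dist) {
  d <- utils::adist(tolower(token), tolower(drug_name))[1, 1]
  d <= effective_max_dist(drug_name, max_dist)
}

# "Progrf" is one insertion away from "prograf"
is_fuzzy_match("Progrf", "prograf", max_dist = 2)

# "lgt" is two edits away from "ltg"; the short-name cap of 1 rejects it
is_fuzzy_match("lgt", "ltg", max_dist = 2)
```

The cap matters most for abbreviations: with only three or four characters, an edit distance of 2 would match a large share of unrelated short tokens.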
The purpose of the drug_list argument is to reduce false positives by removing information that is
likely to be related to a competing drug, not our drug of interest. By default, this is “rxnorm”, which
calls data(rxnorm_druglist). A custom drug list in the form of a character string can be supplied instead,
or can be appended to rxnorm_druglist by specifying drug_list = c("rxnorm", custom_drug_list).
medExtractR then uses this list to truncate the search window at the first appearance of an unrelated drug name.
This uses publicly available data courtesy of the U.S. National Library of Medicine (NLM), National
Institutes of Health, Department of Health and Human Services; NLM is not responsible for the product and
does not endorse or recommend this or any other product. See rxnorm_druglist documentation for details.
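The window truncation described above can be sketched in base R as follows. `truncate_window` is a hypothetical helper written for this illustration, and it uses simple case-insensitive substring matching; how medExtractR matches competing drug names internally is not specified here and may be stricter (for example, respecting word boundaries).

```r
# Hypothetical helper (not a medExtractR function): cut the search window
# just before the first competing drug name that appears in it.
truncate_window <- function(window_text, competing_drugs) {
  lower_text <- tolower(window_text)
  # Start position of each competing drug in the window (-1 if absent)
  starts <- vapply(
    competing_drugs,
    function(d) as.integer(regexpr(tolower(d), lower_text, fixed = TRUE)),
    integer(1)
  )
  found <- starts[starts > 0]
  if (length(found) == 0) return(window_text)
  substr(window_text, 1, min(found) - 1)
}

# Text following a mention of the drug of interest; "Keppra" is a competing drug
window_text <- " 200mg bid and Keppra 500 mg twice daily"
truncate_window(window_text, c("keppra", "topamax"))
```

In this example the truncated window keeps " 200mg bid and " and drops "500 mg twice daily", which belongs to the competing drug rather than the drug of interest.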
Most medication entities are searched for in a window after the drug. The
dose change entity, or presence of a keyword to indicate a non-current drug
regimen, may occur before the drug name. The flag_window argument
adjusts the width of the pre-drug window. Neither flag_window nor dosechange_dict
is a default argument to the extended function medExtractR_tapering, since that
extension uses a more flexible search window and extraction procedure. In the tapering extension,
any entity can be extracted either before or after the drug mention. Thus, dose change
identification works the same way as for all other dictionary-based entities.
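As a rough illustration of the pre-drug window, the base R sketch below looks for dose change keywords in the flag_window characters immediately before a drug mention. `find_dose_change` and the two-keyword vector are assumptions made for this example; the contents of the default dosechange_dict and the exact search performed by medExtractR are not reproduced here.

```r
# Hypothetical helper (not a medExtractR function): return the keywords
# found in the `flag_window` characters just before the drug mention.
find_dose_change <- function(note, drug_start, keywords, flag_window = 30) {
  window_start <- max(1, drug_start - flag_window)
  pre_window <- tolower(substr(note, window_start, drug_start - 1))
  is_hit <- vapply(keywords, function(k) grepl(k, pre_window, fixed = TRUE), logical(1))
  unname(keywords[is_hit])
}

note <- "Plan to decrease Prograf 2 mg to 1 capsule by mouth bid"
drug_start <- regexpr("Prograf", note, fixed = TRUE)[1]

# Illustrative keyword list; the package's default dictionary is not shown here
keywords <- c("increase", "decrease")

find_dose_change(note, drug_start, keywords)                   # default 30-character window
find_dose_change(note, drug_start, keywords, flag_window = 5)  # window too narrow to reach the keyword
```

A wider window catches keywords that sit further from the drug name, at the cost of picking up keywords that may refer to a different drug.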
The strength_sep argument is NULL by default, but can be used to
identify shorthand for morning and evening doses. For example, consider the
phrase ‘Lamotrigine 300-200’ (meaning 300 mg in the morning and 200 mg
in the evening). The argument strength_sep = '-' identifies
the full expression 300-200 as dose strength in this phrase.
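To make the shorthand concrete, here is a minimal base R sketch that pulls a "number, separator, number" expression out of a phrase. `extract_split_strength` is a hypothetical helper for illustration only; it assumes the separator is a single character that is safe inside a regular expression character class (such as "-" or "/"), and it is not how medExtractR is necessarily implemented.

```r
# Hypothetical helper (not a medExtractR function): find the first
# "<number><sep><number>" expression in a phrase.
extract_split_strength <- function(phrase, strength_sep) {
  num <- "[0-9]+(\\.[0-9]+)?"
  pattern <- paste0(num, "[", strength_sep, "]", num)
  regmatches(phrase, regexpr(pattern, phrase))
}

expr <- extract_split_strength("Lamotrigine 300-200", "-")
expr

# Split the expression into its two parts (morning and evening in the example above)
doses <- as.numeric(strsplit(expr, "-", fixed = TRUE)[[1]])
doses
```

When the phrase contains no such expression, `regmatches` returns a zero-length character vector, so callers of a helper like this would need to handle that case.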
Value
data.frame with entity information. Only extractions from found entities are returned. If no dosing
information for the drug of interest is found, the following output will be returned:
entity | expr | pos |
NA | NA | NA |
The “entity” column of the output contains the formatted label for that entity, according to
the following mapping.
drug name: “DrugName”
strength: “Strength”
dose amount: “DoseAmt”
dose strength: “DoseStrength”
frequency: “Frequency”
intake time: “IntakeTime”
duration: “Duration”
route: “Route”
dose change: “DoseChange”
time of last dose: “LastDose”
Sample output:
entity | expr | pos |
DoseChange | decrease | 66:74 |
DrugName | Prograf | 78:85 |
Strength | 2 mg | 86:90 |
DoseAmt | 1 | 91:92 |
Route | by mouth | 100:108 |
Frequency | bid | 109:112 |
LastDose | 2100 | 129:133 |
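For downstream processing it is often convenient to split the pos column into numeric start and stop values. The base R sketch below rebuilds the first three rows of the sample output by hand (it does not call medExtractR) and adds `start`, `stop`, and `width` columns, which are names chosen here for illustration. In these sample rows the width equals the number of characters in expr, which suggests that stop is one past the last character; treat that as an observation about this sample rather than a documented guarantee.

```r
# First three rows of the sample output above, rebuilt by hand
res <- data.frame(
  entity = c("DoseChange", "DrugName", "Strength"),
  expr = c("decrease", "Prograf", "2 mg"),
  pos = c("66:74", "78:85", "86:90"),
  stringsAsFactors = FALSE
)

# Split "start:stop" into two integer columns
pos_parts <- strsplit(res$pos, ":", fixed = TRUE)
res$start <- as.integer(vapply(pos_parts, `[`, character(1), 1))
res$stop <- as.integer(vapply(pos_parts, `[`, character(1), 2))

# Compare each span's width with the length of the extracted expression
res$width <- res$stop - res$start
all(res$width == nchar(res$expr))

res
```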
References
Nelson SJ, Zeng K, Kilbourne J, Powell T, Moore R. Normalized names for clinical drugs: RxNorm at 6 years. J Am Med Inform Assoc. 2011 Jul-Aug;18(4):441-8. doi: 10.1136/amiajnl-2011-000116. Epub 2011 Apr 21. PubMed PMID: 21515544; PubMed Central PMCID: PMC3128404.
Examples
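# "Progrf" differs from "prograf" by one edit, so max_dist = 2 allows a fuzzy match
note1 <- "Progrf Oral Capsule 1 mg 3 capsules by mouth twice a day - last
dose at 10pm"
medExtractR(note1, c("prograf", "tacrolimus"), window_length = 60, unit = "mg",
            max_dist = 2, lastdose = TRUE)
# strength_sep = "-" lets "150-200" be identified as a single strength expression
note2 <- "Currently on lamotrigine 150-200, but will increase to lamotrigine 200mg bid"
medExtractR(note2, c("lamotrigine", "ltg"), window_length = 130, unit = "mg",
            max_dist = 1, strength_sep = "-")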