extract_unique_references {revtools} | R Documentation
Create a de-duplicated data.frame
Description
Take a data.frame of bibliographic information containing potential duplicates (as identified by find_duplicates), and return a data.frame of unique references.
Usage
extract_unique_references(x, matches)
Arguments
x | a data.frame of bibliographic information
matches | either a vector of matches, e.g. as returned from find_duplicates, or …
Value
a subsetted data.frame containing one entry for each group identified in matches.
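To make "each group identified in matches" concrete, the sketch below assumes that matches holds one group ID per row of x, and that rows sharing an ID are treated as 'identical' references. The IDs are hypothetical, and the code uses only base R, not revtools itself.

```r
# Hypothetical group IDs for a data.frame with five rows (assumed format):
# rows 1 and 3 share ID 1, rows 2 and 5 share ID 2, row 4 is unique
matches <- c(1, 2, 1, 3, 2)

# Row indices belonging to each group of 'identical' references
groups <- split(seq_along(matches), matches)
print(groups)

# A de-duplicated data.frame should contain one row per group
n_expected <- length(unique(matches))
print(n_expected)
```

Under this assumption, the de-duplicated result for these five rows would contain three entries.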
Note
This function creates a simplified version of x by extracting, from each group of 'identical' references, the record that contains the most text. This record is assumed to be the most 'complete' of those available in the dataset. The function does not merge data from multiple 'identical' records, because of the potential for mismatching that this approach would create.
See Also
find_duplicates for duplicate identification; screen_duplicates for an interactive alternative to duplicate removal.
Examples
library(revtools)
# import data
file_location <- system.file(
"extdata",
"avian_ecology_bibliography.ris",
package = "revtools"
)
x <- read_bibliography(file_location)
# generate duplicated references (for example purposes)
x_duplicated <- rbind(x, x[1:5,])
# locate and extract unique references
x_check <- find_duplicates(x_duplicated)
x_unique <- extract_unique_references(x_duplicated, matches = x_check)