R: Searching of duplicated records in a bibliographic database

duplicatedMatching {bibliometrix}

R Documentation

Searching of duplicated records in a bibliographic database

Description

Searches for duplicated records in a bibliographic data frame.

Usage

duplicatedMatching(M, Field = "TI", exact = FALSE, tol = 0.95)

Arguments

`M`	is the bibliographic data frame.
`Field`	is a character string naming the field tag used to identify duplicated records. It can be one of these tags: TI (title), AB (abstract), UT (manuscript ID).
`exact`	is logical. If exact = TRUE, the function searches for duplicates using exact matching. If exact = FALSE, the function uses the restricted Damerau-Levenshtein distance to find duplicated documents.
`tol`	is a numeric value giving the minimum relative similarity required to match two manuscripts. The default is `tol = 0.95`. It applies only to restricted Damerau-Levenshtein matching, so the exact argument must be set to FALSE for it to take effect.
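To illustrate what exact matching means, the sketch below removes rows whose field values are identical, using base R only. It is an assumption about the behaviour described for exact = TRUE, not the package's own code; the `toy` data frame and its values are made up for the example.

```r
# Hypothetical toy data: two identical titles and one that differs by a letter.
toy <- data.frame(
  TI = c("NETWORK ANALYSIS", "NETWORK ANALYSIS", "NETWORK ANALYSES"),
  UT = c("ID1", "ID2", "ID3"),
  stringsAsFactors = FALSE
)

# Exact matching: keep the first row of each group of identical titles.
exact_unique <- toy[!duplicated(toy[["TI"]]), , drop = FALSE]

nrow(exact_unique)  # 2: "NETWORK ANALYSES" is not an exact match, so it is kept
```

A near-duplicate such as the third title survives exact matching, which is the case the approximate mode with `tol` is meant to handle.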

Details

A bibliographic data frame is obtained with the conversion function convert2df. It is a data matrix whose rows correspond to manuscripts and whose columns correspond to the Field Tags in the original SCOPUS and Clarivate Analytics WoS files. The function identifies duplicated records in a bibliographic data frame and removes them. When exact = FALSE, duplicate entries are identified through the restricted Damerau-Levenshtein distance: two manuscripts whose relative similarity is greater than the tol argument are stored in the output data frame only once.
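As a rough illustration of the similarity measure, the sketch below computes the restricted Damerau-Levenshtein (optimal string alignment) distance in base R and turns it into a relative similarity. The helper names `osa_distance` and `relative_similarity` are hypothetical, and normalising by the longer string's length is an assumption; the documentation does not state how the package normalises the distance.

```r
# Restricted Damerau-Levenshtein (optimal string alignment) distance.
# Counts insertions, deletions, substitutions and adjacent transpositions.
osa_distance <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] holds the distance between the first i characters of a
  # and the first j characters of b.
  d <- matrix(0L, n + 1, m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- as.integer(x[i] != y[j])
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1L,  # deletion
                             d[i + 1, j] + 1L,  # insertion
                             d[i, j] + cost)    # substitution or match
      if (i > 1 && j > 1 && x[i] == y[j - 1] && x[i - 1] == y[j]) {
        # adjacent transposition
        d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1L)
      }
    }
  }
  d[n + 1, m + 1]
}

# Assumed normalisation: 1 means identical, 0 means nothing in common.
# Returns NaN when both strings are empty.
relative_similarity <- function(a, b) {
  1 - osa_distance(a, b) / max(nchar(a), nchar(b))
}

# One swapped pair of letters in a 13-character title.
sim <- relative_similarity("bibliometrics", "bibliometrcis")
sim > 0.95  # FALSE: 1 - 1/13 is below the default tol
```

Under this assumed normalisation, short strings need to be almost identical to pass the default threshold, while long titles can absorb a few character-level differences.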

Value

The value returned by duplicatedMatching is a data frame without duplicated records.
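The shape of the returned object can be sketched on toy data: the same columns as the input, with only the first record of each group of near-identical values kept. The helper `drop_near_duplicates` below is hypothetical and is not the package's implementation. It uses `utils::adist()`, which is plain Levenshtein distance, as a stand-in for the restricted Damerau-Levenshtein distance, and it assumes the distance is normalised by the longer string's length.

```r
# Hypothetical helper, not part of bibliometrix: keep the first record of
# each group of values in `field` whose relative similarity exceeds `tol`.
drop_near_duplicates <- function(df, field, tol = 0.95) {
  x <- as.character(df[[field]])
  # Stand-in distance: utils::adist() is Levenshtein, without transpositions.
  d <- utils::adist(x)
  len <- nchar(x)
  # Assumed normalisation by the longer of the two strings (at least 1).
  sim <- 1 - d / pmax(outer(len, len, pmax), 1)
  keep <- rep(TRUE, length(x))
  for (i in seq_along(x)) {
    if (!keep[i]) next
    later <- seq_along(x) > i
    keep[later & sim[i, ] > tol] <- FALSE  # drop later near-duplicates of row i
  }
  df[keep, , drop = FALSE]
}

# Made-up records: the second title differs from the first by one character.
toy <- data.frame(
  TI = c("A BIBLIOMETRIC ANALYSIS OF CITATION NETWORKS",
         "A BIBLIOMETRIC ANALYSIS OF CITATION NETWORK",
         "MAPPING SCIENCE WITH CO-WORD ANALYSIS"),
  PY = c(2015, 2015, 2018),
  stringsAsFactors = FALSE
)

deduped <- drop_near_duplicates(toy, "TI", tol = 0.95)
dim(deduped)
```

Keeping the first occurrence is a design choice of this sketch; which record of a matching pair the package retains is not stated in the documentation.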

Examples

 
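# Load the example collection shipped with bibliometrixData
data(scientometrics, package = "bibliometrixData")

# Stack two overlapping slices so that rows 10 to 20 appear twice
M <- rbind(scientometrics[1:20, ], scientometrics[10:30, ])

# Remove records whose titles match approximately
newM <- duplicatedMatching(M, Field = "TI", exact = FALSE, tol = 0.95)

# Dimensions of the de-duplicated data frame
dim(newM)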

[Package bibliometrix version 4.3.0 Index]