preprocLinkage {PreProcessRecordLinkage}    R Documentation

Record Linkage with Data Preprocessing

Description

This function performs record linkage together with data preprocessing. It is designed to work across a wide range of data sets by standardizing variable names using synonyms, which makes it easier to integrate and analyze data from different sources.

Usage

preprocLinkage(d1,d2,chz="NULL",var=c("age","sex"),threshold=0.9)

Arguments

d1

A data frame.

d2

A data frame.

chz

The number of the variable name that the user does not want to change, chosen from the output of the preproc function.

var

A character vector of the names of the blocking variables. The user chooses them from the output of the selVar function, which returns the names of the variables common to the two data sets.

threshold

A numeric value between 0 and 1.
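
As a rough illustration of what "common variables" means for the var argument, the sketch below uses base R only. The two data frames are hypothetical, and intersect() is a stand-in for the idea, not the package's selVar function, which may return different names after synonym-based standardization.

```r
# Two small hypothetical data frames (not shipped with the package).
d1 <- data.frame(age = c(34, 51), sex = c("F", "M"), city = c("Oslo", "Rome"))
d2 <- data.frame(age = c(34, 29), sex = c("F", "F"), zip = c("0150", "7001"))

# Variable names present in both data frames: candidates for blocking.
common_vars <- intersect(names(d1), names(d2))
print(common_vars)
# [1] "age" "sex"

# A subset of the common names that could be passed as `var`.
var <- c("age", "sex")
stopifnot(all(var %in% common_vars))
```

Any name passed in var should appear in both data sets; the final check above makes that assumption explicit.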

Details

The results are written to .csv files. If the number of records exceeds one million, they are written to .rdata files instead.
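
The storage rule described above can be sketched in base R as follows. The helper save_result, its arguments, and the file names are hypothetical and only illustrate the size-based choice of format; the files the package actually writes may be named and located differently.

```r
# Hypothetical helper: choose the output format from the number of records.
save_result <- function(x, stem, limit = 1e6) {
  if (nrow(x) > limit) {
    # More than `limit` records: store as an .rdata file.
    path <- paste0(stem, ".rdata")
    save(x, file = path)
  } else {
    # Otherwise: store as a .csv file.
    path <- paste0(stem, ".csv")
    write.csv(x, path, row.names = FALSE)
  }
  path
}

# Usage with a tiny hypothetical result, written to a temporary directory.
demo <- data.frame(id1 = c(1, 2), id2 = c(7, 9))
out <- save_result(demo, file.path(tempdir(), "demo_result"))
print(basename(out))
# [1] "demo_result.csv"
```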

Value

Two .csv files or two .rdata files.

Note

To view the results in the created files, first load the data.table package.
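
A minimal sketch of reading a result file with data.table is shown below. The file name and its columns are hypothetical, and the file is created here only so that the example is self-contained; with real output, point fread() at the .csv file that preprocLinkage created.

```r
library(data.table)

# Hypothetical stand-in for a result file; real file names may differ.
path <- file.path(tempdir(), "linkage_result_demo.csv")
fwrite(data.table(id1 = c(1, 2), id2 = c(7, 9)), path)

# Read the .csv file back as a data.table and inspect it.
res <- fread(path)
print(res)
```

For results stored as .rdata files, base R's load() restores the saved objects into the current session.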

Author(s)

Hossein Hassani and Leila Marvian Mashhad.

See Also

selVar, chzInput

Examples

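  library(PreProcessRecordLinkage)
  # RLdata500 and RLdata10000 are example data sets from the RecordLinkage package
  data(RLdata500, RLdata10000, package = "RecordLinkage")
  d1 = RLdata500
  d2 = RLdata10000
  preprocLinkage(d1, d2, var = "by")  # "by" is the birth-year column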
 

[Package PreProcessRecordLinkage version 1.0.1 Index]