deid_dua {duawranglr} | R Documentation |
Convert identifying variable to unique hash
Description
Convert a column of unique but restricted IDs into a set of new IDs using a secure (SHA-2) hashing algorithm. Users have the option of saving a crosswalk between the old and new IDs in case observations need to be reidentified at a later date.
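As a rough illustration of the idea, and not the package's actual implementation, the sketch below hashes a column of IDs with SHA-256 (a member of the SHA-2 family) and truncates each digest to a chosen length. It assumes the digest package is installed; the helper hash_ids() and the toy data are hypothetical.

```r
## minimal sketch, assuming the digest package is available;
## hash_ids() is a hypothetical helper, not part of duawranglr
library(digest)

hash_ids <- function(ids, id_length = 64) {
    ## mirror the documented constraint: at least 12 characters
    ## (64 is the full length of a SHA-256 hex digest)
    stopifnot(id_length >= 12, id_length <= 64)
    ## hash each ID as a plain string (serialize = FALSE), one at a time
    hashed <- vapply(as.character(ids),
                     function(x) digest(x, algo = "sha256", serialize = FALSE),
                     character(1),
                     USE.NAMES = FALSE)
    ## keep only the first id_length hex characters
    substr(hashed, 1, id_length)
}

## toy data with a restricted ID column (values are made up)
toy <- data.frame(sid = c("A001", "A002", "A003"), score = c(10, 12, 9))
toy$id <- hash_ids(toy$sid, id_length = 12)
toy$sid <- NULL
toy
```

Truncating the digest gives shorter IDs but raises the chance that two different IDs share a hash, which is one plausible reason for a minimum length.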
Usage
deid_dua(
df,
id_col = NULL,
new_id_name = "id",
id_length = 64,
existing_crosswalk = NULL,
write_crosswalk = FALSE,
crosswalk_filename = NULL
)
Arguments
df |
Data frame |
id_col |
Column name with IDs to be replaced. By default it is NULL. |
new_id_name |
New hashed ID column name, which must be different from old name. |
id_length |
Length of new hashed ID; cannot be fewer than 12 characters (default is 64 characters). |
existing_crosswalk |
File name of existing crosswalk. If
existing crosswalk is used, then |
write_crosswalk |
Write crosswalk between old ID and new hash
ID to console (unless |
crosswalk_filename |
Name of crosswalk file with path; defaults to generic name with current date (YYYYMMDD) appended. |
Examples
## --------------
## Setup
## --------------
## set DUA crosswalk
dua_cw <- system.file('extdata', 'dua_cw.csv', package = 'duawranglr')
set_dua_cw(dua_cw)
## read in data
admin <- system.file('extdata', 'admin_data.csv', package = 'duawranglr')
df <- read_dua_file(admin)
## --------------
## show identified data
df
## deidentify
df <- deid_dua(df, id_col = 'sid', new_id_name = 'id', id_length = 12)
## show deidentified data
df
## Not run:
## save crosswalk between old and new IDs for future use
deid_dua(df, write_crosswalk = TRUE)
## use existing crosswalk (good for panel datasets that need to be linked)
deid_dua(df, existing_crosswalk = './crosswalk/master_crosswalk.csv')
## End(Not run)
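The crosswalk workflow in the last two examples can be sketched in base R. This is only an illustration under stated assumptions: the column names, the stand-in hashed IDs, and the file name pattern are hypothetical and do not reflect the format that deid_dua() actually writes.

```r
## minimal sketch in base R; all names and values here are hypothetical
## and this is not the file format written by deid_dua()

## crosswalk linking restricted IDs to (stand-in) hashed IDs
crosswalk <- data.frame(sid = c("A001", "A002", "A003"),
                        id = c("3f9a1c0b7d2e", "a41b88e09c5f", "0c7de2915ab3"),
                        stringsAsFactors = FALSE)

## generic file name with the current date (YYYYMMDD) appended
cw_file <- file.path(tempdir(),
                     paste0("crosswalk_", format(Sys.Date(), "%Y%m%d"), ".csv"))
write.csv(crosswalk, cw_file, row.names = FALSE)

## later: a deidentified data set that needs to be linked back
deid <- data.frame(id = c("a41b88e09c5f", "3f9a1c0b7d2e"),
                   score = c(12, 10),
                   stringsAsFactors = FALSE)

## read the saved crosswalk, keeping IDs as strings so that hex-like
## values are not parsed as numbers, then merge on the hashed ID
saved <- read.csv(cw_file, colClasses = "character")
reidentified <- merge(deid, saved, by = "id")
reidentified
```

Because the crosswalk maps hashed IDs back to restricted ones, a file like this would normally need the same protection as the original data.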