represent {representr} | R Documentation
Create a representative dataset post record-linkage.
Description
Create a representative dataset post record-linkage.
Usage
represent(
  data,
  linkage,
  rep_method,
  parallel = TRUE,
  cores = NULL,
  ...,
  scale = FALSE
)
Arguments
data
A data frame of records to be represented.
linkage
A numeric vector indicating the cluster ids post-record linkage for each record in
rep_method
Which method to use for representation. Valid options include "proto_minimax", "proto_random", and "composite".
parallel
Logical flag indicating whether to use parallel computation (via
cores
If specified, the number of cores to use with
...
Additional parameters passed to the cluster representation function. See the prototyping or composite methods.
scale
If the "proto_minimax" method is specified, a logical flag indicating whether the column-type distance function should be scaled so that each distance takes a value in [0, 1]. Defaults to FALSE.
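To make the relationship between data and linkage concrete, here is a minimal base R sketch of random prototyping on toy data. It does not call representr; the helper random_prototypes and the toy records are hypothetical, and the package's own "proto_random" method may differ in its details.

```r
# Toy data: five records, where rows 1-2 and rows 3-5 were linked together.
records <- data.frame(
  name = c("Ann Lee", "Anne Lee", "Bob Ray", "Bob Ray", "Rob Ray"),
  age  = c(34, 35, 50, 51, 53),
  stringsAsFactors = FALSE
)
cluster_id <- c(1, 1, 2, 2, 2)  # one cluster id per row of `records`

# Hypothetical helper (not part of representr): pick one row at random
# from each cluster and return those rows as the representatives.
random_prototypes <- function(data, linkage) {
  stopifnot(nrow(data) == length(linkage))
  rows_by_cluster <- split(seq_len(nrow(data)), linkage)
  chosen <- vapply(
    rows_by_cluster,
    # sample.int avoids the sample(x, 1) surprise when a cluster has one row
    function(rows) rows[sample.int(length(rows), 1)],
    integer(1)
  )
  data[chosen, , drop = FALSE]
}

set.seed(1)
rep_records <- random_prototypes(records, cluster_id)
nrow(rep_records)  # one representative row per cluster
```

The result has one row per distinct cluster id, which is the shape a representative dataset is expected to have.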
Examples
data("rl_reg1")
## random prototyping
rep_dat_random <- represent(rl_reg1, identity.rl_reg1, "proto_random", id = FALSE, parallel = FALSE)
head(rep_dat_random)
## minimax prototyping
col_type <- c("string", "string", "numeric", "numeric", "numeric", "categorical", "ordinal",
              "numeric", "numeric")
orders <- list(education = c("Less than a high school diploma", "High school graduates, no college",
                             "Some college or associate degree", "Bachelor's degree only", "Advanced degree"))
weights <- c(.25, .25, .05, .05, .1, .15, .05, .05, .05)
rep_dat_minimax <- represent(rl_reg1, identity.rl_reg1, "proto_minimax", id = FALSE,
                             distance = dist_col_type, col_type = col_type, weights = weights, orders = orders,
                             scale = TRUE, parallel = FALSE)
head(rep_dat_minimax)
## Not run:
## with alternative tie breaker
rep_dat_minimax <- represent(rl_reg1, identity.rl_reg1, "proto_minimax", id = FALSE,
                             distance = dist_col_type, col_type = col_type, weights = weights, orders = orders,
                             ties_fn = "maxmin_compare", scale = TRUE, parallel = FALSE)
head(rep_dat_minimax)
rep_dat_minimax <- represent(rl_reg1, identity.rl_reg1, "proto_minimax", id = FALSE,
                             distance = dist_col_type, col_type = col_type, weights = weights, orders = orders,
                             ties_fn = "within_category_compare_cpp", scale = TRUE, parallel = FALSE)
head(rep_dat_minimax)
## composite prototyping
rep_dat_composite <- represent(rl_reg1, identity.rl_reg1, "composite",
                               col_type = col_type, parallel = FALSE)
head(rep_dat_composite)
## End(Not run)
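The idea behind minimax prototyping and the scale flag can be sketched in base R without the package. This is an illustration under stated assumptions, not representr's implementation: toy_distance and minimax_prototypes are hypothetical helpers, the distance is a simple stand-in for dist_col_type, and ties are broken by taking the first candidate rather than by a ties_fn.

```r
# Toy data: rows 1-2 form one cluster, rows 3-5 another.
toy_records <- data.frame(
  name = c("Ann Lee", "Anne Lee", "Bob Ray", "Bob Ray", "Rob Ray"),
  age  = c(34, 35, 50, 51, 53),
  stringsAsFactors = FALSE
)
toy_linkage <- c(1, 1, 2, 2, 2)

# Hypothetical distance between two one-row data frames: absolute difference
# on numeric columns, 0/1 mismatch otherwise. With scale = TRUE, numeric
# differences are divided by the column range so each term lies in [0, 1].
toy_distance <- function(a, b, ranges, scale = FALSE) {
  per_column <- vapply(names(a), function(col) {
    if (is.numeric(a[[col]])) {
      delta <- abs(a[[col]] - b[[col]])
      if (scale && ranges[[col]] > 0) delta / ranges[[col]] else delta
    } else {
      as.numeric(a[[col]] != b[[col]])
    }
  }, numeric(1))
  sum(per_column)
}

# Hypothetical helper: in each cluster, keep the record whose largest
# distance to any other record in the same cluster is smallest.
minimax_prototypes <- function(data, linkage, scale = FALSE) {
  num_cols <- names(data)[vapply(data, is.numeric, logical(1))]
  ranges <- vapply(data[num_cols], function(x) diff(range(x)), numeric(1))
  chosen <- vapply(split(seq_len(nrow(data)), linkage), function(rows) {
    worst <- vapply(rows, function(i) {
      max(vapply(rows, function(j) {
        toy_distance(data[i, ], data[j, ], ranges, scale)
      }, numeric(1)))
    }, numeric(1))
    rows[which.min(worst)]  # ties: the first candidate wins
  }, integer(1))
  data[chosen, , drop = FALSE]
}

toy_minimax <- minimax_prototypes(toy_records, toy_linkage, scale = TRUE)
rownames(toy_minimax)  # "1" "4"
```

In the second cluster, row 4 is chosen because its worst-case distance to the other two records is the smallest; in the first cluster both records tie, so the first one is kept.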
[Package representr version 0.1.5 Index]