represent {representr}    R Documentation

Create a representative dataset after record linkage.

Description

Create a representative dataset after record linkage, containing one representative record for each cluster of linked records.

Usage

represent(
  data,
  linkage,
  rep_method,
  parallel = TRUE,
  cores = NULL,
  ...,
  scale = FALSE
)

Arguments

data

A data frame of records to be represented.

linkage

A numeric vector giving, for each record in data, the id of the cluster it was assigned to by record linkage. It should have one entry per row of data.
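A minimal base R sketch of how a linkage vector groups records into clusters. The toy data frame and cluster ids below are made up for illustration; the representative dataset then has one row per cluster.

```r
# Toy records (hypothetical): five rows linked into three clusters.
data <- data.frame(
  fname = c("ann", "anne", "bob", "rob", "cy"),
  age   = c(30, 31, 45, 45, 22),
  stringsAsFactors = FALSE
)
linkage <- c(1, 1, 2, 2, 3)

# One cluster id is needed for every record.
stopifnot(length(linkage) == nrow(data))

# Row indices of the records belonging to each cluster.
clusters <- split(seq_len(nrow(data)), linkage)
str(clusters)

# A representative dataset has one row per cluster.
n_rep <- length(clusters)
```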

rep_method

Which method to use for representation. Valid options are "proto_minimax", "proto_random", and "composite".
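To make the idea of random prototyping concrete, here is a minimal base R sketch that keeps one record per cluster chosen uniformly at random. It illustrates the general technique under that assumption and is not representr's own implementation; the toy data and the helper pick_random are hypothetical.

```r
set.seed(42)
data <- data.frame(
  fname = c("ann", "anne", "bob", "rob", "cy"),
  age   = c(30, 31, 45, 45, 22),
  stringsAsFactors = FALSE
)
linkage <- c(1, 1, 2, 2, 3)
clusters <- split(seq_len(nrow(data)), linkage)

# Pick one row index per cluster uniformly at random.
# Singletons are guarded: sample(x, 1) on a length-1 numeric x samples from 1:x.
pick_random <- function(idx) if (length(idx) == 1L) idx else sample(idx, 1L)

proto_rows <- vapply(clusters, pick_random, integer(1))
rep_random <- data[proto_rows, , drop = FALSE]
rep_random
```

The minimax and composite methods differ only in how each cluster's representative is produced: by choosing a record with a distance criterion, or by building a new record from the cluster's values.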

parallel

Logical flag indicating whether to use parallel computation (via foreach). Defaults to TRUE.

cores

If specified, the number of cores to use with foreach when parallel = TRUE.
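The parallel and cores arguments control whether clusters are processed sequentially or across worker processes. representr uses foreach for this; to stay dependency-free, the sketch below swaps in base R's parallel package (makeCluster, parLapply) instead. The helper apply_clusters, its default for cores, and the placeholder per-cluster step are all hypothetical.

```r
library(parallel)  # ships with base R

data <- data.frame(
  fname = c("ann", "anne", "bob", "rob", "cy"),
  age   = c(30, 31, 45, 45, 22),
  stringsAsFactors = FALSE
)
linkage <- c(1, 1, 2, 2, 3)
clusters <- split(seq_len(nrow(data)), linkage)

# Placeholder per-cluster step (hypothetical): keep the cluster's first record.
per_cluster <- function(idx, d) d[idx[1], , drop = FALSE]

# Hypothetical helper mirroring the parallel/cores arguments.
apply_clusters <- function(clusters, d, parallel = TRUE, cores = NULL) {
  if (!parallel) return(lapply(clusters, per_cluster, d = d))
  # Assumed default when cores is NULL: all but one detected core (at least 1).
  if (is.null(cores)) cores <- max(1L, detectCores() - 1L, na.rm = TRUE)
  cl <- makeCluster(cores)
  on.exit(stopCluster(cl))
  # The data is passed explicitly because workers do not see the caller's globals.
  parLapply(cl, clusters, per_cluster, d = d)
}

rep_seq <- do.call(rbind, apply_clusters(clusters, data, parallel = FALSE))
rep_seq
```

Calling apply_clusters(clusters, data, cores = 2) should return the same per-cluster results, computed on two worker processes.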

...

Additional arguments passed on to the cluster representation function. See the prototyping and composite methods for the arguments each one accepts.
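A minimal sketch of how `...` forwarding typically works in R: the dispatcher does not inspect the extra arguments and passes them to whichever method is selected. The dispatcher represent_sketch, the methods method_first and method_composite, the aggregation rule (num_fun for numeric columns, most frequent value otherwise), and the num_fun argument are hypothetical and do not describe representr's internals.

```r
data <- data.frame(
  fname = c("ann", "anne", "bob", "rob", "cy"),
  age   = c(30, 31, 45, 45, 22),
  stringsAsFactors = FALSE
)
linkage <- c(1, 1, 2, 2, 3)

# Hypothetical methods; each declares the extra arguments it understands
# and absorbs the rest through its own `...`.
method_first <- function(sub, ...) sub[1, , drop = FALSE]
method_composite <- function(sub, num_fun = mean, ...) {
  # Numeric columns are aggregated with num_fun; other columns use the most
  # frequent value (ties resolved by table()'s sorted order).
  out <- lapply(sub, function(col)
    if (is.numeric(col)) num_fun(col) else names(which.max(table(col))))
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Hypothetical dispatcher: `...` is forwarded untouched to the chosen method.
represent_sketch <- function(data, linkage, rep_method, ...) {
  fn <- switch(rep_method,
               first = method_first,
               composite = method_composite,
               stop("unknown rep_method: ", rep_method))
  clusters <- split(seq_len(nrow(data)), linkage)
  do.call(rbind, lapply(clusters, function(idx) fn(data[idx, , drop = FALSE], ...)))
}

rep_max <- represent_sketch(data, linkage, "composite", num_fun = max)
rep_max
```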

scale

If the "proto_minimax" method is specified, a logical flag indicating whether the column-type distance function should be scaled so that each distance takes a value in [0, 1]. Defaults to FALSE.
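A minimal base R sketch of minimax prototyping with scaled, weighted column-type distances: within each cluster, the chosen record is the one whose largest distance to any other record in the cluster is smallest. The scaling rules here are assumptions made for illustration (numeric differences divided by the column's range within the cluster, strings compared by Levenshtein distance from utils::adist divided by the longer string's length, categorical values compared as 0/1). They are not necessarily the rules dist_col_type uses, and the helpers col_dist and minimax_index are hypothetical.

```r
data <- data.frame(
  fname = c("ann", "anne", "anna", "bob"),
  age   = c(30, 31, 36, 45),
  stringsAsFactors = FALSE
)
linkage  <- c(1, 1, 1, 2)
col_type <- c("string", "numeric")
weights  <- c(0.5, 0.5)

# Assumed per-column distance, scaled to [0, 1].
col_dist <- function(x, y, type, rng) {
  switch(type,
    string      = utils::adist(x, y)[1, 1] / max(nchar(x), nchar(y), 1),
    numeric     = if (rng > 0) abs(x - y) / rng else 0,
    categorical = as.numeric(x != y),
    stop("unsupported column type: ", type))
}

# Index (within the cluster) of the minimax record.
minimax_index <- function(d, col_type, weights) {
  n <- nrow(d)
  # Numeric ranges are taken within the cluster (an assumption).
  rngs <- vapply(seq_along(d), function(k)
    if (col_type[k] == "numeric") diff(range(d[[k]])) else NA_real_, numeric(1))
  D <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    # Weighted sum of the scaled column distances.
    D[i, j] <- sum(vapply(seq_along(d), function(k)
      weights[k] * col_dist(d[[k]][i], d[[k]][j], col_type[k], rngs[k]), numeric(1)))
  }
  # Minimize the maximum distance to the other records in the cluster.
  which.min(apply(D, 1, max))
}

clusters <- split(seq_len(nrow(data)), linkage)
proto_rows <- vapply(clusters, function(idx)
  idx[minimax_index(data[idx, , drop = FALSE], col_type, weights)], integer(1))
rep_minimax <- data[proto_rows, , drop = FALSE]
rep_minimax
```

Scaling keeps one column (here, age in years) from dominating the weighted sum simply because of its units, so the weights retain their intended meaning.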

Examples


data("rl_reg1")

## random prototyping
rep_dat_random <- represent(rl_reg1, identity.rl_reg1, "proto_random", id = FALSE, parallel = FALSE)
head(rep_dat_random)

## minimax prototyping
col_type <- c("string", "string", "numeric", "numeric", "numeric", "categorical", "ordinal",
    "numeric", "numeric")
orders <- list(education = c("Less than a high school diploma", "High school graduates, no college",
    "Some college or associate degree", "Bachelor's degree only", "Advanced degree"))
weights <- c(.25, .25, .05, .05, .1, .15, .05, .05, .05)
rep_dat_minimax <- represent(rl_reg1, identity.rl_reg1, "proto_minimax", id = FALSE,
    distance = dist_col_type, col_type = col_type, weights = weights, orders = orders,
    scale = TRUE, parallel = FALSE)
head(rep_dat_minimax)

## Not run: 
## with alternative tie breaker
rep_dat_minimax <- represent(rl_reg1, identity.rl_reg1, "proto_minimax", id = FALSE,
    distance = dist_col_type, col_type = col_type, weights = weights, orders = orders,
    ties_fn = "maxmin_compare", scale = TRUE, parallel = FALSE)
head(rep_dat_minimax)

rep_dat_minimax <- represent(rl_reg1, identity.rl_reg1, "proto_minimax", id = FALSE,
    distance = dist_col_type, col_type = col_type, weights = weights, orders = orders,
    ties_fn = "within_category_compare_cpp", scale = TRUE, parallel = FALSE)
head(rep_dat_minimax)

## composite prototyping
rep_dat_composite <- represent(rl_reg1, identity.rl_reg1, "composite",
    col_type = col_type, parallel = FALSE)
head(rep_dat_composite)

## End(Not run)


[Package representr version 0.1.5 Index]