pp_weights {representr} | R Documentation |
Get posterior weights for each record post record-linkage using posterior prototyping.
Description
Get posterior weights for each record after record linkage, using posterior prototyping.
Usage
pp_weights(
data,
posterior_linkage,
rep_method,
parallel = TRUE,
cores = NULL,
...,
scale = FALSE,
save_loc = NULL,
verbose = FALSE
)
Arguments
data |
A data frame of records to be represented. |
posterior_linkage |
A matrix of size m x n containing the posterior cluster ids after record linkage;
each row gives the cluster assignment for each record in |
rep_method |
Which method to use for representation. Valid options include "proto_minimax" and "proto_random". |
parallel |
Logical flag indicating whether to use parallel computation (via |
cores |
If specified, the number of cores to use with |
... |
Additional parameters passed to the cluster representation function. See the
minimax or random methods. If passing a probability to
the random method, it must be a list with one element per iteration in lambda. Each element must itself be
a list with one element per cluster, and each of those must be a vector of probabilities with the same length
as the number of rows in the cluster |
scale |
If the "proto_minimax" method is specified, logical flag indicating whether the column-type distance function should be scaled so that each distance takes a value in [0, 1]. Defaults to FALSE. |
save_loc |
Location to save intermediate progress. If NULL, no intermediate progress is saved. |
verbose |
Logical flag indicating whether to print progress messages. Defaults to FALSE. |
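The nested probability structure described under `...` can be hard to picture, so here is a minimal base R sketch of it, not the package's own code. The helper `make_uniform_prob` and the toy matrix `post_link_toy` are hypothetical names introduced for illustration. The sketch also assumes that clusters are ordered by sorted cluster id within each iteration, which this page does not state.

```r
# Toy posterior linkage: 3 iterations (rows) x 5 records (columns).
post_link_toy <- rbind(
  c(1, 1, 2, 2, 3),
  c(1, 2, 2, 3, 3),
  c(1, 1, 1, 2, 2)
)

# Hypothetical helper (not part of representr): uniform selection
# probabilities, nested as iteration -> cluster -> record.
make_uniform_prob <- function(linkage) {
  lapply(seq_len(nrow(linkage)), function(i) {
    # table() counts the records in each cluster, ordered by cluster id
    # (assumed ordering; not confirmed by the documentation).
    sizes <- as.vector(table(linkage[i, ]))
    lapply(sizes, function(k) rep(1 / k, k))
  })
}

prob_toy <- make_uniform_prob(post_link_toy)
length(prob_toy)        # one element per iteration
lengths(prob_toy)       # number of clusters in each iteration
lengths(prob_toy[[1]])  # number of records in each cluster of iteration 1
```

Each innermost vector sums to 1 and has one entry per record in its cluster, which matches the shape the argument description asks for. The name of the argument that receives this list is not given on this page, so check the random method's own help page before passing it.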
Examples
data(rl_reg1)
# make a fake posterior distribution for the linkage
m <- 10
n <- nrow(rl_reg1)
post_link <- matrix(sample(seq_len(n), n*m, replace = TRUE), nrow = m)
# get the posterior prototyping weights
col_type <- c("string", "string", "numeric", "numeric", "numeric", "categorical", "ordinal",
              "numeric", "numeric")
orders <- list(education = c("Less than a high school diploma", "High school graduates, no college",
                             "Some college or associate degree", "Bachelor's degree only", "Advanced degree"))
weights <- c(.25, .25, .05, .05, .1, .15, .05, .05, .05)
pp_weight <- pp_weights(rl_reg1, post_link, "proto_minimax", distance = dist_col_type,
                        col_type = col_type, weights = weights, orders = orders, scale = TRUE, parallel = FALSE)
# threshold by posterior prototyping weights
head(rl_reg1[pp_weight > 0.5, ])
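To give some intuition for the threshold above, the following base R sketch computes a toy version of such weights. It assumes that a record's weight is the share of posterior iterations in which that record is chosen as its cluster's representative. That definition is an assumption for illustration, not something this page states, and the code is not the implementation used by `pp_weights`. The helper `pick_random_prototypes` and the matrix `post_link_toy` are hypothetical, and random selection stands in for whichever representation method is used.

```r
set.seed(1)

# Toy posterior linkage: 4 iterations (rows) x 6 records (columns).
post_link_toy <- rbind(
  c(1, 1, 2, 2, 3, 3),
  c(1, 1, 1, 2, 3, 3),
  c(1, 1, 2, 2, 2, 3),
  c(1, 2, 2, 3, 3, 3)
)

# Hypothetical helper: pick one record index at random from each cluster.
pick_random_prototypes <- function(cluster_ids) {
  members <- split(seq_along(cluster_ids), cluster_ids)
  vapply(members, function(idx) idx[sample.int(length(idx), 1)], integer(1))
}

# For each iteration, flag the records chosen as prototypes.
chosen <- t(apply(post_link_toy, 1, function(ids) {
  seq_along(ids) %in% pick_random_prototypes(ids)
}))

# Assumed definition: weight = share of iterations in which a record is chosen.
toy_weights <- colMeans(chosen)
toy_weights
```

Under this reading, a weight above 0.5 means the record was picked as a representative in more than half of the posterior iterations. Because every toy iteration has exactly three clusters, the weights here sum to 3.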