optimize_microdata {synthACS}    R Documentation

Optimize the selection of a micro data population.

Description

Optimize the candidate micro dataset so that it attains the lowest loss against the macro dataset constraints. Loss is defined here as total absolute error (TAE), and the constraints are defined by constraint_list. Optimization is done by simulated annealing; see Details.
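
As an illustration of the loss, the following sketch computes TAE for a single attribute. The objects target and candidate are hypothetical, and the assumption that a constraint is a named vector of category counts is made for illustration only; none of this is taken from the synthACS internals.

```r
# Hypothetical macro constraint: target counts per category (assumed structure).
target <- c(female = 520, male = 480)

# Hypothetical candidate micro dataset of 1,000 individuals.
candidate <- data.frame(
  gender = factor(rep(c("female", "male"), times = c(500, 500)),
                  levels = names(target))
)

# Total absolute error: sum over categories of the absolute difference
# between the candidate's counts and the macro targets.
observed <- table(candidate$gender)
tae <- sum(abs(as.numeric(observed[names(target)]) - target))
tae  # 40: 20 too few females plus 20 too many males
```

Note that moving one individual to a different category changes two cell counts, so each misallocated individual contributes 2 to TAE in this setup.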

Usage

optimize_microdata(
  micro_data,
  prob_name = "p",
  constraint_list,
  tolerance = round(sum(constraint_list[[1]])/2000 * length(constraint_list), 0),
  resample_size = min(sum(constraint_list[[1]]), max(500,
    round(sum(constraint_list[[1]]) * 0.005, 0))),
  p_accept = 0.4,
  max_iter = 10000L,
  seed = sample.int(10000L, size = 1, replace = FALSE),
  verbose = TRUE
)

Arguments

micro_data

A data.frame of micro data observations.

prob_name

Observations are assumed to be weighted, with unequal probabilities of occurrence. This string names the variable within micro_data that contains the probability of selection. Defaults to "p".

constraint_list

A list of constraining macro data attributes. See add_constraint.

tolerance

An integer giving the maximum acceptable loss (TAE), enabling early stopping. Defaults to a misclassification rate of 1 individual per 1,000 per constraint.

resample_size

An integer controlling the rate of movement about the candidate space. Specifically, it specifies the number of observations to change between iterations. Defaults to 0.5% of the number of observations, but no fewer than 500 and no more than the total number of observations.

p_accept

The acceptance probability for the Metropolis acceptance criterion. Defaults to 0.4.

max_iter

The maximum number of allowable iterations. Defaults to 10000L.

seed

A seed for reproducibility. See set.seed. Defaults to a random integer between 1 and 10,000.

verbose

Logical. Should verbose output be printed? Defaults to TRUE.
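
To make the default values of tolerance and resample_size concrete, the sketch below evaluates the default expressions from the Usage section on a hypothetical constraint list. The object c_list and its contents are invented for illustration, and it is assumed that each element is a named vector of counts whose total is the population size.

```r
# Hypothetical constraint list: two attributes for a population of 12,000.
c_list <- list(
  gender = c(female = 6200, male = 5800),
  age    = c(under_18 = 2800, age_18_64 = 7400, age_65_up = 1800)
)

# Population size implied by the first constraint.
n <- sum(c_list[[1]])

# Same expressions as the defaults shown in Usage.
tolerance     <- round(n / 2000 * length(c_list), 0)
resample_size <- min(n, max(500, round(n * 0.005, 0)))

tolerance      # 12
resample_size  # 500
```

Here 0.5% of 12,000 is only 60, so the floor of 500 applies; the 0.5% rule takes over once the population exceeds 100,000.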

Details

Spatial microsimulation involves the study of individual-level phenomena within a specified set of geographies in which these individuals act. It involves the creation of synthetic data to model, via simulation, these phenomena. As a first step to simulation, an appropriate micro-level (i.e., individual) dataset must be generated. This function creates such a dataset from a set of candidate observations and macro-level constraints.

Optimization is done via simulated annealing, which seeks to minimize the total absolute error (TAE) between the micro-data and the macro-constraints. The annealing procedure is controlled by the parameters tolerance, resample_size, p_accept, and max_iter. Specifically, tolerance sets the maximum allowable TAE between the output micro-data and the macro-constraints, and max_iter caps the number of iterations allowed to reach it. resample_size and p_accept control movement about the candidate space: resample_size controls the jump size between neighboring candidates, and p_accept controls the hill-climbing rate for exiting local minima.
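
The roles of these four parameters can be seen in a minimal annealing sketch. This is not the synthACS implementation: the helper names tae_one and anneal_sketch are hypothetical, only one constraint is handled, and the rule for accepting a worse candidate (probability p_accept raised to the iteration count) is an assumed cooling schedule chosen for simplicity.

```r
# Illustrative only; names and the acceptance rule are assumptions.

# TAE of the rows `idx` of `pool` against one named vector of target counts.
tae_one <- function(idx, pool, targets, attr) {
  counts <- table(factor(pool[[attr]][idx], levels = names(targets)))
  sum(abs(as.numeric(counts) - targets))
}

anneal_sketch <- function(pool, targets, attr, prob_name = "p",
                          tolerance = 0, resample_size = 10,
                          p_accept = 0.4, max_iter = 1000L, seed = 1L) {
  set.seed(seed)
  n <- sum(targets)
  w <- pool[[prob_name]]
  # Initial candidate: weighted draw of row indices from the pool.
  current <- sample.int(nrow(pool), n, replace = TRUE, prob = w)
  loss <- tae_one(current, pool, targets, attr)
  iter <- 0L
  # Stop early once the loss is within tolerance, or after max_iter iterations.
  while (iter < max_iter && loss > tolerance) {
    iter <- iter + 1L
    # Neighbor: replace resample_size members with fresh weighted draws.
    proposal <- current
    swap <- sample.int(n, resample_size)
    proposal[swap] <- sample.int(nrow(pool), resample_size,
                                 replace = TRUE, prob = w)
    new_loss <- tae_one(proposal, pool, targets, attr)
    # Always accept improvements; accept a worse proposal with a probability
    # that shrinks as iterations accumulate (assumed schedule).
    if (new_loss < loss || runif(1) < p_accept^iter) {
      current <- proposal
      loss <- new_loss
    }
  }
  list(rows = current, tae = loss, iter = iter)
}

# Hypothetical pool of 100 equally weighted candidates.
pool <- data.frame(
  gender = rep(c("female", "male"), each = 50),
  p      = rep(1 / 100, 100)
)
targets <- c(female = 60, male = 40)

fit <- anneal_sketch(pool, targets, attr = "gender",
                     resample_size = 5, max_iter = 500L)
fit$tae
```

A larger resample_size makes each proposal differ more from the current candidate, while a larger p_accept makes the search more willing to accept a temporarily worse candidate in order to leave a local minimum.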

Please see the references for a more detailed discussion of the simulated annealing procedure.

References

Ingber, Lester. "Very fast simulated re-annealing." Mathematical and Computer Modelling 12.8 (1989): 967-973.

Metropolis, Nicholas, et al. "Equation of state calculations by fast computing machines." The Journal of Chemical Physics 21.6 (1953): 1087-1092.

Szu, Harold, and Ralph Hartley. "Fast simulated annealing." Physics Letters A 122.3 (1987): 157-162.

Examples

## Not run: 
## assumes you have a micro_synthetic object named test_micro and a constraint_list named c_list
opt_data <- optimize_microdata(test_micro, "p", c_list, max_iter = 10, resample_size = 500,
                               p_accept = 0.01, verbose = FALSE)

## End(Not run)

[Package synthACS version 1.7.1 Index]