maxp_sa {rgeoda}    R Documentation
A simulated annealing algorithm to solve the max-p-region problem
Description
The max-p-region problem is a special case of constrained clustering in which a finite number of geographical areas are aggregated into the maximum number of regions (max-p-regions). Each region must be geographically connected, and the sum of a bounding variable within each region must exceed a minimum threshold. Among solutions that satisfy these constraints, the clustering seeks to maximize internal homogeneity.
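To make the two constraints concrete, here is a minimal base-R sketch of a feasibility check for a candidate regionalization. The function `is_feasible_maxp` and its neighbour-list input `nbrs` are hypothetical illustrations, not part of rgeoda. The threshold is treated here as "at least `min_bound`", which is an assumption about how the boundary case is handled. The check covers only feasibility; maximizing the number of regions and their homogeneity is the job of `maxp_sa` itself.

```r
# Hypothetical helper (not part of rgeoda): checks the two max-p constraints.
#   clusters  - vector of region labels, one per area
#   bound     - numeric vector of the bounding variable, one per area
#   min_bound - minimum total of `bound` per region (">=" assumed here)
#   nbrs      - list of integer vectors; nbrs[[i]] holds the neighbours of area i
is_feasible_maxp <- function(clusters, bound, min_bound, nbrs) {
  for (r in unique(clusters)) {
    members <- which(clusters == r)
    # Threshold constraint on the region's total
    if (sum(bound[members]) < min_bound) return(FALSE)
    # Contiguity constraint: breadth-first search restricted to the region
    visited <- members[1]
    queue <- members[1]
    while (length(queue) > 0) {
      current <- queue[1]
      queue <- queue[-1]
      nxt <- setdiff(intersect(nbrs[[current]], members), visited)
      visited <- c(visited, nxt)
      queue <- c(queue, nxt)
    }
    # Every member must be reachable from the first one
    if (length(visited) != length(members)) return(FALSE)
  }
  TRUE
}

# Usage: four areas in a line, 1 - 2 - 3 - 4, each with bound value 5
nbrs <- list(2, c(1, 3), c(2, 4), 3)
is_feasible_maxp(c(1, 1, 2, 2), rep(5, 4), 10, nbrs)  # TRUE
is_feasible_maxp(c(1, 2, 2, 1), rep(5, 4), 10, nbrs)  # FALSE: {1, 4} is not connected
```

In practice the neighbour structure would come from the weights object `w`; converting it into a list like `nbrs` is not shown here.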
Usage
maxp_sa(
w,
df,
bound_variable,
min_bound,
cooling_rate,
sa_maxit = 1,
iterations = 99,
initial_regions = vector("numeric"),
scale_method = "standardize",
distance_method = "euclidean",
random_seed = 123456789,
cpu_threads = 6,
rdist = numeric()
)
Arguments
w: An instance of the Weight class, e.g. created by queen_weights().
df: A data frame containing only the selected variables, e.g. guerry[c("Crm_prs", "Crm_prp", "Litercy")].
bound_variable: A numeric vector of the selected bounding variable.
min_bound: The minimum value that the sum of the bounding variable in each cluster must exceed.
cooling_rate: The cooling rate of the simulated annealing algorithm. This argument has no default in the signature and must be supplied; the example uses 0.85.
sa_maxit: (optional) The number of iterations of simulated annealing. Defaults to 1.
iterations: (optional) The number of iterations of the overall max-p SA algorithm, as distinct from sa_maxit. Defaults to 99.
initial_regions: (optional) The initial regions from which the local search starts. Defaults to empty, meaning that the local search starts with a random process that "grows" clusters.
scale_method: (optional) One of the scaling methods 'raw', 'standardize', 'demean', 'mad', 'range_standardize', 'range_adjust' to apply to the input data. Defaults to 'standardize' (Z-score normalization).
distance_method: (optional) The distance method used to compute the distance between observations i and j. Options are "euclidean" and "manhattan". Defaults to "euclidean".
random_seed: (optional) The seed for the random number generator. Defaults to 123456789.
cpu_threads: (optional) The number of CPU threads used for parallel computation. Defaults to 6.
rdist: (optional) A distance matrix, supplied as its lower triangle stored column-wise in a numeric vector. Defaults to numeric() (empty).
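The `cooling_rate` and `sa_maxit` arguments come from simulated annealing. As a generic illustration (a textbook Metropolis-style acceptance rule with geometric cooling, not rgeoda's internal code), a move that worsens the objective by `delta` is accepted with probability exp(-delta / T), and the temperature T is multiplied by `cooling_rate` after each round. How rgeoda combines `cooling_rate` and `sa_maxit` internally is not documented on this page. The helpers `accept_move` and `cooling_schedule` are hypothetical.

```r
# Textbook simulated-annealing helpers (hypothetical; not rgeoda's internals).
# delta: change in the objective (e.g. within-cluster sum of squares) caused by
#        a candidate move; negative means the move improves the solution.
accept_move <- function(delta, temperature) {
  if (delta <= 0) return(TRUE)           # always keep improvements
  runif(1) < exp(-delta / temperature)   # keep worse moves with prob. exp(-delta/T)
}

# Geometric cooling: T_k = t0 * cooling_rate^k for k = 0, 1, ..., n_rounds - 1
cooling_schedule <- function(t0, cooling_rate, n_rounds) {
  t0 * cooling_rate^(seq_len(n_rounds) - 1)
}

temps <- cooling_schedule(t0 = 1, cooling_rate = 0.85, n_rounds = 4)
# Probability of accepting a move that worsens the objective by 0.5;
# it shrinks as the temperature drops
exp(-0.5 / temps)
```

Under geometric cooling, a smaller `cooling_rate` lowers the temperature faster, so the search stops accepting worse moves sooner; values closer to 1 explore longer.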
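The `rdist` layout ("lower triangular matrix, column wise storage") matches how base R stores a `dist` object. The sketch below assumes rgeoda reads `rdist` in that same order. It uses `scale()` and `dist()` only as analogues of `scale_method = "standardize"` and the two `distance_method` options. Whether rgeoda's standardization divides by n or n - 1 is not stated here. `lower_tri_index` is a hypothetical helper.

```r
# Toy data: 3 observations, 2 variables
x <- data.frame(a = c(1, 2, 4), b = c(0, 3, 1))

# Z-score each column; scale() uses the n - 1 standard deviation, which may
# differ from rgeoda's "standardize" (assumption, not checked)
z <- scale(x)

# dist() supports both distance_method options
d_euc <- dist(z, method = "euclidean")
d_man <- dist(z, method = "manhattan")

# A dist object holds the lower triangle column by column:
# d(2,1), d(3,1), ..., d(n,1), d(3,2), ..., d(n,n-1)
# Assumption: this is the ordering rgeoda expects for rdist.
rdist_vec <- as.numeric(d_euc)

# Hypothetical helper: position of pair (i, j), i > j, in that vector
lower_tri_index <- function(i, j, n) {
  stopifnot(i > j, i <= n)
  n * (j - 1) - j * (j - 1) / 2 + (i - j)
}

rdist_vec[lower_tri_index(3, 2, 3)]  # same value as as.matrix(d_euc)[3, 2]
```

A vector built this way from the real data would be passed as `rdist`, with a matching `distance_method`. How rgeoda reconciles a supplied `rdist` with `scale_method` is not described on this page.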
Value
A named list with the elements "Clusters", "Total sum of squares", "Within-cluster sum of squares", "Total within-cluster sum of squares", and "The ratio of between to total sum of squares".
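The sums of squares listed above follow the standard decomposition. The total sum of squares is taken around the overall column means, and each within-cluster sum is taken around that cluster's own means. The ratio is (total SS - total within-cluster SS) / total SS, since the between-cluster SS is the difference of the two. This is a minimal sketch that assumes the statistics are computed on whatever (possibly scaled) data matrix is passed in; which scaling rgeoda applies before computing them is an assumption left open. `cluster_ss` is hypothetical and only mirrors the element names of the returned list, omitting "Clusters".

```r
# Hypothetical helper mirroring the sums of squares in "Value"
cluster_ss <- function(x, clusters) {
  x <- as.matrix(x)
  # Total sum of squares: squared deviations from the overall column means
  tss <- sum(sweep(x, 2, colMeans(x))^2)
  # Within-cluster sum of squares, one value per cluster
  wss <- sapply(split(seq_len(nrow(x)), clusters), function(idx) {
    xi <- x[idx, , drop = FALSE]
    sum(sweep(xi, 2, colMeans(xi))^2)
  })
  tot_wss <- sum(wss)
  list(
    "Total sum of squares" = tss,
    "Within-cluster sum of squares" = wss,
    "Total within-cluster sum of squares" = tot_wss,
    # Between SS = total SS - total within SS
    "The ratio of between to total sum of squares" = (tss - tot_wss) / tss
  )
}

# Usage: one variable, two well-separated clusters
res <- cluster_ss(matrix(c(1, 2, 10, 11), ncol = 1), c(1, 1, 2, 2))
res[["Total sum of squares"]]                          # 82
res[["Total within-cluster sum of squares"]]           # 1
res[["The ratio of between to total sum of squares"]]  # 81/82
```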
Examples
## Not run:
library(sf)
guerry_path <- system.file("extdata", "Guerry.shp", package = "rgeoda")
guerry <- st_read(guerry_path)
queen_w <- queen_weights(guerry)
data <- guerry[c('Crm_prs','Crm_prp','Litercy','Donatns','Infants','Suicids')]
bound_variable <- guerry['Pop1831']
min_bound <- 3236.67 # 10% of the total Pop1831 over all departments
maxp_clusters <- maxp_sa(queen_w, data, bound_variable, min_bound, cooling_rate=0.85, sa_maxit=1)
maxp_clusters
## End(Not run)