select_parameters.mc {geocmeans}    R Documentation
Select parameters for clustering algorithm (multicore)
Description
Function to select the parameters for a clustering algorithm. This version of the function can use a plan defined with the future package to run the computations in parallel and reduce calculation time.
Usage
select_parameters.mc(
algo,
data,
k,
m,
alpha = NA,
beta = NA,
nblistw = NULL,
lag_method = "mean",
window = NULL,
spconsist = TRUE,
classidx = TRUE,
nrep = 30,
indices = NULL,
standardize = TRUE,
robust = FALSE,
noise_cluster = FALSE,
delta = NA,
maxiter = 500,
tol = 0.01,
chunk_size = 5,
seed = NULL,
init = "random",
verbose = TRUE
)
selectParameters.mc(
algo,
data,
k,
m,
alpha = NA,
beta = NA,
nblistw = NULL,
lag_method = "mean",
window = NULL,
spconsist = TRUE,
classidx = TRUE,
nrep = 30,
indices = NULL,
standardize = TRUE,
robust = FALSE,
noise_cluster = FALSE,
delta = NA,
maxiter = 500,
tol = 0.01,
chunk_size = 5,
seed = NULL,
init = "random",
verbose = TRUE
)
Arguments
algo
A string indicating which method to use (FCM, GFCM, SFCM, SGFCM)
data
A dataframe with numeric columns
k
A sequence of values for k to test (>= 2)
m
A sequence of values for m to test
alpha
A sequence of values for alpha to test (NA if not required)
beta
A sequence of values for beta to test (NA if not required)
nblistw
A list of listw objects describing the neighbours, typically produced by the spdep package (NULL if not required). A single listw object, as in the example, is also accepted.
lag_method
A string indicating if a classical lag must be used ("mean") or if a weighted median must be used ("median"). Both can be tested by specifying a vector: c("mean", "median"). When working with rasters, the string must be parsable to a function like mean, min, max, sum, etc. This function is applied to all the pixel values in the window designated by the window parameter, weighted according to the values of this matrix.
window
A list of windows to use to calculate neighbouring values if rasters are used.
spconsist
A boolean indicating if the spatial consistency must be calculated
classidx
A boolean indicating if the classification quality indices must be calculated
nrep
An integer indicating the number of permutations used to simulate the random distribution of the spatial inconsistency. Only used if spconsist is TRUE.
indices
A character vector with the names of the indices to calculate to evaluate clustering quality. The default is c("Silhouette.index", "Partition.entropy", "Partition.coeff", "XieBeni.index", "FukuyamaSugeno.index", "Explained.inertia"). Other available indices are "DaviesBoulin.index", "CalinskiHarabasz.index", "GD43.index", "GD53.index" and "Negentropy.index".
standardize
A boolean specifying if the variables must be centered and reduced (default = TRUE)
robust
A boolean indicating if the "robust" version of the algorithm must be used (see details)
noise_cluster
A boolean indicating if a noise cluster must be added to the solution (see details)
delta
A float giving the distance of the noise cluster to each observation
maxiter
An integer for the maximum number of iterations
tol
The tolerance criterion used in the evaluateMatrices function for convergence assessment
chunk_size
The size of a chunk used for multiprocessing. Default is 5.
seed
An integer used for random number generation. It ensures that the start centers will be the same if the same integer is selected.
init
A string indicating how the initial centers must be selected. "random" indicates that random observations are used as centers. "kpp" uses a distance-based method resulting in more dispersed centers at the beginning. Both of them are heuristic.
verbose
A boolean indicating if a progress bar should be displayed
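The k, m, alpha and beta arguments each take a vector of candidate values. Assuming the function evaluates every combination of these values (a full grid search), the number of clustering runs grows multiplicatively. The base R sketch below uses expand.grid to count the combinations implied by the call in the Examples section; it does not call geocmeans itself.

```r
# Candidate values taken from the Examples section
k_values     <- 5
m_values     <- seq(1, 2.5, 0.1)   # 16 values
alpha_values <- seq(0, 2, 0.1)     # 21 values

# Assumption: every combination of k, m and alpha is evaluated once
grid <- expand.grid(k = k_values, m = m_values, alpha = alpha_values)

nrow(grid)     # 336 combinations, i.e. 336 clustering runs
head(grid, 3)
```

Adding a second value of k, or testing beta as well, multiplies this count again, which is why a parallel plan becomes useful.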
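For lag_method, "mean" computes a classical spatial lag (the weighted average of the neighbours' values) while "median" uses a weighted median. The sketch below computes both for a single observation with hypothetical neighbour values and row-standardised weights. The helper weighted_median() is hypothetical and written for illustration only; it is not the geocmeans implementation.

```r
# Hypothetical helper: weighted median of x with non-negative weights w.
# Returns the lower weighted median when the weights split exactly in half.
weighted_median <- function(x, w) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  # first value at which the cumulative weight reaches half of the total weight
  x[which(cumsum(w) >= sum(w) / 2)[1]]
}

# Hypothetical values of one variable for the four neighbours of an observation
neighbour_values  <- c(2, 3, 10, 4)
neighbour_weights <- rep(0.25, 4)  # row-standardised weights (style = "W")

lag_mean   <- sum(neighbour_weights * neighbour_values)              # 4.75
lag_median <- weighted_median(neighbour_values, neighbour_weights)  # 3
```

The median lag is less sensitive to the extreme neighbour value (10) than the mean lag.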
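When spconsist is TRUE, a spatial inconsistency indicator is compared with nrep random permutations. The base R sketch below shows a generic version of this idea under stated assumptions: inconsistency is taken as the weighted sum of squared membership differences between neighbours, and the observed value is divided by the mean value obtained after randomly permuting the observations. The toy matrices and the helper inconsistency() are hypothetical; this is not the geocmeans code.

```r
# Hypothetical helper (assumed definition): weighted sum of squared differences
# between the membership vector of each observation and those of its neighbours.
# U: n x k membership matrix; W: n x n spatial weight matrix.
inconsistency <- function(U, W) {
  total <- 0
  for (i in seq_len(nrow(U))) {
    for (j in which(W[i, ] > 0)) {
      total <- total + W[i, j] * sum((U[i, ] - U[j, ])^2)
    }
  }
  total
}

# Toy example: 4 observations on a line, each linked to its direct neighbours
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W[cbind(2:4, 1:3)] <- 1
W <- W / rowSums(W)  # row-standardise
U <- matrix(c(0.9, 0.8, 0.2, 0.1,
              0.1, 0.2, 0.8, 0.9), ncol = 2)

set.seed(42)
nrep <- 30
observed <- inconsistency(U, W)
# Permute the rows of U nrep times to simulate random spatial arrangements
simulated <- replicate(nrep, inconsistency(U[sample(nrow(U)), , drop = FALSE], W))
# Under this assumed definition, lower ratios mean more spatially consistent groups
ratio <- observed / mean(simulated)
```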
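Some of the quality indices listed for indices depend only on the membership matrix. As an illustration, the sketch below uses the classical textbook definitions of the partition coefficient and partition entropy; it assumes "Partition.coeff" and "Partition.entropy" follow these formulas, and the helper names are hypothetical.

```r
# Partition coefficient: mean of squared memberships (1 for a crisp partition)
partition_coeff <- function(U) sum(U^2) / nrow(U)

# Partition entropy: 0 for a crisp partition; 0 * log(0) is treated as 0
partition_entropy <- function(U) {
  terms <- ifelse(U > 0, U * log(U), 0)
  -sum(terms) / nrow(U)
}

crisp <- diag(3)                          # each observation fully in one group
fuzzy <- matrix(0.5, nrow = 4, ncol = 2)  # maximal uncertainty with 2 groups

partition_coeff(crisp)    # 1
partition_entropy(crisp)  # 0
partition_coeff(fuzzy)    # 0.5
partition_entropy(fuzzy)  # log(2), about 0.693
```

Higher partition coefficients and lower partition entropies indicate less fuzzy classifications.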
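With standardize = TRUE, each variable is centered and reduced, so that it has mean 0 and standard deviation 1. Assuming this corresponds to a classical z-score, the base R scale() function produces the same kind of transformation; the data frame below is a toy example.

```r
df <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 60))
# Center each column on its mean and divide it by its standard deviation
z <- as.data.frame(scale(df))
round(colMeans(z), 10)  # 0 0
apply(z, 2, sd)         # 1 1
```

Standardizing prevents variables with large units from dominating the distances used by the clustering.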
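With noise_cluster = TRUE, an extra cluster is added whose distance to every observation is the constant delta. The sketch below computes the fuzzy memberships of one observation under the classical noise-clustering formulation, in which the noise cluster is simply an extra center at distance delta in the usual FCM membership formula. Assuming geocmeans uses the same membership update, an observation far from all regular centers is mostly assigned to the noise cluster. The helper noise_memberships() and the distances are hypothetical.

```r
# Hypothetical helper: FCM memberships of one observation, with a noise cluster.
# d: Euclidean distances to the k regular centers (must all be > 0)
# delta: constant distance to the noise cluster; m: fuzziness exponent (> 1)
noise_memberships <- function(d, delta, m) {
  p <- 2 / (m - 1)
  # append the noise cluster as an extra "center" at distance delta
  all_d <- c(d, delta)
  # standard FCM membership formula applied to the k + 1 distances
  u <- sapply(all_d, function(dj) 1 / sum((dj / all_d)^p))
  names(u) <- c(paste0("cluster", seq_along(d)), "noise")
  u
}

typical <- noise_memberships(c(1, 2, 4), delta = 2, m = 2)
# cluster1 0.64, cluster2 0.16, cluster3 0.04, noise 0.16
outlier <- noise_memberships(c(10, 12), delta = 2, m = 2)
# the noise cluster receives most of the membership (about 0.94)
```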
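The chunk_size argument controls how the parameter combinations are grouped before being sent to the workers of the future plan. The sketch below only illustrates the chunking arithmetic in base R, assuming the combinations are split into consecutive groups of chunk_size; the actual scheduling is handled by the package.

```r
n_combinations <- 336  # e.g. 1 x 16 x 21 combinations of k, m and alpha
chunk_size <- 5

# Assumption: consecutive combination indices are grouped into chunks of at most chunk_size
chunks <- split(seq_len(n_combinations), ceiling(seq_len(n_combinations) / chunk_size))
length(chunks)           # 68 chunks
length(chunks[[68]])     # the last chunk holds the single remaining combination
```

Larger chunks reduce the communication overhead between sessions, while smaller chunks balance the load more evenly across workers.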
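The init = "kpp" option is described as a distance-based heuristic producing more dispersed initial centers. Assuming it follows the usual k-means++ seeding scheme, the sketch below picks the first center at random, then each subsequent center with probability proportional to its squared distance to the nearest center already chosen. Fixing the random seed with set.seed(), in the same spirit as the seed argument, makes the selection reproducible. The helper kpp_centers() is hypothetical.

```r
# Hypothetical helper: k-means++ style selection of k initial centers
# X: numeric matrix (one row per observation); needs at least k distinct rows
kpp_centers <- function(X, k) {
  X <- as.matrix(X)
  n <- nrow(X)
  idx <- sample.int(n, 1)  # first center: a random observation
  while (length(idx) < k) {
    # squared distance from each observation to its nearest chosen center
    d2 <- apply(X, 1, function(x) min(colSums((t(X[idx, , drop = FALSE]) - x)^2)))
    # already chosen observations have d2 = 0 and cannot be drawn again
    idx <- c(idx, sample.int(n, 1, prob = d2))
  }
  X[idx, , drop = FALSE]
}

X <- matrix(c(0, 0, 0, 1, 10, 10, 10, 11, 20, 0, 21, 0), ncol = 2, byrow = TRUE)
set.seed(1)
a <- kpp_centers(X, 3)
set.seed(1)
b <- kpp_centers(X, 3)  # same seed, same initial centers
```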
Value
A dataframe with indicators assessing the quality of classifications
Examples
library(geocmeans)
data(LyonIris)
AnalysisFields <- c("Lden", "NO2", "PM25", "VegHautPrt", "Pct0_14", "Pct_65", "Pct_Img",
                    "TxChom1564", "Pct_brevet", "NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris, queen = TRUE)
Wqueen <- spdep::nb2listw(queen, style = "W")
# run the computations in parallel on two background R sessions
future::plan(future::multisession, workers = 2)
# set spconsist to TRUE to calculate the spatial consistency indicator
# (FALSE here to reduce the time during package check)
values <- select_parameters.mc("SFCM", dataset, k = 5, m = seq(1, 2.5, 0.1),
                               alpha = seq(0, 2, 0.1), nblistw = Wqueen, spconsist = FALSE)
## make sure any open connections are closed afterward
if (!inherits(future::plan(), "sequential")) future::plan(future::sequential)