
find_pcha_optimal_parameters {archetypal}

R Documentation

Finds the optimal updating parameters to be used for the PCHA algorithm

Description

Creates a grid over the space of (mu_up, mu_down), runs archetypal at every grid point using the given initialization method and any other running options passed through the ellipsis (...), and returns the pair of values that minimizes the SSE at the end of testing_iters iterations (default = 10).
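The grid search described above can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes the grid points are equally spaced between the bounds, and `toy_sse` is a hypothetical stand-in for the SSE that archetypal would report after testing_iters iterations.

```r
# Documented defaults for the grid bounds and sizes.
mup1 <- 1.1; mup2 <- 2.5; nmup <- 10
mdown1 <- 0.1; mdown2 <- 0.5; nmdown <- 10

# Assumption: equally spaced points on [mup1, mup2] and [mdown1, mdown2].
mu_up_values <- seq(mup1, mup2, length.out = nmup)
mu_down_values <- seq(mdown1, mdown2, length.out = nmdown)

# All (mu_up, mu_down) pairs: nmup * nmdown rows.
grid <- expand.grid(mu_up = mu_up_values, mu_down = mu_down_values)

# Hypothetical objective used only for illustration; the real function
# would run archetypal for testing_iters iterations and return its SSE.
toy_sse <- function(mu_up, mu_down) (mu_up - 1.5)^2 + (mu_down - 0.2)^2

# Evaluate every pair and keep the one with the smallest value.
grid$sse <- mapply(toy_sse, grid$mu_up, grid$mu_down)
best <- grid[which.min(grid$sse), ]
print(best)
```

With the defaults this evaluates 100 pairs, which is why the search is a natural candidate for the parallel workers controlled by nworkers.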

Usage

find_pcha_optimal_parameters(df, kappas, method = "projected_convexhull", 
testing_iters = 10, nworkers = NULL, nprojected = 2, npartition = 10,
nfurthest = 100, sortrows = FALSE,
mup1 = 1.1, mup2 = 2.50, mdown1 = 0.1, mdown2 = 0.5, nmup = 10, nmdown = 10,
rseed = NULL, plot = FALSE, ...)

Arguments

`df`	The data frame with dimensions n x d
`kappas`	The number of archetypes
`method`	The method that will be used for computing the initial approximation:
projected_convexhull, see `find_outmost_projected_convexhull_points`
convexhull, see `find_outmost_convexhull_points`
partitioned_convexhull, see `find_outmost_partitioned_convexhull_points`
furthestsum, see `find_furthestsum_points`
outmost, see `find_outmost_points`
random, a random set of kappas points will be used
`testing_iters`	The maximum number of iterations to run for every pair (mu_up, mu_down) of parameters
`nworkers`	The number of logical processors that will be used for parallel computing (usually twice the number of available physical cores)
`nprojected`	The dimension of the projected subspace for `find_outmost_projected_convexhull_points`
`npartition`	The number of partitions for `find_outmost_partitioned_convexhull_points`
`nfurthest`	The number of times that the `FurthestSum` algorithm will be applied
`sortrows`	If it is TRUE, then rows will be sorted in `find_furthestsum_points`
`mup1`	The minimum value of mu_up, default is 1.1
`mup2`	The maximum value of mu_up, default is 2.5
`mdown1`	The minimum value of mu_down, default is 0.1
`mdown2`	The maximum value of mu_down, default is 0.5
`nmup`	The number of points to be taken for [mup1,mup2], default is 10
`nmdown`	The number of points to be taken for [mdown1,mdown2], default is 10
`rseed`	The random seed that will be used for setting the initial A matrix. Useful for reproducible results
`plot`	If it is TRUE, then a 3D plot for (mu_up, mu_down, SSE) is created
`...`	Other arguments to be passed to function `archetypal`
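
When choosing a value for nworkers, the number of logical processors can be checked with the base `parallel` package. This is only a sketch for inspecting the machine; it makes no claim about how the function itself picks a default when nworkers is NULL.

```r
library(parallel)

# Number of logical processors (hardware threads); may be NA if unknown.
logical_cores <- detectCores(logical = TRUE)

# Number of physical cores where the platform can report it. On some
# platforms (for example Linux) this returns the same value as above.
physical_cores <- detectCores(logical = FALSE)

cat("logical:", logical_cores, " physical:", physical_cores, "\n")
```

The printed logical count is a reasonable upper bound to pass as nworkers.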

Value

A list with members:

mu_up_opt, the optimal value found for muAup and muBup
mu_down_opt, the optimal value found for muAdown and muBdown
min_sse, the minimum SSE which corresponds to (mu_up_opt,mu_down_opt)
seed_used, the random seed that was used, absolutely necessary for reproducing the optimal results
method_used, the method that was used for creating the initial solution
sol_initial, the initial solution that was used for all grid computations
testing_iters, the maximum number of iterations done by every grid computation

Examples

{
data("wd25")
out = find_pcha_optimal_parameters(df = wd25, kappas = 5, rseed = 2020)
# Time difference of 30.91101 secs
# mu_up_opt mu_down_opt     min_sse 
# 2.188889    0.100000    4.490980  
# Run now given the above optimal found parameters:
aa = archetypal(df = wd25, kappas = 5,
                initialrows = out$sol_initial, rseed = out$seed_used,
                muAup = out$mu_up_opt, muAdown = out$mu_down_opt,
                muBup = out$mu_up_opt, muBdown = out$mu_down_opt)
aa[c("SSE", "varexpl", "iterations", "time" )]
# $SSE
# [1] 3.629542
# 
# $varexpl
# [1] 0.9998924
# 
# $iterations
# [1] 146
# 
# $time
# [1] 21.96
# Compare it with a simple solution (time may vary)
aa2 = archetypal(df = wd25, kappas = 5, rseed = 2020)
aa2[c("SSE", "varexpl", "iterations", "time" )]
# $SSE
# [1] 3.629503
# 
# $varexpl
# [1] 0.9998924
# 
# $iterations
# [1] 164
# 
# $time
# [1] 23.55
## The above is only a "toy example"; if your data has thousands or millions of rows,
## then the time reduction is much more conspicuous.
# Close the plot device, if one is open:
if (dev.cur() > 1) dev.off()

}