find_pcha_optimal_parameters {archetypal}    R Documentation

Finds the optimal updating parameters to be used for the PCHA algorithm

Description

After creating a grid on the space of (mu_up, mu_down), it runs archetypal once for every grid pair, using the given method and any other running options passed through the ellipsis (...). All runs start from the same initial solution. Finally, it returns the pair that minimizes the SSE at the end of testing_iters iterations (default = 10).
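The search described above is an exhaustive grid search. The base R sketch below shows the idea under two assumptions that this page does not state: the grid points are equally spaced, and each call to the hypothetical function toy_sse() stands in for one short archetypal run that returns its final SSE. It is an illustration only, not the package's implementation.

```r
# Hypothetical stand-in for one short archetypal() run: in the real function
# this would run testing_iters iterations with muAup = muBup = mu_up and
# muAdown = muBdown = mu_down, then return the final SSE.
toy_sse <- function(mu_up, mu_down) {
  (mu_up - 2)^2 + (mu_down - 0.2)^2
}

# Exhaustive grid search over (mu_up, mu_down); assumes equally spaced points.
grid_search <- function(sse_fun, mup1 = 1.1, mup2 = 2.5,
                        mdown1 = 0.1, mdown2 = 0.5, nmup = 10, nmdown = 10) {
  grid <- expand.grid(mu_up   = seq(mup1, mup2, length.out = nmup),
                      mu_down = seq(mdown1, mdown2, length.out = nmdown))
  # Evaluate the objective once per grid pair
  grid$sse <- mapply(sse_fun, grid$mu_up, grid$mu_down)
  # Keep the pair with the smallest SSE
  best <- grid[which.min(grid$sse), ]
  list(mu_up_opt = best$mu_up, mu_down_opt = best$mu_down,
       min_sse = best$sse, grid = grid)
}

res <- grid_search(toy_sse)
res[c("mu_up_opt", "mu_down_opt", "min_sse")]
```

With the default arguments this evaluates nmup * nmdown = 100 pairs. In the real function each evaluation is a full (if short) archetypal run, which is presumably why the search can be parallelized through nworkers.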

Usage

find_pcha_optimal_parameters(df, kappas, method = "projected_convexhull", 
testing_iters = 10, nworkers = NULL, nprojected = 2, npartition = 10,
nfurthest = 100, sortrows = FALSE,
mup1 = 1.1, mup2 = 2.50, mdown1 = 0.1, mdown2 = 0.5, nmup = 10, nmdown = 10,
rseed = NULL, plot = FALSE, ...)

Arguments

df

The data frame with dimensions n x d

kappas

The number of archetypes

method

The method that will be used for computing the initial approximation:

  1. projected_convexhull, see find_outmost_projected_convexhull_points

  2. convexhull, see find_outmost_convexhull_points

  3. partitioned_convexhull, see find_outmost_partitioned_convexhull_points

  4. furthestsum, see find_furthestsum_points

  5. outmost, see find_outmost_points

  6. random, a random set of kappas points will be used

testing_iters

The maximum number of iterations to run for every pair (mu_up, mu_down) of parameters

nworkers

The number of logical processors that will be used for parallel computing (on many machines this is twice the number of available physical cores)

nprojected

The dimension of the projected subspace for find_outmost_projected_convexhull_points

npartition

The number of partitions for find_outmost_partitioned_convexhull_points

nfurthest

The number of times that the FurthestSum algorithm will be applied

sortrows

If TRUE, rows will be sorted in find_furthestsum_points

mup1

The minimum value of mu_up, default is 1.1

mup2

The maximum value of mu_up, default is 2.5

mdown1

The minimum value of mu_down, default is 0.1

mdown2

The maximum value of mu_down, default is 0.5

nmup

The number of grid points to be taken in [mup1, mup2], default is 10

nmdown

The number of grid points to be taken in [mdown1, mdown2], default is 10

rseed

The random seed that will be used for setting the initial A matrix. Useful for reproducible results.

plot

If TRUE, a 3D plot of (mu_up, mu_down, SSE) is created

...

Other arguments to be passed to the function archetypal
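This page does not say what nworkers = NULL falls back to. If you prefer to set it explicitly, one option is to query the machine with base R's parallel::detectCores(). The "leave one logical processor free" rule below is a common convention, not a requirement of the package. Note that detectCores() can return NA, and on some platforms its logical argument is ignored.

```r
# Count logical processors; detectCores() can return NA on some systems.
n_logical <- parallel::detectCores(logical = TRUE)

# Use all but one logical processor, but never fewer than one worker
# (the "leave one free" rule is a convention, not a package requirement).
nworkers <- if (is.na(n_logical)) 1L else max(1L, n_logical - 1L)
nworkers
```

The value can then be passed as nworkers = nworkers in the call to find_pcha_optimal_parameters.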

Value

A list with members:

  1. mu_up_opt, the optimal value found for muAup and muBup

  2. mu_down_opt, the optimal value found for muAdown and muBdown

  3. min_sse, the minimum SSE, which corresponds to (mu_up_opt, mu_down_opt)

  4. seed_used, the random seed that was used; required for reproducing the optimal results

  5. method_used, the method that was used for creating the initial solution

  6. sol_initial, the initial solution that was used for all grid computations

  7. testing_iters, the maximum number of iterations performed by each grid computation

See Also

find_closer_points

Examples

{
library(archetypal)
data("wd25")
out = find_pcha_optimal_parameters(df = wd25, kappas = 5, rseed = 2020)
# Time difference of 30.91101 secs
# mu_up_opt mu_down_opt     min_sse 
# 2.188889    0.100000    4.490980  
# Run now given the above optimal found parameters:
aa = archetypal(df = wd25, kappas = 5,
                initialrows = out$sol_initial, rseed = out$seed_used,
                muAup = out$mu_up_opt, muAdown = out$mu_down_opt,
                muBup = out$mu_up_opt, muBdown = out$mu_down_opt)
aa[c("SSE", "varexpl", "iterations", "time" )]
# $SSE
# [1] 3.629542
# 
# $varexpl
# [1] 0.9998924
# 
# $iterations
# [1] 146
# 
# $time
# [1] 21.96
# Compare with a run that uses the default updating parameters (times may vary)
aa2 = archetypal(df = wd25, kappas = 5, rseed = 2020)
aa2[c("SSE", "varexpl", "iterations", "time" )]
# $SSE
# [1] 3.629503
# 
# $varexpl
# [1] 0.9998924
# 
# $iterations
# [1] 164
# 
# $time
# [1] 23.55
## This is a toy example; with data of thousands or millions of rows,
## the time saved can be much larger.
# Close the plot device, if one is open:
if (dev.cur() > 1) dev.off()

}
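The reported mu_up_opt = 2.188889 and mu_down_opt = 0.1 are consistent with each interval being split into equally spaced points, as seq(..., length.out = ...) does. This is an inference from the printed output, not something the page states. The sketch below rebuilds those points from the default arguments and counts the short runs the search performs.

```r
# Default grid arguments from the Usage section
mup1 <- 1.1; mup2 <- 2.5; nmup <- 10
mdown1 <- 0.1; mdown2 <- 0.5; nmdown <- 10
testing_iters <- 10

# Assumption: equally spaced grid points, including both end points
mu_up_grid   <- seq(mup1, mup2, length.out = nmup)
mu_down_grid <- seq(mdown1, mdown2, length.out = nmdown)

# 8th mu_up point = 1.1 + 7 * (1.4 / 9), i.e. 2.188889 after rounding
round(mu_up_grid[8], 6)
# The smallest mu_down point is mdown1 itself
mu_down_grid[1]

# One short run per (mu_up, mu_down) pair, each capped at testing_iters
n_runs <- nmup * nmdown
max_total_iters <- n_runs * testing_iters
```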

[Package archetypal version 1.3.0 Index]