archetypal {archetypal}    R Documentation

archetypal: Finds the archetypal analysis of a data frame by using a variant of the PCHA algorithm

Description

Performs archetypal analysis by using Principal Convex Hull Analysis (PCHA) with full control over all algorithmic parameters.

Usage

archetypal(df, kappas, initialrows = NULL,
  method = "projected_convexhull", nprojected = 2, npartition = 10,
  nfurthest = 10, maxiter = 2000, conv_crit = 1e-06,
  var_crit = 0.9999, verbose = TRUE, rseed = NULL, aupdate1 = 25,
  aupdate2 = 10, bupdate = 10, muAup = 1.2, muAdown = 0.5,
  muBup = 1.2, muBdown = 0.5, SSE_A_conv = 1e-09,
  SSE_B_conv = 1e-09, save_history = FALSE, nworkers = NULL,
  stop_varexpl = TRUE)

Arguments

df

The data frame with dimensions n x d

kappas

The number of archetypes

initialrows

The initial set of rows of the data frame that will be used to start the algorithm

method

The method that will be used for computing initial approximation:

  1. projected_convexhull, see find_outmost_projected_convexhull_points

  2. convexhull, see find_outmost_convexhull_points

  3. partitioned_convexhull, see find_outmost_partitioned_convexhull_points

  4. furthestsum, see find_furthestsum_points

  5. outmost, see find_outmost_points

  6. random, a random set of kappas points will be used

nprojected

The dimension of the projected subspace for find_outmost_projected_convexhull_points

npartition

The number of partitions for find_outmost_partitioned_convexhull_points

nfurthest

The number of times that FurthestSum algorithm will be applied by find_furthestsum_points

maxiter

The maximum number of iterations of the main algorithm

conv_crit

The SSE convergence criterion of termination: iterate until |dSSE|/SSE<conv_crit

var_crit

The Variance Explained (VarExpl) convergence criterion of termination: iterate while VarExpl<var_crit, that is, stop once VarExpl is greater than var_crit (see stop_varexpl)

verbose

If it is set to TRUE, then both initialization and iteration details are printed out

rseed

The random seed that will be used for setting initial A matrix. Useful for reproducible results.

aupdate1

The number of initial applications of Aupdate for improving the initially randomly selected A matrix

aupdate2

The number of Aupdate applications in main iteration

bupdate

The number of Bupdate applications in main iteration

muAup

The factor (>1) by which muA is multiplied when SSE<=SSE_old(1+SSE_A_conv) holds

muAdown

The factor (<1) by which muA is multiplied when SSE>SSE_old(1+SSE_A_conv) holds

muBup

The factor (>1) by which muB is multiplied when SSE<=SSE_old(1+SSE_B_conv) holds

muBdown

The factor (<1) by which muB is multiplied when SSE>SSE_old(1+SSE_B_conv) holds

SSE_A_conv

The convergence value used in SSE<=SSE_old(1+SSE_A_conv). Warning: Matlab sometimes crashes when this is set to 1E-16 or lower

SSE_B_conv

The convergence value used in SSE<=SSE_old(1+SSE_B_conv). Warning: Matlab sometimes crashes when this is set to 1E-16 or lower

save_history

If set to TRUE, the iteration history is saved for further use

nworkers

The number of logical processors that will be used for parallel computing (usually twice the number of available physical cores). Parallel computation is applied when requested by the functions find_furthestsum_points, find_outmost_partitioned_convexhull_points and find_outmost_projected_convexhull_points.

stop_varexpl

If set to TRUE, the algorithm stops when varexpl is greater than var_crit
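
The acceptance rule behind muAup, muAdown, muBup and muBdown, and the termination tests behind conv_crit, var_crit, stop_varexpl and maxiter, can be sketched in base R as follows. This is only an illustration of the rules as stated in the arguments above, not the package's internal code: the helpers adapt_mu and should_stop are hypothetical names, and it is an assumption that the denominator in |dSSE|/SSE is the current SSE and that the tests are combined with a logical OR.

```r
# Hypothetical helper (not part of the package): one accept/reject step
# for a step size mu. The trial update counts as accepted when
# SSE <= SSE_old * (1 + sse_conv); mu then grows, otherwise it shrinks.
adapt_mu <- function(mu, sse_new, sse_old,
                     mu_up = 1.2, mu_down = 0.5, sse_conv = 1e-9) {
  accepted <- sse_new <= sse_old * (1 + sse_conv)
  list(mu = if (accepted) mu * mu_up else mu * mu_down,
       accepted = accepted)
}

# Hypothetical helper (not part of the package): termination tests.
# Assumption: the relative change is measured against the current SSE.
should_stop <- function(sse_new, sse_old, varexpl, iter,
                        conv_crit = 1e-6, var_crit = 0.9999,
                        maxiter = 2000, stop_varexpl = TRUE) {
  if (sse_new == 0) return(TRUE)  # perfect fit, nothing left to improve
  rel_change <- abs(sse_old - sse_new) / sse_new
  (rel_change < conv_crit) ||
    (stop_varexpl && varexpl > var_crit) ||
    (iter >= maxiter)
}

step <- adapt_mu(mu = 1, sse_new = 9.5, sse_old = 10)
step$mu      # 1.2: the update was accepted, so mu is multiplied by mu_up

rejected <- adapt_mu(mu = 1, sse_new = 10.5, sse_old = 10)
rejected$mu  # 0.5: the update was rejected, so mu is multiplied by mu_down

should_stop(sse_new = 9.5, sse_old = 10, varexpl = 0.95, iter = 12)     # FALSE
should_stop(sse_new = 9.5, sse_old = 10, varexpl = 0.99995, iter = 12)  # TRUE
```

In this sketch, setting stop_varexpl = FALSE leaves only the SSE test and the iteration cap active.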

Value

A list with members:

  1. BY, the kappas x d matrix of archetypes found

  2. A, the n x kappas matrix such that Y ~ ABY, that is, the Frobenius norm ||Y-ABY|| is minimized

  3. B, the kappas x n matrix such that Y ~ ABY, that is, the Frobenius norm ||Y-ABY|| is minimized

  4. SSE, the sum of squared errors, SSE = ||Y-ABY||^2

  5. varexpl, the Variance Explained = (SST-SSE)/SST where SST is the total sum of squares for data set matrix

  6. initialsolution, the set of rows of the data frame that was used to start the algorithm

  7. freqstable, the frequency table for all found rows, if it is available.

  8. iterations, the number of main iterations done by algorithm

  9. time, the time in seconds spent on the entire run

  10. converges, if it is TRUE, then convergence was achieved before the end of maximum allowed iterations

  11. nAup, the total number of times that SSE<=SSE_old(1+SSE_A_conv) held in Aupdate processes. Useful for debugging purposes.

  12. nAdown, the total number of times that SSE>SSE_old(1+SSE_A_conv) held in Aupdate processes. Useful for debugging purposes.

  13. nBup, the total number of times that SSE<=SSE_old(1+SSE_B_conv) held in Bupdate processes. Useful for debugging purposes.

  14. nBdown, the total number of times that SSE>SSE_old(1+SSE_B_conv) held in Bupdate processes. Useful for debugging purposes.

  15. run_results, a list of iteration related details: SSE, varexpl, time, B, BY for all iterations done.

  16. Y, the n x d matrix of initial data used

  17. data.tables, the initial data frame if column dimension is at most 3 or a list of frequencies for each variable

  18. call, the exact call used
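
The relations between the members Y, A, B, BY, SSE and varexpl listed above can be checked with a few lines of base R. The sketch below uses synthetic stand-ins rather than a real result, so it runs without the package. It assumes the usual archetypal analysis constraint that each row of A and each row of B holds non-negative weights summing to 1, and it assumes that SST is the uncentered total sum of squares sum(Y^2); neither detail is spelled out on this page.

```r
# Synthetic stand-ins for the members Y, A and B of the returned list.
set.seed(1)
n <- 20
d <- 2
kappas <- 3
Y <- matrix(runif(n * d), nrow = n, ncol = d)

# Assumption: rows of A and rows of B are convex weights
# (non-negative, each row sums to 1).
A <- matrix(runif(n * kappas), nrow = n)
A <- A / rowSums(A)
B <- matrix(runif(kappas * n), nrow = kappas)
B <- B / rowSums(B)

BY <- B %*% Y                  # kappas x d matrix of archetypes
SSE <- sum((Y - A %*% BY)^2)   # squared Frobenius norm ||Y - ABY||^2
SST <- sum(Y^2)                # assumption: uncentered total sum of squares
varexpl <- (SST - SSE) / SST

dim(BY)
c(SSE = SSE, varexpl = varexpl)
```

With a fitted object aa, the same expressions applied to aa$Y, aa$A and aa$B should reproduce aa$BY and aa$SSE; aa$varexpl should also match if the SST assumption above holds.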

References

[1] M Morup and LK Hansen, "Archetypal analysis for machine learning and data mining", Neurocomputing (Elsevier, 2012). https://doi.org/10.1016/j.neucom.2011.06.033.

[2] Source: http://www.mortenmorup.dk/MMhomepageUpdated_files/Page327.htm, last accessed 2021-11-19

Examples

{
# Create a small 2D data set from 3 corner-points:
p1 <- c(1, 2)
p2 <- c(3, 5)
p3 <- c(7, 3)
dp <- rbind(p1, p2, p3)
dp
set.seed(916070)
# Each point is a random convex combination of the 3 corner-points:
pts <- t(sapply(1:20, function(i, dp) {
  cc <- runif(3)
  cc <- cc / sum(cc)
  colSums(dp * cc)
}, dp))
df <- data.frame(pts)
colnames(df) <- c("x", "y")
# Run AA:
aa <- archetypal(df = df, kappas = 3, verbose = FALSE, save_history = TRUE)
# Archetypes:
archs <- data.frame(aa$BY)
archs
# See main results:
names(aa)
aa[c("SSE", "varexpl", "iterations", "time")]
# See history of iterations:
names(aa$run_results)

}

[Package archetypal version 1.3.0 Index]