archetypal {archetypal}

R Documentation

archetypal: Finds the archetypal analysis of a data frame by using a variant of the PCHA algorithm

Description

Performs archetypal analysis using Principal Convex Hull Analysis (PCHA), with full control over all algorithmic parameters.
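As a point of orientation, the problem that PCHA addresses can be written as follows. This is the standard archetypal analysis formulation from [1], stated with the matrix shapes listed under Value (Y is n x d, A is n x kappas, B is kappas x n); the row-sum constraints are taken from that standard formulation and are not spelled out on this page.

```latex
\min_{A,\,B}\ \lVert Y - A B Y \rVert_F^2
\quad \text{subject to} \quad
A \ge 0,\ \ A\,\mathbf{1}_{\kappa} = \mathbf{1}_{n},
\qquad
B \ge 0,\ \ B\,\mathbf{1}_{n} = \mathbf{1}_{\kappa}
```

Under these constraints each row of BY (an archetype) is a convex combination of data rows, and each row of ABY is a convex combination of archetypes.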

Usage

archetypal(df, kappas, initialrows = NULL,
  method = "projected_convexhull", nprojected = 2, npartition = 10,
  nfurthest = 10, maxiter = 2000, conv_crit = 1e-06,
  var_crit = 0.9999, verbose = TRUE, rseed = NULL, aupdate1 = 25,
  aupdate2 = 10, bupdate = 10, muAup = 1.2, muAdown = 0.5,
  muBup = 1.2, muBdown = 0.5, SSE_A_conv = 1e-09,
  SSE_B_conv = 1e-09, save_history = FALSE, nworkers = NULL,
  stop_varexpl = TRUE)

Arguments

`df`	The data frame with dimensions n x d
`kappas`	The number of archetypes
`initialrows`	The initial set of rows of the data frame used to start the algorithm
`method`	The method used to compute the initial approximation:
	projected_convexhull, see `find_outmost_projected_convexhull_points`
	convexhull, see `find_outmost_convexhull_points`
	partitioned_convexhull, see `find_outmost_partitioned_convexhull_points`
	furthestsum, see `find_furthestsum_points`
	outmost, see `find_outmost_points`
	random, a random set of kappas points is used
`nprojected`	The dimension of the projected subspace for `find_outmost_projected_convexhull_points`
`npartition`	The number of partitions for `find_outmost_partitioned_convexhull_points`
`nfurthest`	The number of times that the `FurthestSum` algorithm is applied by `find_furthestsum_points`
`maxiter`	The maximum number of iterations of the main algorithm
`conv_crit`	The SSE convergence criterion for termination: iterate until |dSSE|/SSE<conv_crit
`var_crit`	The Variance Explained (VarExpl) convergence criterion for termination: iterate while VarExpl<var_crit
`verbose`	If it is set to TRUE, then both initialization and iteration details are printed out
`rseed`	The random seed that will be used for setting initial A matrix. Useful for reproducible results.
`aupdate1`	The number of initial applications of Aupdate for improving the initially randomly selected A matrix
`aupdate2`	The number of Aupdate applications in main iteration
`bupdate`	The number of Bupdate applications in main iteration
`muAup`	The factor (>1) by which muA is multiplied when SSE<=SSE_old(1+SSE_A_conv) holds
`muAdown`	The factor (<1) by which muA is multiplied when SSE>SSE_old(1+SSE_A_conv) holds
`muBup`	The factor (>1) by which muB is multiplied when SSE<=SSE_old(1+SSE_B_conv) holds
`muBdown`	The factor (<1) by which muB is multiplied when SSE>SSE_old(1+SSE_B_conv) holds
`SSE_A_conv`	The convergence value used in SSE<=SSE_old(1+SSE_A_conv). Warning: Matlab has sometimes crashed when this is set to 1E-16 or lower
`SSE_B_conv`	The convergence value used in SSE<=SSE_old(1+SSE_B_conv). Warning: Matlab has sometimes crashed when this is set to 1E-16 or lower
`save_history`	If set to TRUE, the iteration history is saved for further use
`nworkers`	The number of logical processors used for parallel computing (usually twice the number of available physical cores). Parallel computation is used when requested by the functions `find_furthestsum_points`, `find_outmost_partitioned_convexhull_points` and `find_outmost_projected_convexhull_points`.
`stop_varexpl`	If set to TRUE, the algorithm stops when varexpl is greater than var_crit
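The step-size and termination rules described by `muAup`, `muAdown`, `SSE_A_conv`, `conv_crit`, `var_crit`, `maxiter` and `stop_varexpl` can be sketched in base R as below. The helper names `adapt_mu` and `should_stop` are hypothetical and are not part of the package; the sketch only restates the inequalities given in the argument descriptions, and the treatment of the exact boundary case varexpl == var_crit is an assumption.

```r
# Hypothetical helper (not a package function): adapt a step size mu after
# one update, following the rule stated for muAup/muAdown (and muBup/muBdown).
adapt_mu <- function(mu, sse_new, sse_old, sse_conv = 1e-09,
                     mu_up = 1.2, mu_down = 0.5) {
  if (sse_new <= sse_old * (1 + sse_conv)) {
    mu * mu_up    # SSE did not get worse: enlarge the step
  } else {
    mu * mu_down  # SSE got worse: shrink the step
  }
}

# Hypothetical helper (not a package function): decide whether the main
# iteration should terminate.
should_stop <- function(sse_new, sse_old, varexpl, iter,
                        conv_crit = 1e-06, var_crit = 0.9999,
                        maxiter = 2000, stop_varexpl = TRUE) {
  # |dSSE|/SSE < conv_crit, written without a division
  small_change <- abs(sse_new - sse_old) < conv_crit * abs(sse_new)
  # Assumption: equality with var_crit also counts as reaching the criterion
  enough_var <- stop_varexpl && varexpl >= var_crit
  small_change || enough_var || iter >= maxiter
}

adapt_mu(1, sse_new = 9.5, sse_old = 10)    # 1.2
adapt_mu(1, sse_new = 10.5, sse_old = 10)   # 0.5
should_stop(sse_new = 10, sse_old = 10.5, varexpl = 0.5, iter = 5)  # FALSE
```

Writing the relative-change test as a product rather than a quotient avoids a division by zero when the SSE reaches exactly 0.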

Value

A list with members:

BY, the kappas x d matrix of archetypes found
A, the n x kappas matrix such that Y ~ ABY, i.e. the Frobenius norm ||Y-ABY|| is minimized
B, the kappas x n matrix such that Y ~ ABY, i.e. the Frobenius norm ||Y-ABY|| is minimized
SSE, the sum of squared errors SSE = ||Y-ABY||^2
varexpl, the Variance Explained = (SST-SSE)/SST where SST is the total sum of squares of the data matrix
initialsolution, the set of rows of the data frame initially used to start the algorithm
freqstable, the frequency table for all found rows, if it is available.
iterations, the number of main iterations performed by the algorithm
time, the time in seconds spent on the entire run
converges, TRUE if convergence was achieved before the maximum number of iterations was reached
nAup, the total number of times that SSE<=SSE_old(1+SSE_A_conv) held in Aupdate processes. Useful for debugging purposes.
nAdown, the total number of times that SSE>SSE_old(1+SSE_A_conv) held in Aupdate processes. Useful for debugging purposes.
nBup, the total number of times that SSE<=SSE_old(1+SSE_B_conv) held in Bupdate processes. Useful for debugging purposes.
nBdown, the total number of times that SSE>SSE_old(1+SSE_B_conv) held in Bupdate processes. Useful for debugging purposes.
run_results, a list of iteration related details: SSE, varexpl, time, B, BY for all iterations done.
Y, the n x d matrix of initial data used
data.tables, the initial data frame if column dimension is at most 3 or a list of frequencies for each variable
call, the exact call used
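To make the relationship between BY, A, B, SSE and varexpl concrete, here is a minimal base R sketch on a hand-built example. The helper `fit_quality` is hypothetical and not part of the package, and SST is taken here as the uncentered sum of squared entries of Y, which is an assumption about how "total sum of squares" is defined.

```r
# Hypothetical helper (not a package function): compute the archetypes BY,
# the SSE and the variance explained for given Y, A and B.
fit_quality <- function(Y, A, B) {
  BY  <- B %*% Y                  # kappas x d matrix of archetypes
  SSE <- sum((Y - A %*% BY)^2)    # squared Frobenius norm of the residual
  SST <- sum(Y^2)                 # assumption: uncentered total sum of squares
  list(BY = BY, SSE = SSE, varexpl = (SST - SSE) / SST)
}

# Four points in 2D: three corners and the midpoint of the first two.
Y <- rbind(c(1, 2), c(3, 5), c(7, 3), c(2, 3.5))

# B (kappas x n) picks the three corners as archetypes; each row sums to 1.
B <- cbind(diag(3), 0)

# A (n x kappas) expresses every point as a convex combination of archetypes.
A_exact <- rbind(diag(3), c(0.5, 0.5, 0))
exact <- fit_quality(Y, A_exact, B)
exact$SSE       # 0, the reconstruction is exact
exact$varexpl   # 1

# A cruder A that assigns the fourth point entirely to the first archetype.
A_crude <- rbind(diag(3), c(1, 0, 0))
crude <- fit_quality(Y, A_crude, B)
crude$SSE       # 3.25 = (2 - 1)^2 + (3.5 - 2)^2
```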

References

[1] M Morup and LK Hansen, "Archetypal analysis for machine learning and data mining", Neurocomputing (Elsevier, 2012). https://doi.org/10.1016/j.neucom.2011.06.033.

[2] Source: https://mortenmorup.dk/?page_id=2 , last accessed 2024-03-09

Examples

{
	# Create a small 2D data set from 3 corner points:
	p1 <- c(1, 2); p2 <- c(3, 5); p3 <- c(7, 3)
	dp <- rbind(p1, p2, p3); dp
	set.seed(916070)
	# Each point is a random convex combination of the three corners:
	pts <- t(sapply(1:20, function(i, dp) {
	  cc <- runif(3)
	  cc <- cc / sum(cc)
	  colSums(dp * cc)
	}, dp))
	df <- data.frame(pts)
	colnames(df) <- c("x", "y")
	# Run AA:
	aa <- archetypal(df = df, kappas = 3, verbose = FALSE, save_history = TRUE)
	# Print the object of class "archetypal":
	print(aa)
	# Summarize the object of class "archetypal":
	summary(aa)
	# Plot the object of class "archetypal":
	plot(aa)
	# See history of iterations:
	names(aa$run_results)

}
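A possible follow-up, sketched under the assumption that the package is installed, reads individual members of the returned list by the names given under Value. The `rseed` value is arbitrary and chosen only for illustration; no particular output is claimed.

```r
# Runs only if the archetypal package is available.
if (requireNamespace("archetypal", quietly = TRUE)) {
  library(archetypal)
  set.seed(916070)
  corners <- rbind(c(1, 2), c(3, 5), c(7, 3))
  # 20 random convex combinations of the three corners
  pts <- t(sapply(1:20, function(i) {
    w <- runif(3)
    w <- w / sum(w)
    colSums(corners * w)
  }))
  df <- data.frame(x = pts[, 1], y = pts[, 2])
  # rseed = 9102 is an arbitrary illustrative value
  aa <- archetypal(df = df, kappas = 3, verbose = FALSE, rseed = 9102)
  print(aa$BY)          # kappas x d matrix of archetypes
  print(aa$SSE)         # sum of squared errors
  print(aa$varexpl)     # variance explained
  print(aa$iterations)  # number of main iterations
  print(aa$converges)   # TRUE if converged before maxiter
}
```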

[Package archetypal version 1.3.1 Index]