archetypal {archetypal} | R Documentation |
archetypal: Finds the archetypal analysis of a data frame by using a variant of the PCHA algorithm
Description
Performs archetypal analysis using Principal Convex Hull Analysis (PCHA), with full control over all algorithmic parameters.
Usage
archetypal(df, kappas, initialrows = NULL,
method = "projected_convexhull", nprojected = 2, npartition = 10,
nfurthest = 10, maxiter = 2000, conv_crit = 1e-06,
var_crit = 0.9999, verbose = TRUE, rseed = NULL, aupdate1 = 25,
aupdate2 = 10, bupdate = 10, muAup = 1.2, muAdown = 0.5,
muBup = 1.2, muBdown = 0.5, SSE_A_conv = 1e-09,
SSE_B_conv = 1e-09, save_history = FALSE, nworkers = NULL,
stop_varexpl = TRUE)
Arguments
df |
The data frame with dimensions n x d |
kappas |
The number of archetypes |
initialrows |
The initial set of rows of the data frame that will be used to start the algorithm |
method |
The method that will be used for computing the initial approximation; the default is "projected_convexhull" |
nprojected |
The dimension of the projected subspace for the "projected_convexhull" method |
npartition |
The number of partitions for |
nfurthest |
The number of times that |
maxiter |
The maximum number of iterations of the main algorithm |
conv_crit |
The SSE convergence criterion of termination: iterate until |dSSE|/SSE<conv_crit |
var_crit |
The Variance Explained (VarExpl) convergence criterion of termination: iterate while VarExpl<var_crit (see stop_varexpl) |
verbose |
If set to TRUE, both initialization and iteration details are printed |
rseed |
The random seed that will be used for setting the initial A matrix. Useful for reproducible results. |
aupdate1 |
The number of initial applications of Aupdate for improving the initially randomly selected A matrix |
aupdate2 |
The number of Aupdate applications in main iteration |
bupdate |
The number of Bupdate applications in main iteration |
muAup |
The factor (>1) by which muA is multiplied when SSE<=SSE_old(1+SSE_A_conv) |
muAdown |
The factor (<1) by which muA is multiplied when SSE>SSE_old(1+SSE_A_conv) |
muBup |
The factor (>1) by which muB is multiplied when SSE<=SSE_old(1+SSE_B_conv) |
muBdown |
The factor (<1) by which muB is multiplied when SSE>SSE_old(1+SSE_B_conv) |
SSE_A_conv |
The convergence value used in SSE<=SSE_old(1+SSE_A_conv). Warning: Matlab sometimes crashes when this is set to 1E-16 or lower |
SSE_B_conv |
The convergence value used in SSE<=SSE_old(1+SSE_B_conv). Warning: Matlab sometimes crashes when this is set to 1E-16 or lower |
save_history |
If set to TRUE, the iteration history is saved for further use |
nworkers |
The number of logical processors that will be used for
parallel computing (usually twice the number of available physical cores).
Parallel computation is applied when asked by functions |
stop_varexpl |
If set to TRUE, the algorithm stops as soon as varexpl is greater than var_crit |
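The step-size and stopping rules described in the arguments above can be sketched in plain R. This is only an illustration of the documented inequalities, not the package's internal code: the helper names `update_mu` and `should_stop` are hypothetical, and combining the two stopping conditions with a logical OR is an assumption.

```r
# Hypothetical helpers, NOT part of the archetypal package.

# Step-size rule for muA (or muB): multiply by the "up" factor when
# SSE <= SSE_old * (1 + sse_conv), otherwise by the "down" factor.
update_mu <- function(mu, sse, sse_old,
                      mu_up = 1.2, mu_down = 0.5, sse_conv = 1e-09) {
  if (sse <= sse_old * (1 + sse_conv)) {
    mu * mu_up
  } else {
    mu * mu_down
  }
}

# Stopping rule: relative SSE change below conv_crit, or (optionally)
# variance explained above var_crit. The OR combination is an assumption.
should_stop <- function(sse, sse_old, varexpl,
                        conv_crit = 1e-06, var_crit = 0.9999,
                        stop_varexpl = TRUE) {
  rel_change <- abs(sse - sse_old) / sse
  (rel_change < conv_crit) || (stop_varexpl && varexpl > var_crit)
}

update_mu(mu = 1, sse = 9.5, sse_old = 10)    # SSE went down: mu grows to 1.2
update_mu(mu = 1, sse = 10.5, sse_old = 10)   # SSE went up: mu shrinks to 0.5
should_stop(sse = 1.0, sse_old = 1.5, varexpl = 0.95)     # FALSE
should_stop(sse = 1.0, sse_old = 1.5, varexpl = 0.99995)  # TRUE
```

With `stop_varexpl = FALSE` only the relative SSE change (together with `maxiter`) would end the iterations in this sketch.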
Value
A list with members:
- BY, the kappas x d matrix of archetypes found
- A, the n x kappas matrix such that Y ~ ABY, i.e. the Frobenius norm ||Y-ABY|| is minimum
- B, the kappas x n matrix such that Y ~ ABY, i.e. the Frobenius norm ||Y-ABY|| is minimum
- SSE, the sum of squared errors SSE = ||Y-ABY||^2
- varexpl, the Variance Explained = (SST-SSE)/SST, where SST is the total sum of squares of the data set matrix
- initialsolution, the initial set of rows from the data frame used to start the algorithm
- freqstable, the frequency table for all found rows, if it is available
- iterations, the number of main iterations done by the algorithm
- time, the time in seconds spent on the entire run
- converges, TRUE if convergence was achieved before the maximum allowed number of iterations was reached
- nAup, the total number of times that SSE<=SSE_old(1+SSE_A_conv) held in Aupdate processes. Useful for debugging purposes.
- nAdown, the total number of times that SSE>SSE_old(1+SSE_A_conv) held in Aupdate processes. Useful for debugging purposes.
- nBup, the total number of times that SSE<=SSE_old(1+SSE_B_conv) held in Bupdate processes. Useful for debugging purposes.
- nBdown, the total number of times that SSE>SSE_old(1+SSE_B_conv) held in Bupdate processes. Useful for debugging purposes.
- run_results, a list of iteration-related details: SSE, varexpl, time, B, BY for all iterations done
- Y, the n x d matrix of initial data used
- data.tables, the initial data frame if its column dimension is at most 3, or a list of frequencies for each variable
- call, the exact call used
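To make the shapes of A, B and BY and the definitions of SSE and varexpl concrete, here is a small base R sketch on a hand-built data set. It does not call the package. The matrices are constructed by hand, and taking SST as the uncentered sum of squares `sum(Y^2)` is an assumption about how "total sum of squares" is meant here.

```r
# Toy data: three corner points plus their centroid (n = 4, d = 2).
Y <- rbind(c(1, 2), c(3, 5), c(7, 3))
Y <- rbind(Y, colMeans(Y))

# B (kappas x n) picks the first three rows of Y as archetypes.
B <- cbind(diag(3), 0)

# A (n x kappas) writes every row of Y as a convex combination of archetypes:
# the corners map to themselves, the centroid uses weights 1/3 each.
A <- rbind(diag(3), rep(1 / 3, 3))

BY <- B %*% Y                      # kappas x d matrix of archetypes
SSE <- sum((Y - A %*% BY)^2)       # squared Frobenius norm ||Y - ABY||^2
SST <- sum(Y^2)                    # assumption: uncentered total sum of squares
varexpl <- (SST - SSE) / SST

dim(BY)      # 3 2
SSE          # zero up to floating-point rounding
varexpl      # one up to floating-point rounding
```

Because every data point lies inside the triangle spanned by the three archetypes, the reconstruction is exact here; real data usually give SSE > 0 and varexpl < 1.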
References
[1] M. Mørup and L. K. Hansen, "Archetypal analysis for machine learning and data mining", Neurocomputing (Elsevier, 2012). https://doi.org/10.1016/j.neucom.2011.06.033
[2] Source: https://mortenmorup.dk/?page_id=2, last accessed 2024-03-09
Examples
{
# Create a small 2D data set from 3 corner-points:
p1 = c(1,2);p2 = c(3,5);p3 = c(7,3)
dp = rbind(p1,p2,p3);dp
set.seed(916070)
pts = t(sapply(1:20, function(i,dp){
cc = runif(3)
cc = cc/sum(cc)
colSums(dp*cc)
},dp))
df = data.frame(pts)
colnames(df) = c("x","y")
# Run AA:
aa = archetypal(df = df, kappas = 3, verbose = FALSE, save_history = TRUE)
# Print class "archetypal":
print(aa)
# Summary class "archetypal":
summary(aa)
# Plot class "archetypal":
plot(aa)
# See history of iterations:
names(aa$run_results)
}