study_AAconvergence {archetypal} | R Documentation |
Function which studies the convergence of Archetypal Analysis when using the PCHA algorithm
Description
First it finds an AA solution under given arguments while storing
all iteration history (save_history = TRUE
).
Then it computes the LOWESS [1] of SSE and its relevant UIK point [2].
Study is performed for iterations after that point.
The list of B-matrices and archetypes that were found are stored.
The archetypes are being aligned, while the B-matrices
are used for computing the used rows-weights,
leading rows-weights and maybe percentage of used rows on Convex Hull.
The Aitken SSE extrapolation plus the relevant error are computed.
The order and rate of convergence are estimated.
Finally a multi-plot panel is being created if asked.
Usage
study_AAconvergence(df, kappas, method = "projected_convexhull",
rseed = NULL, chvertices = NULL, plot = FALSE, ...)
Arguments
df |
The data frame with dimensions n x d |
kappas |
The number of archetypes |
method |
The method that will be used for computing initial approximation:
|
rseed |
The random seed that will be used for setting initial A matrix. Useful for reproducible results. |
chvertices |
The vector of rows which represents the vertices for Convex Hull (if available) |
plot |
If it is TRUE, then a panel of useful plots is created |
... |
Other arguments to be passed to function |
Details
If we take natural logarithms at the next approximate equation
\epsilon_{n+1} = c\epsilon_{n}^p
for n = 1, 2, 3, \ldots
, then we'll find
\log(\epsilon_{n+1}) = \log(c)+p\log(\epsilon_{n})
Thus a reasonable strategy for estimating order p and rate c is to perform a linear regression
on above errors, after a selected iteration.
That is the output of order_estimation
and rate_estimation
.
Value
A list with members:
SSE, a vector of all SSE from all AA iterations
SSE_lowess, a vector of LOWESS values for SSE
UIK_lowess, the UIK point [2] of SSE_lowess
aitken, a data frame of Aitken [3] extrapolation and error for SSE after UIK_lowess iteration
order_estimation, the last term in estimating order of convergence, page 56 of [4], by using SSE after UIK_lowess iteration
rate_estimation, the last term in estimating rate of convergence, page 56 of [4], by using SSE after UIK_lowess iteration
significance_estimations, a data frame with standard errors and statistical significance for estimations
used_on_convexhull, the % of used rows which lie on Convex Hull (if given), as a sequence for iterations after UIK_lowess one
aligned_archetypes, the archetypes after UIK_lowess iteration are being aligned by using
align_archetypes_from_list
. The history of archetypes creation.solution_used, the AA output that has been used. Some times useful, especially for big data.
References
[1] Cleveland, W. S. (1979) Robust locally weighted regression and smoothing scatterplots. J. Amer. Statist. Assoc. 74, 829–836.
[2] Christopoulos, Demetris T., Introducing Unit Invariant Knee (UIK) As an Objective Choice for
Elbow Point in Multivariate Data Analysis Techniques (March 1, 2016).
Available at SSRN: http://dx.doi.org/10.2139/ssrn.3043076
[3] Aitken, A. "On Bernoulli's numerical solution of algebraic equations", Proceedings of the Royal Society of Edinburgh (1926) 46 pp. 289-305.
[4] Atkinson, K. E.,An Introduction to Numerical Analysis, Wiley & Sons,1989
See Also
Examples
{
# Load data "wd2"
data(wd2)
ch = chull(wd2)
sa = study_AAconvergence(df = wd2, kappas = 3, rseed = 20191119,
verbose = FALSE, chvertices = ch)
names(sa)
# [1] "SSE" "SSE_lowess" "UIK_lowess"
# [4] "aitken" "order_estimation" "rate_estimation"
# [7] "significance_estimations" "used_on_convexhull" "aligned_archetypes"
# [10] "solution_used"
# sse=sa$SSE
# ssel=sa$SSE_lowess
sa$UIK_lowess
# [1] 36
# sa$aitken
sa$order_estimation
# [1] 1.007674
sa$rate_estimation
# [1] 0.8277613
sa$significance_estimations
# estimation std.error t.value p.value
# log(c) -0.1890305 0.014658947 -12.89523 5.189172e-12
# p 1.0076743 0.001616482 623.37475 3.951042e-50
# sa$used_on_convexhull
# sa$aligned_archetypes
data.frame(sa$solution_used[c("SSE","varexpl","iterations","time")])
# SSE varexpl iterations time
# 1 1.717538 0.9993186 62 8.39
# Plot class "study_AAconvergence"
plot(sa)
}