BLB_archetypal {GeomArchetypal}          R Documentation

Archetypal Analysis using the Bag of Little Bootstraps

Description

Performs archetypal analysis using the bag of little bootstraps (BLB) as the resampling approach, following [1].

Usage

BLB_archetypal(df = NULL, group_var = NULL, 
				aa_var = NULL, use_seed = NULL, 
				b = 0.6, n = 20, r = 100, n_core = 1, 
				n_iter = 30, ci_sigma = 2, 
				n_tails = 10, max_cor = 0.3, 
				verbose = TRUE, diag_less = 1e-2)

Arguments

df

The data frame with the original sample to be processed

group_var

Draw the subsample equally from groups (integer or character)

aa_var

Character vector of the variable names that will be used

use_seed

Integer, if not NULL, used as set.seed() for reproducibility

b

Numeric, exponent that sets the size of each subsample, i.e. nrow(df)^b (default 0.6)

n

Integer, number of subsamples to generate (default 20)

r

Integer, number of bootstraps of each subsample (default 100)

n_core

Integer, number of cores used for archetypal analysis of bootstraps

n_iter

Integer, number of iterations for fast_archetypal

ci_sigma

Integer, the sigma level used for the empirical confidence intervals (default 2)

n_tails

Integer, minimum number of bootstrap estimates required in tails for robust interval estimates (default 10 each tail)

max_cor

Numeric, threshold for the warning on orthogonality (default 0.3, as given in Usage)

verbose

Logical, reports progress of each subsample and batch of bootstraps

diag_less

The expected mean distance from 1 for the diagonal elements of submatrix A[1:kappas,:]
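
To give a sense of scale for b, n, r and n_tails, the sketch below works through the arithmetic for a sample of 35,000 rows with the default settings. It is illustrative only and is not the package's internal code: rounding the subsample size with floor() is an assumption, and so is the idea that each tail of the interval holds a fraction pnorm(-ci_sigma) of the bootstrap estimates.

```r
# Illustrative arithmetic only; not the package's internal code.
N <- 35000        # rows in the original sample
b <- 0.6          # subsample exponent
n <- 20           # number of subsamples
r <- 100          # bootstraps per subsample
ci_sigma <- 2     # sigma level for the confidence intervals
n_tails <- 10     # minimum estimates wanted in each tail

# Subsample size implied by b (rounding with floor() is an assumption)
subsample_size <- floor(N^b)

# Total number of bootstrap estimates across all subsamples
total_boot <- n * r

# Assumed: each tail holds a fraction pnorm(-ci_sigma) of the estimates
tail_prob <- pnorm(-ci_sigma)
expected_in_tail <- total_boot * tail_prob

subsample_size
total_boot
expected_in_tail >= n_tails
```

Under the same assumptions, n = 1 and r = 2 would leave far fewer than n_tails estimates in each tail.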

Details

Note 1.
This implementation omits the weighted analysis idea of Kleiner et al. [1], which is inappropriate for geometrically-based archetypal analysis.
Note 2.
The archetypes are defined from the minimums and maximums of the data, which provides a fixed frame of reference for resampling. Resampling variation is thus simplified and concerns only the compositions.
Note 3.
The function assumes grouped data. For ungrouped data, the user can supply a group variable that has only one value.
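
As a rough illustration of Note 2, the sketch below builds a fixed set of candidate archetypes from the per-variable minimums and maximums of a toy data frame. It assumes the grid is formed from all combinations of each variable's minimum and maximum, which this page does not state explicitly. The data and the column names x, y and z are hypothetical.

```r
# Illustrative sketch; toy data and column names are hypothetical.
set.seed(1)
toy <- data.frame(x = runif(50), y = runif(50), z = runif(50))

# Minimum and maximum of each variable, as a named list
ranges <- lapply(toy, range)

# All combinations of per-variable min/max: 2^d candidate archetypes
grid <- expand.grid(ranges)

nrow(grid)  # 2^3 = 8 rows for three variables
grid
```

Because such a grid depends only on the extremes of the full data, it stays the same across subsamples and bootstraps; only the compositions of the observations vary.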

Value

An object of class "BLB_archetypal", which is a list with the following members:

  1. arches, the Grid Archetypes

  2. aa_tests, the run statistics for all subsamples, batches and replications (bootstraps)

  3. pop_compos, the population estimates of compositions (by group or without grouping)

  4. lower_ci, the lower bounds of the confidence intervals at the ci_sigma sigma level

  5. upper_ci, the upper bounds of the confidence intervals at the ci_sigma sigma level

  6. ci_sigma, the ci_sigma level for confidence intervals
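
One way to read "confidence interval at the ci_sigma sigma level" is as a percentile interval whose tail probabilities match plus or minus ci_sigma standard deviations of a normal distribution. The sketch below illustrates that reading on simulated numbers. It is an assumption about the method, not a description of the package's code, and boot_est is a hypothetical vector of bootstrap estimates for a single composition.

```r
# Illustrative sketch with simulated numbers; not package output.
set.seed(42)
boot_est <- rnorm(2000, mean = 0.3, sd = 0.02)  # hypothetical bootstrap estimates
ci_sigma <- 2

# Tail probabilities matching +/- ci_sigma under a normal reference
probs <- pnorm(c(-ci_sigma, ci_sigma))

# Empirical (percentile) interval at that level
ci <- quantile(boot_est, probs = probs, names = FALSE)
lower <- ci[1]
upper <- ci[2]

c(lower = lower, upper = upper)
```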

Author(s)

David F. Midgley

References

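[1] Ariel Kleiner, Ameet Talwalkar, Purnamrita Sarkar, Michael I. Jordan (2014). A scalable bootstrap for massive data. Journal of the Royal Statistical Society: Series B, 76(4), 795-816. doi:10.1111/rssb.12050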

See Also

closer_grid_archetypal, grid_archetypal, fast_archetypal

Examples

{
# Load package
library(GeomArchetypal)
# Load data
data("gallupGPS6")
# draw a small sample
set.seed(2024)
df <- gallupGPS6[sample(1:nrow(gallupGPS6),35000,replace = FALSE),]
# invent a grouping variable
df$grp <- cut(df$risktaking, breaks = 2)
test <- BLB_archetypal(df = df, 
                        group_var = "grp",
                        aa_var = c("patience","risktaking","trust"), 
                        n = 1, r = 2, n_core = 1,
                        diag_less = 1e-2)
# will generate a warning because number of bootstraps is too small to
# estimate default confidence intervals 
# Print results of the "BLB_archetypal" class object:
print(test)
# Summarize the "BLB_archetypal" class object:
summary(test)

}

[Package GeomArchetypal version 1.0.2 Index]