numero.summary {Numero} | R Documentation |
Summarize subgroup statistics
Description
Estimates subgroup statistics after self-organizing map analysis.
Usage
numero.summary(results, topology, data = NULL, capacity = 10)
Arguments
results | A list object that contains the self-organizing map and its statistical colorings.
topology | A SOM topology with additional labels that indicate selected regions.
data | A matrix or a data frame.
capacity | Maximum number of subgroups to compare.
Details
The input results must contain the output from numero.evaluate() or similar.
The input argument topology must be a definition of a SOM with additional columns as in the output from numero.subgroup().
The function first looks for row names in data that are also included in results. The rows are then divided into subgroups according to the district assignments in results and the region labels in topology.
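The matching and grouping steps described above can be illustrated with a minimal base R sketch. It does not reproduce the internals of numero.summary(); the objects districts and topology below are toy stand-ins, and their names and structure are assumptions made only for illustration.

```r
# Toy stand-ins; names and structure are assumptions, not Numero internals.
data <- data.frame(CHOL = c(5.1, 6.3, 4.8, 5.9),
                   row.names = c("id1", "id2", "id3", "id4"))

# Hypothetical district assignment (district index) for each named row.
# Note that "id4" is missing here and "id5" is missing from data.
districts <- c(id1 = 1, id2 = 3, id3 = 2, id5 = 3)

# Hypothetical region label for each district, similar in spirit to a
# topology with a REGION column.
topology <- data.frame(REGION = c("LowAlb", "MiddleAlb", "HighAlb"),
                       stringsAsFactors = FALSE)

# Step 1: keep only the row names that are present in both inputs.
shared <- intersect(rownames(data), names(districts))

# Step 2: map each shared row to its district and then to a region label.
region <- topology$REGION[districts[shared]]
names(region) <- shared

# Step 3: divide the shared rows into subgroups by region label.
subgroups <- split(shared, region)
print(subgroups)
```

Rows that appear in only one of the two inputs ("id4" and "id5" in this toy case) are left out of every subgroup.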
Value
A data frame of summary statistics; see nroSummary() for details. The data frame also contains additional information on which variables were used for the training of the SOM.
The attribute 'layout' is added to the output. It indicates the location on the map and the subgroup name and label for each data row that was included in the analysis.
Author(s)
Ville-Petteri Makinen
Examples
# Import data.
fname <- system.file("extdata", "finndiane.txt", package = "Numero")
dataset <- read.delim(file = fname)
# Set identities and manage missing data.
dataset <- numero.clean(dataset, identity = "INDEX")
# Prepare training variables.
trvars <- c("CHOL", "HDL2C", "TG", "CREAT", "uALB")
trdata <- numero.prepare(data = dataset, variables = trvars)
# Create a self-organizing map.
sm <- numero.create(data = trdata)
qc <- numero.quality(model = sm)
# Evaluate map statistics for all variables.
stats <- numero.evaluate(model = qc, data = dataset)
# Define subgroups.
x <- stats$planes[,"uALB"]
tops <- which(x >= quantile(x, 0.75, na.rm=TRUE))
bottoms <- which(x <= quantile(x, 0.25, na.rm=TRUE))
elem <- data.frame(stats$map$topology, stringsAsFactors = FALSE)
elem$REGION <- "MiddleAlb"
elem$REGION[tops] <- "HighAlb"
elem$REGION[bottoms] <- "LowAlb"
elem$REGION.label <- "M"
elem$REGION.label[tops] <- "H"
elem$REGION.label[bottoms] <- "L"
# Compare subgroups.
cmp <- numero.summary(results = stats, topology = elem, data = dataset)