StarCoordinates {RadialVisGadgets} | R Documentation |
Star Coordinates Gadget
Description
Creates an R Shiny gadget for Star Coordinates.
Usage
StarCoordinates(
df,
color = NULL,
approach = "Standard",
numericRepresentation = TRUE,
meanCentered = TRUE,
projMatrix = NULL,
clusterFunc = NULL
)
Arguments
df |
A data frame with the data to explore. It should contain only numeric or factor columns. |
color |
Column from which the data labels are extracted. |
approach |
Either the Standard approach as defined by Kandogan, or Orthographic Star Coordinates (OSC) with a recondition as defined by Lehmann and Theisel. |
numericRepresentation |
If TRUE, attempt to convert all factors to a numeric representation; otherwise use the mixed representation as defined in Hinted Star Coordinates. |
meanCentered |
Center the projection at the mean of the values. This may allow for easier value estimation. |
projMatrix |
A pre-defined projection matrix used as the initial configuration. It should be defined in the same fashion as the output. |
clusterFunc |
Function used to define hints. An increase in the value of the function is assumed to indicate an increase in the quality of the projection. The function is called with two parameters (points, labels). |
Details
The goal of Star Coordinates (SC) is to generate a configuration that reveals the underlying nature of the data for cluster analysis,
outlier detection, and exploratory data analysis, e.g., by investigating the effect of specific dimensions on the separation of the data.
Traditional SC are defined for multidimensional numerical data sets X = \{\mathbf{p}_1, \ldots, \mathbf{p}_N\}
of N data points \mathbf{p}_i \in \mathbf{R}^{d}
of dimensionality d. Let A = \{\mathbf{a}_{1}, \dots, \mathbf{a}_{d}\}
be a set of (typically 2D) vectors, each corresponding to one of the d dimensions.
The projection \mathbf{p}_i' \in \mathbf{R}^{2}
of a multidimensional point \mathbf{p}_i = (p_{i1},\ldots,p_{id}) \in \mathbf{R}^{d}
in SC is then defined as:
\mathbf{p}_i' = \sum_{j=1}^{d} \mathbf{a}_{j} g_j(\mathbf{p}_i),
with
g_j(\mathbf{p}_i) = \frac{p_{ij} - min_j}{max_j - min_j},
and (min_j, max_j)
denoting the value range of dimension j.
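The projection defined above can be sketched in a few lines of base R. This is only an illustration of the formula, not the package's internal code: the helper name sc_project and the choice of axis vectors evenly spaced on the unit circle are assumptions made for the example, and constant columns (where max_j equals min_j) are not handled.

```r
# Hypothetical helper: standard Star Coordinates projection of numeric data.
# data: numeric matrix or data frame (N rows, d columns)
# axes: d x 2 matrix whose j-th row is the axis vector a_j
sc_project <- function(data, axes) {
  data <- as.matrix(data)
  stopifnot(is.numeric(data), ncol(data) == nrow(axes))
  mins <- apply(data, 2, min)
  maxs <- apply(data, 2, max)
  # g_j(p_i): min-max scaling of every dimension to [0, 1]
  g <- sweep(sweep(data, 2, mins, "-"), 2, maxs - mins, "/")
  # p_i' = sum_j a_j * g_j(p_i), written as a matrix product
  g %*% axes
}

# Assumed initial configuration: d axis vectors evenly spaced on the unit circle
d <- 4
angles <- 2 * pi * (seq_len(d) - 1) / d
axes <- cbind(cos(angles), sin(angles))

projected <- sc_project(iris[, 1:4], axes)
head(projected)
```

Each row of projected holds the 2D coordinates of one sample; dragging an axis in the gadget corresponds to changing one row of axes.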
For categorical dimensions, when numericRepresentation = TRUE the values are mapped to a numeric type, i.e., with as.numeric().
However, equally spaced categorical points may not reflect the true nature of the data. Instead, a frequency-based
representation may be applied for individual data points.
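As a small illustration of why the numeric mapping yields equally spaced points, the following sketch converts a factor with as.numeric() and applies the min-max scaling from above. The example factor and its level order are made up for the illustration; the package's exact preprocessing may differ.

```r
# A made-up ordinal factor; the level order determines the integer codes
f <- factor(c("low", "high", "medium", "high"),
            levels = c("low", "medium", "high"))

codes <- as.numeric(f)  # 1 3 2 3
# Min-max scaling places the categories at equal distances along the axis,
# regardless of how often each category occurs
scaled <- (codes - min(codes)) / (max(codes) - min(codes))
scaled                  # 0.0 1.0 0.5 1.0
```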
Assuming a categorical dimension j, we calculate the frequency f_{jk}
of each category k of dimension j.
The respective axis vector \mathbf{a}_{j}
is then divided into corresponding blocks, whose sizes represent the relative frequency (or probability)
\frac{f_{jk}}{\sum_{l=1}^m f_{jl}}
of each of the m categories of dimension j.
In summary, given an order for each categorical dimension, the equation for g_j
above can be extended to SC for mixed data by:
g_j(\mathbf{p}_i) = F_j(p_{ij}) - \frac{P_j(p_{ij})}{2},
if dimension j is categorical/ordinal, and
g_j(\mathbf{p}_i) = \frac{p_{ij} - min_j}{max_j - min_j},
if dimension j is numerical,
where F_j
is the cumulative distribution function for the (categorical/ordinal) dimension j and P_j
its probability function.
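The categorical case can be sketched as follows: each category occupies a block of the axis proportional to its relative frequency, and a sample is placed at the midpoint of its category's block. The helper name g_categorical is hypothetical, and the category order is assumed to be the factor's level order; this is not the package's own implementation.

```r
# Hypothetical helper: frequency-based position of each sample on a
# categorical axis, g_j = F_j(x) - P_j(x) / 2
g_categorical <- function(x) {
  x <- as.factor(x)
  counts <- tabulate(as.integer(x), nbins = nlevels(x))
  p <- counts / sum(counts)  # P_j: relative frequency of each category
  cum <- cumsum(p)           # F_j: cumulative probability in level order
  mid <- cum - p / 2         # midpoint of each category's block
  mid[as.integer(x)]
}

# "a" covers half of the axis, "b" and "c" a quarter each
g_categorical(c("a", "a", "b", "c"))
# 0.250 0.250 0.625 0.875
```

Frequent categories therefore receive wider blocks, so the spacing between category positions reflects the distribution of the data rather than being uniform.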
Value
A list with the projection matrix, the coordinates of the projected samples, and a logical vector indicating the selected samples.
References
Kandogan, E. (2001, August). Visualizing multi-dimensional clusters, trends, and outliers using star coordinates. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 107-116).
Lehmann, D. J., & Theisel, H. (2013). Orthographic star coordinates. IEEE Transactions on Visualization and Computer Graphics, 19(12), 2615-2624.
Rubio-Sánchez, M., & Sanchez, A. (2014). Axis calibration for improving data attribute estimation in star coordinates plots. IEEE Transactions on Visualization and Computer Graphics, 20(12), 2013-202
Matute, J., & Linsen, L. (2020, February). Hinted Star Coordinates for Mixed Data. In Computer Graphics Forum (Vol. 39, No. 1, pp. 117-133).
Examples
if (interactive()) {
library(RadialVisGadgets)
library(datasets)
data(iris)
StarCoordinates(iris, "Species")
}
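The clusterFunc argument is documented as a function called with (points, labels) whose value increases with projection quality. A minimal sketch of such a function is given below. The name separation_score is hypothetical, and the code assumes that points can be coerced to a numeric matrix with one row per sample and that labels has one entry per sample with at least two distinct values; the exact objects passed by the gadget are not specified here, so treat this as an illustration only.

```r
# Hypothetical quality measure: distance between group centroids relative
# to the spread of the samples around their own centroid (higher is better).
# Assumes at least two groups and no empty factor levels.
separation_score <- function(points, labels) {
  points <- as.matrix(points)
  labels <- as.factor(labels)
  # One row per group, one column per coordinate
  centroids <- apply(points, 2, function(col) tapply(col, labels, mean))
  own <- centroids[as.integer(labels), , drop = FALSE]
  # Mean distance of the samples to their own group centroid
  within <- mean(sqrt(rowSums((points - own)^2)))
  # Mean pairwise distance between group centroids
  between <- mean(dist(centroids))
  between / within
}

if (interactive()) {
  library(RadialVisGadgets)
  StarCoordinates(iris, "Species", clusterFunc = separation_score)
}
```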