R: Multidimensional scaling of discrete probability...

mdsdd {dad}

R Documentation

Multidimensional scaling of discrete probability distributions

Description

Applies the multidimensional scaling (MDS) method to discrete probability distributions in order to describe T groups of individuals on which q categorical variables are observed. It applies cmdscale to the distance matrix between the T distributions and returns an object of class mdsdd.

Usage

mdsdd(xf, group.name = "group", distance = c("l1", "l2", "chisqsym", "hellinger",
    "jeffreys", "jensen", "lp"), nb.factors = 3, nb.values = 10, association = c("cramer",
    "tschuprow", "pearson", "phi"), sub.title = "", plot.eigen = TRUE,
    plot.score = FALSE, nscore = 1:3, filename = NULL, add = TRUE, p)

Arguments

`xf`	object of class `folder`, list of arrays (or tables), or data frame. If it is a folder, its elements are data frames with `q` columns (considered as factors); the `t^{th}` element (`t = 1, \ldots, T`) corresponds to the `t^{th}` group. If it is a data frame, the column whose name is given by the `group.name` argument is a factor giving the groups; all other columns are considered as factors. If it is a list of arrays (or tables), the `t^{th}` element (`t = 1, \ldots, T`) is the table of the joint frequency distribution of the `q` variables within the `t^{th}` group, expressed as relative or absolute frequencies. All arrays (or tables) `xf[[i]]` must have: (1) the same dimension(s): if `q = 1` (univariate), `dim(xf[[i]])` is a single integer; if `q > 1` (multivariate), `dim(xf[[i]])` is an integer vector of length `q`; (2) the same, non-`NULL` dimension names `dimnames(xf[[i]])`; these dimnames are the names of the variables. The elements of the arrays must be non-negative numbers (otherwise an error is raised).
`group.name`	string. Name of the grouping variable. Default: `group.name = "group"`.
`distance`	The distance or divergence used to compute the distance matrix between the discrete distributions (see Details). It can be: `"l1"` (default), the `L^p` distance with `p = 1`; `"l2"`, the `L^p` distance with `p = 2`; `"chisqsym"`, the symmetric Chi-squared distance; `"hellinger"`, the Hellinger metric (Matusita distance); `"jeffreys"`, the Jeffreys distance (symmetrised Kullback-Leibler divergence); `"jensen"`, the Jensen-Shannon distance; `"lp"`, the `L^p` distance with `p` given by the argument `p` of the function.
`nb.factors`	numeric. Number of returned principal coordinates (default `nb.factors = 3`). This number must be less than `T - 1`. Warning: The `plot.mdsdd` and `interpret.mdsdd` functions cannot take into account more than `nb.factors` principal factors.
`nb.values`	numeric. Number of returned eigenvalues (default `nb.values = 10`).
`association`	The association measure between two discrete distributions to be used (see Details). It can be: `"cramer"` (default), Cramer's V (see `cramer.folder`); `"tschuprow"`, Tschuprow's T (see `tschuprow.folder`); `"pearson"`, Pearson's contingency coefficient (see `pearson.folder`); `"phi"`, phi (see `phi.folder`).
`sub.title`	string. Subtitle for the graphs (default `""`, i.e. no subtitle).
`plot.eigen`	logical. If `TRUE` (default), the barplot of the eigenvalues is plotted.
`plot.score`	logical. If `TRUE`, the graphs of the new coordinates are plotted. A new graphic device is opened for each pair of coordinates defined by the `nscore` argument.
`nscore`	numeric vector. If `plot.score = TRUE`, the numbers of the principal coordinates which are plotted. By default, `nscore = 1:3`. Its components cannot be greater than `nb.factors`.
`filename`	string. Name of the file in which the results are saved. By default (`filename = NULL`) they are not saved.
`add`	logical indicating whether an additive constant should be computed and added to the non-diagonal dissimilarities so that the modified dissimilarities are Euclidean (default `TRUE`; see the `add` argument of `cmdscale`).
`p`	integer. Optional. When `distance = "lp"` (`L^p` distance with `p>2`), `p` is the parameter of the distance.
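
As an illustration of the list-of-arrays form of `xf` described above, the following base R sketch builds three frequency tables and checks the stated constraints (same dimensions, same non-`NULL` dimnames, non-negative entries). The variable names `colour` and `size`, the counts, and the helper `make_table` are hypothetical and chosen only for the example.

```r
# Levels of two hypothetical categorical variables (q = 2).
lev <- list(colour = c("red", "white"), size = c("small", "large"))

# Helper (hypothetical): a 2 x 2 table of absolute frequencies for one group.
make_table <- function(counts) array(counts, dim = c(2, 2), dimnames = lev)

# One table per group (T = 3); all share dimensions and dimnames.
xl <- list(
  g1 = make_table(c(10, 5, 3, 2)),
  g2 = make_table(c(4, 4, 6, 6)),
  g3 = make_table(c(1, 9, 8, 2))
)

# Checks mirroring the requirements on xf.
same_dim     <- all(sapply(xl, function(a) identical(dim(a), dim(xl[[1]]))))
same_names   <- all(sapply(xl, function(a) identical(dimnames(a), dimnames(xl[[1]]))))
non_negative <- all(sapply(xl, function(a) all(a >= 0)))

# Relative frequencies are also accepted: divide each table by its total.
xp <- lapply(xl, function(a) a / sum(a))

# With the dad package attached, a list like xl could then be passed as xf,
# e.g. mdsdd(xl) (not run here).
```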

Details

If a folder is given as argument, the T discrete probability distributions f_t corresponding to the T groups of individuals are estimated from observations. Then the distances/dissimilarities between the estimated distributions are computed, using the distance or divergence defined by the distance argument:

If the distance is "l1", "l2" or "lp", the distances are computed by the function matddlppar. Otherwise, they are computed by matddchisqsympar ("chisqsym"), matddhellingerpar ("hellinger"), matddjeffreyspar ("jeffreys") or matddjensenpar ("jensen").
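
The pipeline described above (pairwise distances between discrete distributions, then cmdscale) can be sketched in base R as follows. This is only an illustration under stated assumptions: the five distributions are made up, the helpers `lp_dist`, `hellinger_dist` and `pairwise` are hypothetical and are not the package's matdd* functions, and the package may use a different normalisation for some distances.

```r
# Five hypothetical discrete distributions over the same three categories.
dists <- list(
  g1 = c(0.5, 0.3, 0.2),
  g2 = c(0.2, 0.5, 0.3),
  g3 = c(0.1, 0.1, 0.8),
  g4 = c(0.4, 0.4, 0.2),
  g5 = c(0.3, 0.3, 0.4)
)

# L^p distance between two probability vectors (p = 1 gives "l1").
lp_dist <- function(f, g, p = 1) sum(abs(f - g)^p)^(1 / p)

# One common form of the Hellinger (Matusita) distance; assumed normalisation.
hellinger_dist <- function(f, g) sqrt(sum((sqrt(f) - sqrt(g))^2))

# T x T matrix of pairwise distances for a list of distributions.
pairwise <- function(dists, fun, ...) {
  n <- length(dists)
  D <- matrix(0, n, n, dimnames = list(names(dists), names(dists)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) D[i, j] <- fun(dists[[i]], dists[[j]], ...)
  }
  D
}

D   <- pairwise(dists, lp_dist, p = 1)
D_h <- pairwise(dists, hellinger_dist)

# Classical MDS on the L1 distances, with the additive constant (add = TRUE).
res <- cmdscale(as.dist(D), k = 2, eig = TRUE, add = TRUE)

scores  <- res$points  # principal coordinates of the five groups
inertia <- data.frame(eigenvalue = res$eig,
                      percent = 100 * res$eig / sum(res$eig))
```

Here `scores` and `inertia` play roles analogous to the `scores` and `inertia` components of the returned object, although their exact layout in the package may differ.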

The association measures are computed according to the value of the association argument. The computation uses the corresponding function of the package DescTools (see Assocs). Note that the association measure between a constant variable and any other variable is set to zero. The association measure between each variable and itself is not computed, and the diagonal of the returned association matrices is set to NA.
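
The conventions above (zero for a constant variable, NA on the diagonal) can be illustrated with a base R sketch of Cramer's V. The package itself relies on DescTools; the functions `cramers_v` and `assoc_matrix` and the small data frame below are hypothetical and only meant to show the idea for one group.

```r
# Cramer's V between two factors, computed from the chi-squared statistic
# without continuity correction.
cramers_v <- function(x, y) {
  tab <- unclass(table(x, y))
  # Drop levels that never occur.
  tab <- tab[rowSums(tab) > 0, colSums(tab) > 0, drop = FALSE]
  # Convention from the text: a constant variable gives an association of zero.
  if (min(dim(tab)) < 2) return(0)
  n <- sum(tab)
  expected <- outer(rowSums(tab), colSums(tab)) / n
  chi2 <- sum((tab - expected)^2 / expected)
  sqrt(chi2 / (n * (min(dim(tab)) - 1)))
}

# q x q matrix of pairwise associations for one group; diagonal left as NA.
assoc_matrix <- function(df) {
  q <- ncol(df)
  A <- matrix(NA_real_, q, q, dimnames = list(names(df), names(df)))
  for (i in seq_len(q)) {
    for (j in seq_len(q)) {
      if (i != j) A[i, j] <- cramers_v(df[[i]], df[[j]])
    }
  }
  A
}

# Hypothetical group: a and b are perfectly associated, k is constant.
grp <- data.frame(
  a = factor(c("x", "x", "y", "y")),
  b = factor(c("u", "u", "v", "v")),
  k = factor(rep("c", 4))
)
A <- assoc_matrix(grp)
```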

Value

Returns an object of class mdsdd, that is a list including:

`inertia`	data frame of the eigenvalues and the percentages of their sum.
`scores`	data frame of the coordinates along the `nb.factors` first principal coordinates.
`jointp`	list of arrays. The joint probability distribution for each group.
`margins`	list of two data frames giving, respectively: (1) the probability distribution of each variable for each group, where each column of the data frame corresponds to one level of one categorical variable and contains the probabilities of this level in each group; (2) the joint probability distribution of each pair of variables for each group, where each column of the data frame corresponds to one pair of levels of two categorical variables (one level per variable) and contains the probabilities of this pair of levels in each group.
`associations`	list of `T` matrices. Each matrix corresponds to a group and gives the pairwise association measures between the `q` categorical variables.

Author(s)

Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard

References

Cox, T.F., Cox, M.A.A. (2001). Multidimensional Scaling, second ed. Chapman & Hall/CRC.

Saporta, G. (2006). Probabilités, Analyse des données et Statistique. Editions Technip, Paris.

Examples

# Example 1 with a folder (10 groups) of 3 factors 
# obtained by converting numeric variables
data(roses)
xr = roses[,c("Sha", "Den", "Sym", "rose")]
xf = as.folder(xr, groups = "rose")
xf = cut(xf, breaks = list(c(0, 5, 7, 10), c(0, 4, 6, 10), c(0, 6, 8, 10)), cutcol = 1:3)
af = mdsdd(xf)
print(af)
print(af$jointp)
print(af$margins[[1]]) # equivalent to print(af$margins$margin1) 
print(af$margins[[2]])
print(af$associations)

# Example 2 with a data frame obtained by converting numeric variables
data(roses)
xr = roses[,c("Sha", "Den", "Sym", "rose")]
xr = cut(xr, breaks = list(c(0, 5, 7, 10), c(0, 4, 6, 10), c(0, 6, 8, 10)), cutcol = 1:3)
ar = mdsdd(xr, group.name = "rose")
print(ar)
print(ar$jointp)
print(ar$margins[[1]]) # equivalent to print(ar$margins$margin1) 
print(ar$margins[[2]])
print(ar$associations)

# Example 3 with a list of 7 arrays
data(dspg)
xl = dspg
mdsdd(xl)

[Package dad version 4.1.2 Index]