data_description {MSmix}    R Documentation

Descriptive summaries for partial rankings

Description

Compute various data summaries for a partial ranking dataset. Unlike analogous functions supplied by other R packages, data_description supports partial observations with arbitrary patterns of censoring.

print method for class "data_descr".

Usage

data_description(
  rankings,
  marg = TRUE,
  borda_ord = FALSE,
  paired_comp = TRUE,
  subset = NULL,
  item_names = NULL
)

## S3 method for class 'data_descr'
print(x, ...)

Arguments

rankings

Integer N × n matrix or data frame with partial rankings in each row. Missing positions must be coded as NA.

marg

Logical: whether the first-order marginals should be computed. Defaults to TRUE.

borda_ord

Logical: whether the items should be ordered according to the Borda ranking (i.e., the ordering of the mean rank vector) in the summary statistics. Defaults to FALSE.

paired_comp

Logical: whether the pairwise comparison matrix should be computed. Defaults to TRUE.

subset

Optional logical or integer vector specifying the subset of observations, i.e. rows of rankings, to be considered. Missing values are taken as FALSE.

item_names

Character vector with the names of the items. Defaults to NULL, meaning that colnames(rankings) is used; if column names are not available, item_names is set to "Item1", "Item2", ....

x

An object of class "data_descr" returned by data_description.

...

Further arguments passed to or from other methods (not used).
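As a minimal sketch of the input format described above, the toy matrix below (hypothetical data, not part of the package) codes unobserved positions as NA. The handling of the logical filter is only a base R illustration of the rule "missing values are taken as FALSE"; it is not the package's internal code, and the names toy, keep, keep_clean and toy_sub are made up for the example.

```r
# Hypothetical toy data: 4 partial rankings of 4 items.
# Entry [s, i] is the rank that observation s assigns to item i (NA = not observed).
toy <- matrix(c(1L, 2L, 3L, 4L,
                1L, NA, NA, 2L,
                NA, 1L, NA, 3L,
                2L, 1L, NA, NA),
              nrow = 4, byrow = TRUE,
              dimnames = list(NULL, c("A", "B", "C", "D")))

# A logical row filter with a missing entry (hypothetical).
keep <- c(TRUE, NA, TRUE, FALSE)

# Treat NA as FALSE, mirroring the documented behaviour of 'subset'.
keep_clean <- !is.na(keep) & keep

# Rows that would be retained under this rule.
toy_sub <- toy[keep_clean, , drop = FALSE]
```

With MSmix loaded, the corresponding call would presumably be data_description(rankings = toy, subset = keep).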

Details

The implementation of data_description is similar to that of rank_summaries from the PLMIX package. Unlike the latter, data_description works with any kind of partial rankings (not only top rankings) and allows subsamples to be summarized through the additional subset argument.

The Borda ranking, obtained by ordering the mean rank vector, corresponds to the MLE of the consensus ranking of the Mallows model with Spearman distance. If mean_rank contains NAs, the corresponding items occupy the bottom positions in borda_ordering, in the order in which they appear in item_names.
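The Borda ordering described above can be sketched in base R as follows. This is an illustration on hypothetical toy data, assuming that each item's mean rank is computed over the observations in which the item was actually ranked; it is not the package's own implementation.

```r
# Hypothetical toy data: 3 partial rankings of 4 items; item D is never ranked.
toy <- matrix(c(3L, 1L, 2L, NA,
                2L, 1L, NA, NA,
                3L, 2L, 1L, NA),
              nrow = 3, byrow = TRUE,
              dimnames = list(NULL, c("A", "B", "C", "D")))

# Mean rank per item over the available ranks (assumption: NAs are skipped).
mean_rank <- colMeans(toy, na.rm = TRUE)

# An item that is never ranked yields NaN; recode it as NA.
mean_rank[is.nan(mean_rank)] <- NA

# order() places NAs last and keeps their original relative order,
# so never-ranked items land at the bottom in item order.
borda_ordering <- names(mean_rank)[order(mean_rank)]
```

Here item B has the smallest mean rank and D, which is never ranked, comes last.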

Value

An object of class "data_descr", which is a list with the following named components:

n_ranked

Integer vector of length N with the number of items ranked in each partial sequence.

n_ranked_distr

Frequency distribution of the n_ranked vector.

n_ranks_by_item

Integer 3 × n matrix with, for each item, the number of times it has been ranked and the number of times it has not. The last row contains the column totals, i.e. the sample size N.

mean_rank

Mean rank vector.

borda_ordering

Character vector corresponding to the Borda ordering. This is obtained from the ranking of the mean rank vector.

marginals

Integer n × n matrix of the first-order marginals in each column: the (j,i)-th entry indicates the number of times that item i is ranked in position j.

pc

Integer n × n pairwise comparison matrix: the (i,i')-th entry indicates the number of times that item i is preferred to item i'.

rankings

When borda_ord = TRUE, an integer N × n matrix corresponding to rankings with columns rearranged according to the Borda ordering; otherwise, the input rankings.
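To make the definitions of the components above concrete, the following base R sketch recomputes some of them on hypothetical toy data. It is not the package's code. In particular, it assumes that a pair of items contributes to the pairwise comparison matrix only when both items are ranked in the same row, and the row labels are made up for the example; data_description may handle censored pairs and labels differently.

```r
# Hypothetical toy data: 3 partial rankings of 3 items.
toy <- matrix(c(3L, 1L, 2L,
                2L, 1L, NA,
                NA, 2L, 1L),
              nrow = 3, byrow = TRUE,
              dimnames = list(NULL, c("A", "B", "C")))
n <- ncol(toy)

# Number of items ranked in each partial sequence.
n_ranked <- rowSums(!is.na(toy))

# Per item: times ranked, times not ranked, and total (the sample size N).
n_ranks_by_item <- rbind(ranked     = colSums(!is.na(toy)),
                         not_ranked = colSums(is.na(toy)),
                         total      = nrow(toy))

# First-order marginals: entry [j, i] counts how often item i is in position j.
marginals <- sapply(seq_len(n), function(i) {
  sapply(seq_len(n), function(j) sum(toy[, i] == j, na.rm = TRUE))
})
dimnames(marginals) <- list(paste0("Rank", seq_len(n)), colnames(toy))

# Pairwise comparisons: entry [i, k] counts how often item i has a smaller
# (better) rank than item k, using only rows where both are ranked (assumption).
pc <- matrix(0L, n, n, dimnames = list(colnames(toy), colnames(toy)))
for (i in seq_len(n)) {
  for (k in seq_len(n)) {
    if (i != k) pc[i, k] <- sum(toy[, i] < toy[, k], na.rm = TRUE)
  }
}
```

In this toy example B is preferred to A in both rows where the two items are jointly ranked, while A is never preferred to B.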

References

Mollica C and Tardella L (2020). PLMIX: An R package for modelling and clustering partially ranked data. Journal of Statistical Computation and Simulation, 90(5), pages 925–959, ISSN: 0094-9655, DOI: 10.1080/00949655.2020.1711909.

Marden JI (1995). Analyzing and modeling rank data. Monographs on Statistics and Applied Probability (64). Chapman & Hall, London. ISBN: 0-412-99521-2.

See Also

plot.data_descr

Examples


## Example 1. Sample statistics for the Antifragility dataset.
r_antifrag <- ranks_antifragility[, 1:7]
descr <- data_description(rankings = r_antifrag)
descr

## Example 2. Sample statistics for the Sports dataset.
r_sports <- ranks_sports[, 1:8]
descr <- data_description(rankings = r_sports, borda_ord = TRUE)
descr

## Example 3. Sample statistics for the Sports dataset by gender.
r_sports <- ranks_sports[, 1:8]
desc_f <- data_description(rankings = r_sports, subset = (ranks_sports$Gender == "Female"))
desc_m <- data_description(rankings = r_sports, subset = (ranks_sports$Gender == "Male"))
desc_f
desc_m


[Package MSmix version 1.0.2 Index]