MSmix-package {MSmix}    R Documentation

Finite Mixtures of Mallows Models with Spearman Distance for Full and Partial Rankings

Description

The MSmix package provides functions to fit and analyze finite mixtures of Mallows models with Spearman distance (a.k.a. \theta-model) for full and partial rankings with arbitrary missing positions. Inference is conducted within the maximum likelihood (ML) framework via EM algorithms. Estimation uncertainty is quantified via several bootstrap procedures as well as via Hessian-based standard errors.

Details

The Mallows model is one of the most popular and frequently applied parametric distributions for analyzing rankings of a finite set of items. However, inference for this model is challenging because its normalizing constant, also referred to as the partition function, is intractable. The present package performs ML estimation (MLE) of the Mallows model with Spearman distance from full and partial rankings with arbitrary censoring patterns. Thanks to the novel approximation of the model normalizing constant introduced by Crispino, Mollica, Astuti and Tardella (2023), as well as the existence of a closed-form expression for the MLE of the consensus ranking, MSmix can handle inference even for a large number of items (the current upper limit is n\leq 170). The package also accounts for unobserved sample heterogeneity through MLE of finite mixtures of Mallows models with Spearman distance via EM algorithms, which yields a model-based clustering of partial rankings into groups with similar preferences.
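To make the intractability concrete, the following base R sketch evaluates the normalizing constant by brute force, assuming the usual parameterization in which the probability of a ranking r is proportional to exp(-\theta d(r, \rho)), with d the Spearman distance (sum of squared rank differences) and \rho the consensus ranking. The helper names (spearman_distance, all_permutations, partition_brute_force) are hypothetical and are not part of MSmix; the package relies on its own exact and approximate routines instead of this enumeration.

```r
# Spearman distance: sum of squared rank differences between two full rankings.
spearman_distance <- function(r, rho) sum((r - rho)^2)

# All n! permutations of 1..n, one per row (illustrative helper, small n only).
all_permutations <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  smaller <- all_permutations(n - 1)
  blocks <- lapply(seq_len(n), function(pos) {
    # Insert the value n at position 'pos' of every permutation of 1..(n-1).
    t(apply(smaller, 1, function(p) append(p, n, after = pos - 1)))
  })
  do.call(rbind, blocks)
}

# Brute-force normalizing constant: sum of exp(-theta * d) over all n! rankings.
# The sum does not depend on the consensus ranking, so the identity is used.
partition_brute_force <- function(theta, n) {
  perms <- all_permutations(n)
  d <- apply(perms, 1, spearman_distance, rho = seq_len(n))
  sum(exp(-theta * d))
}

# With theta = 0 the model is uniform, so the constant equals n!.
partition_brute_force(theta = 0, n = 4)
```

The cost grows as n!, so this enumeration is only feasible for a handful of items; it is meant as a check on small cases, not as a substitute for the approximation cited above.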

Computational efficiency is achieved by combining R and C++ code and by supporting parallel computation.

In addition to inferential techniques, the package provides various functions for data manipulation, simulation, descriptive summary and model selection.

Specific S3 classes and methods are also supplied to enhance usability and foster interoperability with other packages.

The suite of functions available in the MSmix package is composed of:

Ranking data manipulation

data_conversion

From rankings to orderings and vice versa.

data_censoring

Censoring of full rankings.

data_completion

Deterministic completion of partial rankings with full reference rankings.

data_augmentation

Generate all full rankings compatible with partial rankings.

Ranking data simulation

rMSmix

Random samples from finite mixtures of Mallows models with Spearman distance.

Ranking data description

data_description

Descriptive summaries for partial rankings.

Model estimation

fitMSmix

MLE of mixtures of Mallows models with Spearman distance via EM algorithms.

likMSmix

Likelihood evaluation for mixtures of Mallows models with Spearman distance.

Model selection

bicMSmix

BIC value for the fitted mixture of Mallows models with Spearman distance.

aicMSmix

AIC value for the fitted mixture of Mallows models with Spearman distance.

Estimation uncertainty

bootstrapMSmix

Bootstrap confidence intervals for mixtures of Mallows models with Spearman distance.

confintMSmix

Hessian-based confidence intervals for mixtures of Mallows models with Spearman distance.

Spearman distance utilities

spear_dist

Spearman distance computation for full rankings.

spear_dist_distr

Spearman distance distribution under the uniform (null) model.

partition_fun_spear

Partition function of the Mallows model with Spearman distance.

expected_spear_dist

Expected Spearman distance under the Mallows model with Spearman distance.

var_spear_dist

Variance of the Spearman distance under the Mallows model with Spearman distance.

S3 class methods

plot.emMSmix

Plot the MLEs of mixtures of Mallows models with Spearman distance.

summary.emMSmix

Summary of the MLEs of mixtures of Mallows models with Spearman distance.

plot.data_descr

Plot the descriptive statistics for partial rankings.

plot.dist

Plot the Spearman distance matrix for full rankings.

Datasets

ranks_antifragility

Antifragility features of innovative startups (full rankings).

ranks_beers

Beer preference data (partial rankings with positions missing at random, with covariate).

ranks_read_genres

Reading preference data (partial top-5 rankings with covariates).

ranks_sports

Sport preferences and habits (full rankings with covariates).

Some quantities frequently recalled in the manual are the following:

N

Sample size.

n

Number of possible items.

G

Number of mixture components.

Data must be supplied as an integer N \times n matrix with partial rankings in each row and missing positions denoted as NA (rank = 1 indicates the most-liked item). Partial sequences with a single missing entry are automatically filled in, as they correspond to full rankings. In the present setting, ties are not allowed.
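As an illustration of this input format, the base R sketch below builds a small toy matrix (made-up values, N = 4 and n = 5) and mimics, under stated assumptions, the checks described above. The helpers is_valid_partial_ranking and fill_single_na are hypothetical names written for this example; they are not MSmix functions, and the package performs its own validation and completion internally.

```r
# Toy data (made-up values): N = 4 respondents, n = 5 items, one ranking per row.
rankings <- matrix(
  c(1L, 2L, 3L, 4L, 5L,   # full ranking
    2L, NA, 1L, NA, NA,   # top-2 partial ranking
    3L, 1L, NA, 5L, 2L,   # single NA: the missing rank can only be 4
    NA, 2L, NA, 1L, 4L),  # arbitrary missing positions
  nrow = 4, byrow = TRUE
)

# A row is admissible if its observed ranks lie in 1..n and contain no ties.
is_valid_partial_ranking <- function(r) {
  obs <- r[!is.na(r)]
  all(obs %in% seq_along(r)) && !anyDuplicated(obs)
}

# A sequence with exactly one NA determines a full ranking: fill in the gap.
fill_single_na <- function(r) {
  if (sum(is.na(r)) == 1) r[is.na(r)] <- setdiff(seq_along(r), r)
  r
}

valid <- apply(rankings, 1, is_valid_partial_ranking)
completed <- t(apply(rankings, 1, fill_single_na))

# For a full ranking, order() gives the corresponding ordering
# (the item placed first, second, and so on).
ordering_row3 <- order(completed[3, ])
```

Here rank 1 marks the most-liked item, so in the third row item 2 is the favorite and the completed row is a full ranking that can be treated like any other complete observation.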

Author(s)

Cristina Mollica, Marta Crispino, Lucia Modugno and Luca Tardella

Maintainer: Cristina Mollica <cristina.mollica@uniroma1.it>

References

Crispino M, Mollica C, Astuti V and Tardella L (2023). Efficient and accurate inference for mixtures of Mallows models with Spearman distance. Statistics and Computing, 33(98), DOI: 10.1007/s11222-023-10266-8.

Crispino M, Mollica C, Modugno L, Casadio Tarabusi E and Tardella L (2024+). MSmix: An R Package for clustering partial rankings via mixtures of Mallows models with Spearman distance. (submitted).


[Package MSmix version 1.0.1 Index]