MSmix-package {MSmix}

R Documentation

Finite Mixtures of Mallows Models with Spearman Distance for Full and Partial Rankings

Description

The MSmix package provides functions to fit and analyze finite mixtures of Mallows models with Spearman distance (a.k.a. θ-model) for full and partial rankings with arbitrary missing positions. Inference is conducted within the maximum likelihood (ML) framework via EM algorithms. Estimation uncertainty is quantified via several bootstrap variants as well as via Hessian-based standard errors.

Details

The Mallows model is one of the most popular and frequently applied parametric distributions for analyzing rankings of a finite set of items. However, inference for this model is challenging because its normalizing constant, also referred to as the partition function, is intractable. The present package performs ML estimation (MLE) of the Mallows model with Spearman distance from full and partial rankings with arbitrary censoring patterns. Thanks to the novel approximation of the normalizing constant introduced by Crispino, Mollica, Astuti and Tardella (2023), and to the existence of a closed-form expression for the MLE of the consensus ranking, MSmix can perform inference even for a large number of items. The package also accounts for unobserved sample heterogeneity through MLE of finite mixtures of Mallows models with Spearman distance via EM algorithms, which yields a model-based clustering of partial rankings into groups with similar preferences.
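For orientation, the model family described above can be written as follows. This is a sketch in standard notation, assuming the usual definition of the Spearman distance as a sum of squared rank differences; the symbols r, \rho, \theta, \omega_g and Z are introduced here for illustration and are not taken from this help page.

```latex
% Mallows model with Spearman distance for a full ranking r of n items,
% with consensus ranking \rho and concentration parameter \theta \ge 0
P(r \mid \rho, \theta) = \frac{\exp\{-\theta\, d(r, \rho)\}}{Z(\theta)},
\qquad
d(r, \rho) = \sum_{i=1}^{n} (r_i - \rho_i)^2 ,

% normalizing constant (partition function): a sum over all n! permutations s,
% with e = (1, \dots, n) the identity ranking
Z(\theta) = \sum_{s} \exp\{-\theta\, d(s, e)\} ,

% finite mixture with G components and weights \omega_g summing to one
P(r) = \sum_{g=1}^{G} \omega_g \, P(r \mid \rho_g, \theta_g).
```

The sum defining Z(\theta) has n! terms, which is why exact evaluation becomes impractical as n grows and an approximation is needed.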

Computational efficiency is achieved by combining R and C++ code and by offering optional parallel computation.

In addition to inferential techniques, the package provides various functions for data manipulation, simulation, descriptive summary and model selection.

Specific S3 classes and methods are also supplied to enhance the usability and foster exchange with other packages.

The suite of functions available in the MSmix package is composed of:

Ranking data manipulation

data_conversion: From rankings to orderings and vice versa.
data_censoring: Censoring of full rankings.
data_completion: Deterministic completion of partial rankings with full reference rankings.
data_augmentation: Generate all full rankings compatible with partial rankings.
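
The difference between rankings and orderings, and the idea of censoring, can be illustrated in base R. The sketch below does not call any MSmix function; the helper censor_top_k is hypothetical and shows only one possible censoring pattern (keeping the top k positions).

```r
# Ranking: entry i is the rank given to item i (rank 1 = most-liked item).
ranking <- c(3L, 1L, 2L, 4L)

# Ordering: entry j is the item placed in position j.
# For a full ranking, the two are inverse permutations of each other.
ordering <- order(ranking)          # items sorted from most to least liked
back_to_ranking <- order(ordering)  # inverting again recovers the ranking

# Hypothetical helper (not part of MSmix): keep only the top-k ranks
# and mark the remaining positions as missing.
censor_top_k <- function(r, k) {
  r[r > k] <- NA
  r
}
censored <- censor_top_k(ranking, 2)

print(ordering)
print(back_to_ranking)
print(censored)
```

The package functions listed above handle whole data matrices and more general missingness patterns; this sketch only fixes the terminology.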

Ranking data simulation

rMSmix: Random samples from finite mixtures of Mallows models with Spearman distance.

Ranking data description

data_description: Descriptive summaries for partial rankings.

Model estimation

fitMSmix: MLE of mixtures of Mallows models with Spearman distance via EM algorithms.
likMSmix: Likelihood evaluation for mixtures of Mallows models with Spearman distance.

Model selection

bicMSmix: BIC value for the fitted mixture of Mallows models with Spearman distance.
aicMSmix: AIC value for the fitted mixture of Mallows models with Spearman distance.
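
As a reminder, the two criteria have the following textbook form. This is a generic sketch: \hat{\ell} denotes the maximized log-likelihood and k the number of free parameters, and how the package counts k for a mixture is not stated on this page.

```latex
\mathrm{BIC} = -2\,\hat{\ell} + k \log N,
\qquad
\mathrm{AIC} = -2\,\hat{\ell} + 2k .
```

Under both criteria, lower values indicate a better trade-off between fit and complexity, so they can be compared across fits with different G.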

Estimation uncertainty

bootstrapMSmix: Bootstrap confidence intervals for mixtures of Mallows models with Spearman distance.
confintMSmix: Hessian-based confidence intervals for mixtures of Mallows models with Spearman distance.

Spearman distance utilities

spear_dist: Spearman distance computation for full rankings.
spear_dist_distr: Spearman distance distribution under the uniform (null) model.
partition_fun_spear: Partition function of the Mallows model with Spearman distance.
expected_spear_dist: Expected Spearman distance under the Mallows model with Spearman distance.
var_spear_dist: Variance of the Spearman distance under the Mallows model with Spearman distance.
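
For very small n, the quantities handled by these utilities can be checked by brute force in base R. The sketch below assumes the Spearman distance is the sum of squared rank differences; spearman_sketch, all_perms and partition_brute are hypothetical helpers written for this illustration, not MSmix functions, and full enumeration is feasible only for a handful of items.

```r
# Hypothetical helper: Spearman distance between two full rankings,
# taken here as the sum of squared rank differences.
spearman_sketch <- function(r, s) {
  stopifnot(length(r) == length(s))
  sum((r - s)^2)
}

# Hypothetical helper: all permutations of a vector, one per row.
all_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) {
    cbind(v[i], all_perms(v[-i]))
  }))
}

n <- 3
perms <- all_perms(seq_len(n))

# Distances of every permutation from the identity ranking (1, ..., n).
d <- apply(perms, 1, spearman_sketch, s = seq_len(n))

# Frequency distribution of the distance under the uniform model.
print(table(d))

# Brute-force partition function: sum of exp(-theta * distance).
partition_brute <- function(theta, d) sum(exp(-theta * d))

# At theta = 0 every permutation has weight 1, so the sum equals n!.
print(partition_brute(0, d))
```

Because the number of permutations grows as n!, this enumeration is useful only as a sanity check on toy examples.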

S3 class methods

print.bootMSmix: Print the bootstrap confidence intervals of mixtures of Mallows models with Spearman distance.
print.data_descr: Print the descriptive statistics for partial rankings.
print.emMSmix: Print the MLEs of mixtures of Mallows models with Spearman distance.
print.summary.emMSmix: Print the summary of the MLEs of mixtures of Mallows models with Spearman distance.
plot.bootMSmix: Plot the bootstrap confidence intervals of mixtures of Mallows models with Spearman distance.
plot.data_descr: Plot the descriptive statistics for partial rankings.
plot.dist: Plot the Spearman distance matrix for full rankings.
plot.emMSmix: Plot the MLEs of mixtures of Mallows models with Spearman distance.
summary.emMSmix: Summary of the MLEs of mixtures of Mallows models with Spearman distance.

Datasets

ranks_antifragility: Antifragility features of innovative startups (full rankings with covariates).
ranks_horror: Arkham Horror data (full rankings).
ranks_beers: Beer preference data (partial rankings with missing-at-random positions and a covariate).
ranks_read_genres: Reading preference data (partial top-5 rankings with covariates).
ranks_sports: Sport preferences and habits (full rankings with covariates).

The following quantities recur throughout the manual:

N: Sample size.
n: Number of possible items.
G: Number of mixture components.

Data must be supplied as an integer N×n matrix with partial rankings in each row and missing positions denoted as NA (rank = 1 indicates the most-liked item). Partial sequences with a single missing entry are automatically filled in, as they correspond to full rankings. In the present setting, ties are not allowed.
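
The expected input layout can be illustrated with a toy matrix in base R. The data and the helper fill_single_na below are hypothetical and only mimic the rule stated above (a row with exactly one NA determines a full ranking); they are not the package's internal code.

```r
# Hypothetical toy data: N = 3 partial rankings of n = 4 items,
# one ranking per row, NA for missing positions, no ties.
rankings <- matrix(
  c(1L, 2L, 3L, 4L,    # full ranking
    2L, NA, 1L, NA,    # two positions missing
    NA, 3L, 1L, 2L),   # one position missing
  nrow = 3, byrow = TRUE
)

# Hypothetical helper: if exactly one rank is missing, it must be
# the only rank in 1..n that is not already used.
fill_single_na <- function(r) {
  miss <- is.na(r)
  if (sum(miss) == 1) {
    r[miss] <- setdiff(seq_along(r), r[!miss])
  }
  r
}

# apply() returns rows as columns, hence the transpose.
completed <- t(apply(rankings, 1, fill_single_na))
print(completed)
```

Here the third row becomes a full ranking, while the second row, with two missing positions, stays partial.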

Author(s)

Cristina Mollica, Marta Crispino, Lucia Modugno and Luca Tardella

Maintainer: Cristina Mollica <cristina.mollica@uniroma1.it>

References

Crispino M, Mollica C, Astuti V and Tardella L (2023). Efficient and accurate inference for mixtures of Mallows models with Spearman distance. Statistics and Computing, 33(98), DOI: 10.1007/s11222-023-10266-8.

Crispino M, Mollica C, Modugno L, Casadio Tarabusi E and Tardella L (2024+). MSmix: An R Package for clustering partial rankings via mixtures of Mallows models with Spearman distance. (submitted).

[Package MSmix version 1.0.2 Index]