MSmix-package {MSmix}    R Documentation
Finite Mixtures of Mallows Models with Spearman Distance for Full and Partial Rankings
Description
The MSmix package provides functions to fit and analyze finite mixtures of Mallows models with Spearman distance (a.k.a. θ-model) for full and partial rankings with arbitrary missing positions. Inference is conducted within the maximum likelihood (ML) framework via EM algorithms. Estimation uncertainty is addressed via several versions of bootstrapping as well as via Hessian-based standard error calculations.
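For reference, the model can be written as follows. The notation is assumed here (it follows the usual presentation of the Mallows model with Spearman distance and is not taken from this page): π and ρ are full rankings of n items, ρ is the consensus ranking, θ ≥ 0 is the concentration (precision) parameter, 𝒫_n is the set of all n! rankings, and ω_g are the mixture weights. Because the Spearman distance is unchanged when the items are relabeled, Z(θ) does not depend on ρ.

```latex
d(\pi,\rho)=\sum_{i=1}^{n}(\pi_i-\rho_i)^2,\qquad
P(\pi\mid\rho,\theta)=\frac{\exp\{-\theta\, d(\pi,\rho)\}}{Z(\theta)},\qquad
Z(\theta)=\sum_{\sigma\in\mathcal{P}_n}\exp\{-\theta\, d(\sigma,\rho)\},

P(\pi\mid\boldsymbol{\rho},\boldsymbol{\theta},\boldsymbol{\omega})
  =\sum_{g=1}^{G}\omega_g\,P(\pi\mid\rho_g,\theta_g),
\qquad \omega_g\ge 0,\quad \sum_{g=1}^{G}\omega_g=1.
```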
Details
The Mallows model is one of the most popular and frequently applied parametric distributions for analyzing rankings of a finite set of items. However, inference for this model is challenging due to the intractability of the normalizing constant, also referred to as the partition function. The present package performs ML estimation (MLE) of the Mallows model with Spearman distance from full and partial rankings with arbitrary censoring patterns. Thanks to the novel approximation of the model normalizing constant introduced by Crispino, Mollica, Astuti and Tardella (2023), as well as the existence of a closed-form expression for the MLE of the consensus ranking, MSmix can perform inference even when the number of items is large. The package also accounts for unobserved sample heterogeneity through MLE of finite mixtures of Mallows models with Spearman distance via EM algorithms, in order to perform model-based clustering of partial rankings into groups with similar preferences.
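The closed-form consensus MLE can be illustrated for full rankings. Since Σ_i ρ_i² is the same for every candidate ranking ρ, minimizing the total Spearman distance Σ_l d(π_l, ρ) amounts to maximizing Σ_i ρ_i m_i, where m_i is the sample mean rank of item i; hence the MLE ranks the items by their mean ranks. The base-R sketch below is only an illustration under these assumptions: the helper names (perms_of, total_dist, consensus_mle) are hypothetical and not MSmix functions, only full rankings are handled, and ties are broken by item order. It also checks the result against a brute-force search over all n! candidates, which shows why exhaustive enumeration is hopeless beyond very small n.

```r
# Hypothetical helpers for illustration; these are not MSmix functions.

# All n! full rankings of n items, one per row (feasible only for tiny n).
perms_of <- function(n) {
  g <- as.matrix(expand.grid(rep(list(seq_len(n)), n)))
  unname(g[apply(g, 1, function(r) !anyDuplicated(r)), , drop = FALSE])
}

# Total Spearman distance between the observed full rankings and a candidate rho.
total_dist <- function(rankings, rho) {
  sum((rankings - matrix(rho, nrow(rankings), ncol(rankings), byrow = TRUE))^2)
}

# Closed-form consensus MLE for full rankings: rank the items by mean rank
# (ties, if any, broken by item order).
consensus_mle <- function(rankings) {
  rank(colMeans(rankings), ties.method = "first")
}

# Four full rankings of n = 4 items (rank 1 = most-liked item).
R <- rbind(c(2, 3, 1, 4),
           c(3, 2, 1, 4),
           c(1, 3, 2, 4),
           c(2, 4, 1, 3))

consensus_mle(R)                                       # 2 3 1 4

# Brute-force check over all 4! = 24 candidate consensus rankings.
P <- perms_of(4)
P[which.min(apply(P, 1, total_dist, rankings = R)), ]  # 2 3 1 4
```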
Computational efficiency is achieved by combining R and C++ code and by supporting parallel computation.
In addition to inferential techniques, the package provides various functions for data manipulation, simulation, descriptive summaries and model selection.
Specific S3 classes and methods are also supplied to enhance usability and foster exchange with other packages.
The suite of functions available in the MSmix package is composed of:
Ranking data manipulation
data_conversion
From rankings to orderings and vice versa.
data_censoring
Censoring of full rankings.
data_completion
Deterministic completion of partial rankings with full reference rankings.
data_augmentation
Generation of all full rankings compatible with partial rankings.
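The three manipulations above can be sketched in base R as follows. This is an illustration only, not the MSmix API: the helper names are hypothetical, the package functions have their own arguments (see their help pages), and top-k censoring is used here as just one example of a censoring pattern. A ranking stores the rank of each item, while an ordering lists the items from most- to least-liked, so each is the inverse permutation of the other.

```r
# Hypothetical base-R helpers (not the MSmix API), for illustration only.

# Ranking <-> ordering: order() of a permutation returns its inverse.
rank_to_order <- function(ranking) order(ranking)
order_to_rank <- function(ordering) order(ordering)

# Top-k censoring (one possible censoring pattern): keep ranks 1..k only.
censor_top_k <- function(ranking, k) replace(ranking, ranking > k, NA)

# All full rankings compatible with a partial ranking (NA = missing position):
# the unused ranks are assigned to the missing items in every possible way.
augment_partial <- function(x) {
  miss_items <- which(is.na(x))
  if (length(miss_items) == 0) return(matrix(x, nrow = 1))
  miss_ranks <- setdiff(seq_along(x), x[!is.na(x)])
  do.call(rbind, lapply(miss_ranks, function(r) {
    y <- x
    y[miss_items[1]] <- r          # fill the first gap, then recurse
    augment_partial(y)
  }))
}

rank_to_order(c(2, 3, 1))           # 3 1 2: item 3 first, then items 1 and 2
order_to_rank(c(3, 1, 2))           # 2 3 1
censor_top_k(c(2, 3, 1, 4), k = 2)  # 2 NA 1 NA
augment_partial(c(1, NA, NA, 2))    # rows: 1 3 4 2 and 1 4 3 2
```

The number of rows returned by augment_partial is the factorial of the number of missing positions, which is why augmentation quickly becomes expensive for heavily censored data.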
Ranking data simulation
rMSmix
Random samples from finite mixtures of Mallows models with Spearman distance.
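The generative mechanism behind such samples can be sketched as a two-step draw: a component label g with probability ω_g, then a ranking with probability proportional to exp(−θ_g d(π, ρ_g)). The sketch below is hypothetical and is not how rMSmix is implemented; it samples exactly by enumerating all n! rankings, so it is usable only for very small n, and the names rmallows_mix_small and perms_of are illustrative.

```r
# Hypothetical illustration (not rMSmix): exact sampling by enumerating all n!
# rankings, so only usable for very small n.
perms_of <- function(n) {
  g <- as.matrix(expand.grid(rep(list(seq_len(n)), n)))
  unname(g[apply(g, 1, function(r) !anyDuplicated(r)), , drop = FALSE])
}

rmallows_mix_small <- function(N, rho, theta, weights) {
  # rho: G x n matrix of consensus rankings; theta, weights: length-G vectors
  n <- ncol(rho)
  perms <- perms_of(n)
  # Step 1: component labels drawn with probabilities 'weights'
  z <- sample.int(length(weights), N, replace = TRUE, prob = weights)
  out <- matrix(NA_integer_, N, n)
  for (g in unique(z)) {
    # Step 2: rankings drawn with probability proportional to exp(-theta * d);
    # sample.int() normalizes 'prob', so Z(theta) is never needed explicitly
    d <- rowSums((perms - matrix(rho[g, ], nrow(perms), n, byrow = TRUE))^2)
    idx <- sample.int(nrow(perms), sum(z == g), replace = TRUE,
                      prob = exp(-theta[g] * d))
    out[z == g, ] <- perms[idx, ]
  }
  list(rankings = out, classification = z)
}

set.seed(123)
sim <- rmallows_mix_small(N = 10, rho = rbind(1:4, 4:1),
                          theta = c(0.3, 0.3), weights = c(0.7, 0.3))
```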
Ranking data description
data_description
Descriptive summaries for partial rankings.
Model estimation
fitMSmix
MLE of mixtures of Mallows models with Spearman distance via EM algorithms.
likMSmix
Likelihood evaluation for mixtures of Mallows models with Spearman distance.
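As a sketch of the quantities involved, the function below evaluates the exact mixture log-likelihood for full rankings, together with the posterior membership probabilities that an EM algorithm uses in its E-step. It is a hypothetical illustration, not likMSmix or fitMSmix: it assumes full rankings only (partial rankings would require summing over all compatible completions) and computes Z(θ) by brute-force enumeration instead of the approximation described in the Details section, so it is limited to tiny n.

```r
# Hypothetical illustration (not likMSmix/fitMSmix): exact log-likelihood of a
# G-component mixture for FULL rankings, with Z(theta) by brute force (tiny n).
perms_of <- function(n) {
  g <- as.matrix(expand.grid(rep(list(seq_len(n)), n)))
  unname(g[apply(g, 1, function(r) !anyDuplicated(r)), , drop = FALSE])
}

loglik_mix_full <- function(rankings, rho, theta, weights) {
  N <- nrow(rankings); n <- ncol(rankings); G <- length(weights)
  perms <- perms_of(n)
  d_perm <- rowSums((perms - matrix(seq_len(n), nrow(perms), n, byrow = TRUE))^2)
  # N x G matrix of log{ weight_g * P(pi_l | rho_g, theta_g) }
  logdens <- matrix(vapply(seq_len(G), function(g) {
    log_Z <- log(sum(exp(-theta[g] * d_perm)))  # Z does not depend on rho
    d <- rowSums((rankings - matrix(rho[g, ], N, n, byrow = TRUE))^2)
    log(weights[g]) - theta[g] * d - log_Z
  }, numeric(N)), nrow = N)
  # Log-sum-exp over components, row by row, for numerical stability
  m <- apply(logdens, 1, max)
  log_mix <- m + log(rowSums(exp(logdens - m)))
  # Posterior membership probabilities (the E-step quantities of an EM algorithm)
  list(loglik = sum(log_mix), post = exp(logdens - log_mix))
}
```

With G = 1 and θ = 0 every full ranking has probability 1/n!, which gives a quick sanity check of the log-likelihood.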
Model selection
bicMSmix
BIC value for the fitted mixture of Mallows models with Spearman distance.
aicMSmix
AIC value for the fitted mixture of Mallows models with Spearman distance.
Estimation uncertainty
bootstrapMSmix
Bootstrap confidence intervals for mixtures of Mallows models with Spearman distance.
confintMSmix
Hessian-based confidence intervals for mixtures of Mallows models with Spearman distance.
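A minimal bootstrap sketch is given below, under strong simplifying assumptions: a single Mallows model (G = 1), full rankings, and a plain nonparametric percentile bootstrap, which is only one of several possible schemes and is not claimed to match bootstrapMSmix. Given ρ, the MLE of θ solves E_θ[D] = mean observed distance, since the derivative of log Z(θ) is −E_θ[D]; the expectation is computed here by enumerating all n! rankings, and the search interval upper bound of 50 is an arbitrary assumption. All names are hypothetical.

```r
# Hypothetical illustration (not bootstrapMSmix): nonparametric percentile
# bootstrap for a single Mallows model (G = 1) fitted to FULL rankings.
perms_of <- function(n) {
  g <- as.matrix(expand.grid(rep(list(seq_len(n)), n)))
  unname(g[apply(g, 1, function(r) !anyDuplicated(r)), , drop = FALSE])
}

# MLE of theta given rho: solves E_theta[D] = observed mean distance, with the
# expectation computed by exact enumeration of all n! rankings (tiny n only).
theta_mle <- function(rankings, rho) {
  n <- ncol(rankings)
  perms <- perms_of(n)
  d_all <- rowSums((perms - matrix(seq_len(n), nrow(perms), n, byrow = TRUE))^2)
  d_bar <- mean(rowSums((rankings - matrix(rho, nrow(rankings), n, byrow = TRUE))^2))
  exp_d <- function(theta) {
    w <- exp(-theta * d_all)
    sum(d_all * w) / sum(w)
  }
  if (d_bar >= exp_d(0)) return(0)  # no concentration beyond the uniform model
  if (d_bar == 0) return(Inf)       # every ranking equals rho
  # E_theta[D] is decreasing in theta, so the root is unique
  uniroot(function(t) exp_d(t) - d_bar, lower = 0, upper = 50)$root
}

boot_ci_theta <- function(rankings, B = 200, level = 0.95) {
  thetas <- replicate(B, {
    Rb <- rankings[sample.int(nrow(rankings), replace = TRUE), , drop = FALSE]
    rho_b <- rank(colMeans(Rb), ties.method = "first")  # closed-form consensus MLE
    theta_mle(Rb, rho_b)
  })
  alpha <- 1 - level
  quantile(thetas, c(alpha / 2, 1 - alpha / 2))
}
```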
Spearman distance utilities
spear_dist
Spearman distance computation for full rankings.
spear_dist_distr
Spearman distance distribution under the uniform (null) model.
partition_fun_spear
Partition function of the Mallows model with Spearman distance.
expected_spear_dist
Expected Spearman distance under the Mallows model with Spearman distance.
var_spear_dist
Variance of the Spearman distance under the Mallows model with Spearman distance.
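The utilities above are linked by a simple identity: if N_d is the number of rankings at Spearman distance d from the identity, then Z(θ) = Σ_d N_d exp(−θd), and the moments of D under the model are weighted sums over the same distance values. The base-R sketch below illustrates this by exact enumeration, so only for tiny n; the helper names are hypothetical and are not the MSmix functions listed above. Pairwise Spearman distances between full rankings are simply squared Euclidean distances.

```r
# Hypothetical base-R helpers (not the MSmix functions listed above), exact
# by enumeration of all n! rankings, so tiny n only.
perms_of <- function(n) {
  g <- as.matrix(expand.grid(rep(list(seq_len(n)), n)))
  unname(g[apply(g, 1, function(r) !anyDuplicated(r)), , drop = FALSE])
}

# Pairwise Spearman distances = squared Euclidean distances (rounded to integers).
spear_pairwise <- function(rankings) round(as.matrix(dist(rankings))^2)

# Count of each distance value among all n! rankings, i.e. the distance
# distribution under the uniform (null) model, up to division by n!.
dist_counts <- function(n) {
  perms <- perms_of(n)
  d <- rowSums((perms - matrix(seq_len(n), nrow(perms), n, byrow = TRUE))^2)
  tab <- table(d)
  data.frame(dist = as.numeric(names(tab)), count = as.numeric(tab))
}

# Z(theta), E_theta[D] and Var_theta[D] as weighted sums over distance values.
spear_moments <- function(theta, n) {
  dc <- dist_counts(n)
  w <- dc$count * exp(-theta * dc$dist)
  Z <- sum(w)
  m1 <- sum(dc$dist * w) / Z
  m2 <- sum(dc$dist^2 * w) / Z
  c(partition = Z, mean = m1, var = m2 - m1^2)
}

dist_counts(3)       # distances 0, 2, 6, 8 with counts 1, 2, 2, 1
spear_moments(0, 4)  # partition = 24 (= 4!), mean = 10 (= n(n^2 - 1)/6)
```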
S3 class methods
print.bootMSmix
Print the bootstrap confidence intervals of mixtures of Mallows models with Spearman distance.
print.data_descr
Print the descriptive statistics for partial rankings.
print.emMSmix
Print the MLEs of mixtures of Mallows models with Spearman distance.
print.summary.emMSmix
Print the summary of the MLEs of mixtures of Mallows models with Spearman distance.
plot.bootMSmix
Plot the bootstrap confidence intervals of mixtures of Mallows models with Spearman distance.
plot.data_descr
Plot the descriptive statistics for partial rankings.
plot.dist
Plot the Spearman distance matrix for full rankings.
plot.emMSmix
Plot the MLEs of mixtures of Mallows models with Spearman distance.
summary.emMSmix
Summary of the MLEs of mixtures of Mallows models with Spearman distance.
Datasets
ranks_antifragility
Antifragility features of innovative startups (full rankings with covariates).
ranks_horror
Arkham Horror data (full rankings).
ranks_beers
Beer preference data (partial rankings with positions missing at random, with covariate).
ranks_read_genres
Reading preference data (partial top-5 rankings with covariates).
ranks_sports
Sport preferences and habits (full rankings with covariates).
The following quantities are frequently referred to throughout the manual:
N
Sample size.
n
Number of items.
G
Number of mixture components.
Data must be supplied as an integer N × n matrix with a partial ranking in each row and missing positions denoted as NA (rank = 1 indicates the most-liked item). Rows with a single missing entry are automatically filled in, as they correspond to full rankings. In the present setting, ties are not allowed.
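The data rules above can be mirrored by a small validation step, sketched here in base R. The helper check_and_fill is hypothetical and not an MSmix function (the package performs the single-gap completion itself); it rejects ties and out-of-range ranks and fills in any row with exactly one NA, whose rank is forced.

```r
# Hypothetical helper (not an MSmix function) mirroring the data rules above.
check_and_fill <- function(rankings) {
  stopifnot(is.matrix(rankings))
  n <- ncol(rankings)
  for (l in seq_len(nrow(rankings))) {
    obs <- rankings[l, !is.na(rankings[l, ])]
    if (anyDuplicated(obs) > 0 || any(obs < 1 | obs > n))
      stop("row ", l, " contains tied or out-of-range ranks")
    miss <- is.na(rankings[l, ])
    if (sum(miss) == 1)                      # a single gap: its rank is forced
      rankings[l, miss] <- setdiff(seq_len(n), obs)
  }
  rankings
}

# N = 3 partial rankings of n = 4 items (rank 1 = most-liked item).
R <- rbind(c(1L, NA, NA, 2L),  # top-2 partial ranking: items 1 and 4 ranked
           c(3L, 1L, NA, 2L),  # one missing entry, completed to a full ranking
           c(2L, 1L, 4L, 3L))  # already full
check_and_fill(R)
```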
Author(s)
Cristina Mollica, Marta Crispino, Lucia Modugno and Luca Tardella
Maintainer: Cristina Mollica <cristina.mollica@uniroma1.it>
References
Crispino M, Mollica C, Astuti V and Tardella L (2023). Efficient and accurate inference for mixtures of Mallows models with Spearman distance. Statistics and Computing, 33(98), DOI: 10.1007/s11222-023-10266-8.
Crispino M, Mollica C, Modugno L, Casadio Tarabusi E and Tardella L (2024+). MSmix: An R package for clustering partial rankings via mixtures of Mallows models with Spearman distance. (submitted).