fitMSmix {MSmix} | R Documentation |
MLE of mixtures of Mallows models with Spearman distance via EM algorithms
Description
Perform the MLE of mixtures of Mallows models with Spearman distance on full and partial rankings via EM algorithms. Partial rankings with arbitrary missing positions are supported.
print method for class "emMSmix".
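For orientation, the Spearman distance between two full rankings is the sum of the squared differences between the ranks given to each item. The base-R sketch below illustrates it; the helper name spearman_dist is hypothetical and is not part of the MSmix API.

```r
# Hypothetical helper (not an MSmix function): Spearman distance between
# two full rankings of the same items, i.e. the sum of squared rank differences.
spearman_dist <- function(r, rho) {
  stopifnot(length(r) == length(rho))
  sum((r - rho)^2)
}

rho <- 1:5                      # reference (consensus) ranking
r   <- c(2, 1, 3, 5, 4)         # an observed full ranking
spearman_dist(r, rho)           # 1 + 1 + 0 + 1 + 1 = 4
spearman_dist(rev(rho), rho)    # reversed ranking: 16 + 4 + 0 + 4 + 16 = 40
```

Under the Mallows model, rankings at a larger distance from the consensus ranking receive a smaller probability, at a rate governed by the precision parameter.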
Usage
fitMSmix(
  rankings,
  n_clust = 1,
  n_start = 1,
  n_iter = 200,
  mc_em = FALSE,
  eps = 10^(-6),
  init = list(list(rho = NULL, theta = NULL, weights = NULL))[rep(1, n_start)],
  plot_log_lik = FALSE,
  comp_log_lik_part = FALSE,
  plot_log_lik_part = FALSE,
  parallel = FALSE,
  theta_max = 3,
  theta_tol = 1e-05,
  theta_tune = 1,
  subset = NULL,
  item_names = NULL
)
## S3 method for class 'emMSmix'
print(x, ...)
Arguments
rankings | Integer |
n_clust | Number of mixture components. Defaults to 1. |
n_start | Number of starting points. Defaults to 1. |
n_iter | Maximum number of EM iterations. Defaults to 200. |
mc_em | Logical: whether the Monte Carlo EM algorithm must be used for MLE on partial rankings completion, see Details. Ignored when |
eps | Positive tolerance value for the convergence of the EM algorithm. Defaults to 10^(-6). |
init | List of |
plot_log_lik | Logical: whether the iterative log-likelihood values (based on full or augmented rankings) must be plotted. Defaults to FALSE. |
comp_log_lik_part | Logical: whether the maximized observed-data log-likelihood value (based on partial rankings) must be returned. Ignored when |
plot_log_lik_part | Logical: whether the iterative observed-data log-likelihood values (based on partial rankings) must be plotted. Ignored when |
parallel | Logical: whether parallelization over multiple initializations must be used. Defaults to FALSE. |
theta_max | Positive upper bound for the precision parameters. Defaults to 3. |
theta_tol | Positive convergence tolerance for the M-step on theta. Defaults to 1e-05. |
theta_tune | Positive tuning constant affecting the precision parameters in the Monte Carlo step. Ignored when |
subset | Optional logical or integer vector specifying the subset of observations, i.e. rows of the |
item_names | Character vector for the names of the items. Defaults to NULL. |
x | An object of class "emMSmix". |
... | Further arguments passed to or from other methods (not used). |
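The default value of init shown in Usage is a compact way of building n_start identical initialization lists. A small base-R sketch of what that expression evaluates to (n_start = 3 is an arbitrary illustrative choice):

```r
n_start <- 3  # illustrative value

# Same expression as the default in Usage: a one-element list is indexed
# with rep(1, n_start), which repeats that element n_start times.
init <- list(list(rho = NULL, theta = NULL, weights = NULL))[rep(1, n_start)]

length(init)              # 3: one initialization list per starting point
names(init[[1]])          # "rho" "theta" "weights"
is.null(init[[2]]$theta)  # TRUE: no starting value supplied
```

Each element of init thus corresponds to one starting point, with one slot per parameter block of the mixture.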
Details
The EM algorithms are launched from n_start initializations and the best solution in terms of maximized log-likelihood value (based on full or augmented rankings) is returned.
When mc_em = FALSE, the scheme introduced by Crispino et al. (2023) is performed, where partial rankings are augmented with all compatible full rankings. This type of data augmentation is supported up to 10 missing positions in the partial rankings.
When mc_em = TRUE, the computationally more efficient Monte Carlo EM algorithm introduced by Crispino et al. (2024+) is implemented. With a large number of censored positions and a large sample size, mc_em = TRUE should be preferred.
Regardless of the fitting method adopted for inference on partial rankings, note that setting comp_log_lik_part = TRUE to compute the observed-data log-likelihood values (based on partial rankings) can slow down the procedure when the number of censored positions and the sample size are large.
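To see why exhaustive augmentation becomes expensive, note that a partial ranking with m missing positions is compatible with m! full rankings. The sketch below enumerates them in base R; all_perms and complete_ranking are hypothetical helpers written for illustration and do not reproduce the package's internal implementation.

```r
# Hypothetical helper: all permutations of a vector, one per row.
all_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  out <- NULL
  for (i in seq_along(v)) {
    out <- rbind(out, cbind(v[i], all_perms(v[-i])))
  }
  out
}

# Hypothetical helper: every full ranking compatible with a partial one,
# where missing positions are coded as NA.
complete_ranking <- function(r) {
  n <- length(r)
  miss <- which(is.na(r))
  if (length(miss) == 0) return(matrix(r, nrow = 1))
  free_ranks <- setdiff(seq_len(n), r[!is.na(r)])  # ranks not yet assigned
  fills <- all_perms(free_ranks)
  out <- matrix(r, nrow = nrow(fills), ncol = n, byrow = TRUE)
  out[, miss] <- fills
  out
}

partial <- c(2, NA, 1, NA, 3)   # third row of rank_data in Example 2
complete_ranking(partial)       # 2 rows: ranks 4 and 5 in either order
factorial(10)                   # 3628800 completions with 10 missing positions
```

The factorial growth in the number of completions is what makes the Monte Carlo alternative attractive when many positions are censored.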
Value
An object of class "emMSmix", namely a list with the following named components:
mod
List of named objects describing the best fitted model in terms of maximized log-likelihood over the n_start initializations. See Details.
max_log_lik
Maximized log-likelihood values for each initialization.
partial_data
Logical: whether the dataset includes some partially-ranked sequences.
convergence
Binary convergence indicators of the EM algorithm for each initialization: 1 = convergence has been achieved, 0 = otherwise.
record
Best log-likelihood values sequentially achieved over the n_start initializations.
em_settings
List of settings used to fit the model.
call
The matched call.
The mod sublist contains the following named objects:
rho
Integer G × n matrix with the MLEs of the component-specific consensus rankings in each row.
theta
Numeric vector with the MLEs of the G component-specific precision parameters.
weights
Numeric vector with the MLEs of the G mixture weights.
z_hat
Numeric N × G matrix of the estimated posterior component membership probabilities. Returned when n_clust > 1, otherwise NULL.
map_classification
Integer vector of N mixture component memberships based on the MAP allocation from the z_hat matrix. Returned when n_clust > 1, otherwise NULL.
log_lik
Numeric vector of the log-likelihood values (based on full or augmented rankings) at each iteration.
best_log_lik
Maximized log-likelihood value (based on full or augmented rankings) of the fitted model.
bic
BIC value of the fitted model based on best_log_lik.
log_lik_part
Numeric vector of the observed-data log-likelihood values (based on partial rankings) at each iteration. Returned when rankings contains some partial sequences that can be completed with data_augmentation and plot_log_lik_part = TRUE, otherwise NULL. See Details.
best_log_lik_part
Maximized observed-data log-likelihood value (based on partial rankings) of the fitted model. Returned when rankings contains some partial sequences that can be completed with data_augmentation, otherwise NULL. See Details.
bic_part
BIC value of the fitted model based on best_log_lik_part. Returned when rankings contains some partial sequences that can be completed with data_augmentation, otherwise NULL. See Details.
conv
Binary convergence indicator of the best fitted model: 1 = convergence has been achieved, 0 = otherwise.
augmented_rankings
Integer N × n matrix with rankings completed through the Monte Carlo step in each row. Returned when rankings contains some partial sequences and mc_em = TRUE, otherwise NULL.
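The relation between z_hat and map_classification amounts to assigning each observation to the component with the largest posterior membership probability. A base-R sketch with a made-up z_hat matrix (the values are illustrative, not output of fitMSmix):

```r
# Made-up posterior membership probabilities for N = 4 observations
# and G = 3 components (each row sums to 1).
z_hat <- rbind(
  c(0.80, 0.15, 0.05),
  c(0.10, 0.20, 0.70),
  c(0.30, 0.60, 0.10),
  c(0.05, 0.05, 0.90)
)

# MAP allocation: index of the largest entry in each row.
map_classification <- apply(z_hat, 1, which.max)
map_classification          # 1 3 2 3

# Cluster sizes implied by the MAP allocation.
table(factor(map_classification, levels = seq_len(ncol(z_hat))))
```

With a fitted object, the same comparison could be made between mod$z_hat and mod$map_classification when n_clust > 1.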
References
Crispino M, Mollica C and Modugno L (2024+). MSmix: An R Package for clustering partial rankings via mixtures of Mallows Models with Spearman distance. (submitted)
Crispino M, Mollica C, Astuti V and Tardella L (2023). Efficient and accurate inference for mixtures of Mallows models with Spearman distance. Statistics and Computing, 33(98), DOI: 10.1007/s11222-023-10266-8.
Sørensen Ø, Crispino M, Liu Q and Vitelli V (2020). BayesMallows: An R Package for the Bayesian Mallows Model. The R Journal, 12(1), pages 324–342, DOI: 10.32614/RJ-2020-026.
Beckett LA (1993). Maximum likelihood estimation in Mallows’s model using partially ranked data. In Probability models and statistical analyses for ranking data, pages 92–107. Springer New York.
Examples
## Example 1. Fit the 3-component mixture of Mallows models with Spearman distance
## to the Antifragility dataset.
r_antifrag <- ranks_antifragility[, 1:7]
set.seed(123)
mms_fit <- fitMSmix(rankings = r_antifrag, n_clust = 3, n_start = 10)
mms_fit$mod$rho; mms_fit$mod$theta; mms_fit$mod$weights
## Example 2. Fit the Mallows model with Spearman distance
## to simulated partial rankings through data augmentation.
rank_data <- rbind(c(NA, 4, NA, 1, NA), c(NA, NA, NA, NA, 1), c(2, NA, 1, NA, 3),
                   c(4, 2, 3, 5, 1), c(NA, 4, 1, 3, 2))
mms_fit <- fitMSmix(rankings = rank_data, n_start = 10)
mms_fit$mod$rho; mms_fit$mod$theta
## Example 3. Fit the Mallows model with Spearman distance
## to the Reading genres dataset through Monte Carlo EM.
top5_read <- ranks_read_genres[, 1:11]
mms_fit <- fitMSmix(rankings = top5_read, n_start = 10, mc_em = TRUE)
mms_fit$mod$rho; mms_fit$mod$theta