rMSmix {MSmix}    R Documentation

Random samples from a mixture of Mallows models with Spearman distance

Description

Draw random samples of full rankings from a mixture of Mallows models with Spearman distance.
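For reference, the model being sampled can be written in the standard Mallows form below. The symbols \omega_g (the normalized mixture weights, weights_g / sum(weights)), Z(\theta) (the normalizing constant) and \mathcal{P}_n (the set of all n! rankings of n items) are our notation, not names used by the package.

```latex
P(\boldsymbol{r}) = \sum_{g=1}^{G} \omega_g \,
  \frac{\exp\{-\theta_g \, d_S(\boldsymbol{r}, \boldsymbol{\rho}_g)\}}{Z(\theta_g)},
\qquad
d_S(\boldsymbol{r}, \boldsymbol{\rho}) = \sum_{i=1}^{n} (r_i - \rho_i)^2,

Z(\theta) = \sum_{\boldsymbol{r} \in \mathcal{P}_n} \exp\{-\theta \, d_S(\boldsymbol{r}, \boldsymbol{\rho})\},
\qquad
\max_{\boldsymbol{r}, \boldsymbol{\rho}} d_S(\boldsymbol{r}, \boldsymbol{\rho}) = \frac{n(n^2-1)}{3} = 2\binom{n+1}{3}.
```

Z(\theta) does not depend on \rho because the Spearman distance is invariant to relabeling the items, and theta = 0 gives the uniform distribution on \mathcal{P}_n.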

Usage

rMSmix(
  sample_size = 1,
  n_items,
  n_clust = 1,
  rho = NULL,
  theta = NULL,
  weights = NULL,
  uniform = FALSE,
  mh = TRUE
)

Arguments

sample_size

Number of full rankings to be sampled. Defaults to 1.

n_items

Number of items.

n_clust

Number of mixture components. Defaults to 1.

rho

Integer G \times n matrix (G = n_clust, n = n_items) with the component-specific consensus rankings in each row. Defaults to NULL, meaning that the consensus rankings are randomly generated according to the sampling scheme indicated by the uniform argument. See Details.

theta

Numeric vector of G non-negative component-specific precision parameters. Defaults to NULL, meaning that the concentrations are uniformly generated from an interval containing typical values for the precisions. See Details.

weights

Numeric vector of G positive mixture weights (normalization is not necessary). Defaults to NULL, meaning that the mixture weights are randomly generated according to the sampling scheme indicated by the uniform argument. See Details.

uniform

Logical: whether rho and weights, when not provided, are sampled uniformly on their support. When uniform = FALSE, rho is sampled so as to ensure separation among the mixture components, and weights so as to ensure populated components. Used when G > 1 and either rho or weights is NULL (see Details). Defaults to FALSE.

mh

Logical: whether the samples must be drawn with the Metropolis-Hastings (MH) scheme implemented in the BayesMallows package, rather than by direct sampling from the Mallows probability distribution. For n_items > 10, the MH scheme is always applied, regardless of mh, to speed up the sampling procedure. Defaults to TRUE.
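To illustrate what "direct sampling from the Mallows probability distribution" involves, here is a minimal base-R sketch: enumerate all n! rankings, weight each by exp(-theta * Spearman distance to rho), and sample rows with those weights. The helper names all_perms, spearman_dist and rmallows_exact are hypothetical, and this is not the package's internal code. The n! enumeration is also why exact sampling is only practical for small numbers of items.

```r
# Hypothetical helper: all n! rankings of n items, one per row (base R only).
all_perms <- function(n) {
  if (n == 1) return(matrix(1L, nrow = 1, ncol = 1))
  prev <- all_perms(n - 1)
  blocks <- lapply(seq_len(n), function(first) {
    rest <- setdiff(seq_len(n), first)  # the other n - 1 values
    # put `first` in column 1, then every arrangement of `rest`
    cbind(first, matrix(rest[as.vector(prev)], nrow = nrow(prev)))
  })
  unname(do.call(rbind, blocks))
}

# Spearman distance: sum of squared rank differences.
spearman_dist <- function(r, rho) sum((r - rho)^2)

# Direct (exact) sampling: P(r) is proportional to exp(-theta * d_S(r, rho)).
rmallows_exact <- function(N, rho, theta) {
  perms <- all_perms(length(rho))
  d <- apply(perms, 1, spearman_dist, rho = rho)
  # sample.int() normalizes the unnormalized probabilities itself
  idx <- sample.int(nrow(perms), N, replace = TRUE, prob = exp(-theta * d))
  perms[idx, , drop = FALSE]
}

set.seed(1)
x <- rmallows_exact(10, rho = 1:4, theta = 0.5)
```

Large theta concentrates almost all mass on rho itself, while theta = 0 gives uniform sampling over all rankings.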

Details

When n_items > 10 or mh = TRUE, the random samples are obtained with the Metropolis-Hastings algorithm described in Vitelli et al. (2018) and implemented in the sample_mallows function of the BayesMallows package.
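As a rough illustration of how an MH sampler for a Spearman Mallows model works, here is a minimal base-R sketch. It is not the BayesMallows implementation: it swaps in a simple random-transposition proposal in place of the leap-and-shift proposal of Vitelli et al. (2018), and the function name rmallows_mh and its burnin and thin arguments are hypothetical.

```r
# Minimal MH sketch for a Mallows model with Spearman distance (hypothetical).
# Proposal: swap the ranks of two randomly chosen items (symmetric), NOT the
# leap-and-shift proposal used by BayesMallows.
rmallows_mh <- function(N, rho, theta, burnin = 1000, thin = 10) {
  n <- length(rho)
  spearman <- function(r) sum((r - rho)^2)  # Spearman distance to rho
  current <- sample.int(n)                  # random initial ranking
  d_cur <- spearman(current)
  out <- matrix(NA_integer_, nrow = N, ncol = n)
  kept <- 0
  iter <- 0
  while (kept < N) {
    iter <- iter + 1
    # Two positions drawn with replacement; i == j is a "stay" move that keeps
    # the chain aperiodic even when theta = 0. The proposal stays symmetric.
    ij <- sample.int(n, 2, replace = TRUE)
    prop <- current
    prop[ij] <- current[rev(ij)]            # swap the ranks of items i and j
    d_prop <- spearman(prop)
    # Symmetric proposal: accept with probability min(1, exp(-theta * (d_prop - d_cur)))
    if (log(runif(1)) < -theta * (d_prop - d_cur)) {
      current <- prop
      d_cur <- d_prop
    }
    # Keep every thin-th state after the burn-in period
    if (iter > burnin && (iter - burnin) %% thin == 0) {
      kept <- kept + 1
      out[kept, ] <- current
    }
  }
  out
}

set.seed(1)
x <- rmallows_mh(200, rho = 1:5, theta = 1)
```

Because only the distance difference enters the acceptance step, the normalizing constant Z(theta) is never computed, which is what makes MH attractive when n! is too large to enumerate.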

When theta is not provided by the user (theta = NULL), the precision parameters are randomly generated from a uniform distribution on the interval (1/n^{2}, 3/n^{1.5}), which contains typical values for the precisions.
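The default precision draw described above amounts to G independent uniform draws on that interval. A one-line base-R sketch, assuming n = n_items and G = n_clust (draw_theta is a hypothetical name):

```r
# G independent draws from Uniform(1/n^2, 3/n^1.5); draw_theta is hypothetical.
draw_theta <- function(G, n) runif(G, min = 1 / n^2, max = 3 / n^1.5)

set.seed(12345)
draw_theta(G = 3, n = 9)  # values lie in (1/81, 1/9), roughly (0.0123, 0.111)
```

The interval shrinks as n grows: for n = 25 it is (0.0016, 0.024).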

When uniform = FALSE, the mixture weights are sampled from a symmetric Dirichlet distribution with all shape parameters equal to 2G, to favor populated and balanced clusters; the consensus rankings are sampled to favor well-separated clusters, i.e., clusters at Spearman distance of at least \frac{2}{G}\binom{n+1}{3} from each other.
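The two ingredients above can be sketched in base R. The Dirichlet draw uses the standard normalized-Gamma construction. The separation threshold \frac{2}{G}\binom{n+1}{3} equals the maximum Spearman distance n(n^2 - 1)/3 divided by G. How rMSmix actually generates separated consensus rankings is not stated here, so draw_separated_rho uses plain rejection sampling as an assumption; all function names are hypothetical.

```r
# Symmetric Dirichlet(alpha, ..., alpha) draw via normalized Gamma variables.
rdirichlet_sym <- function(G, alpha) {
  g <- rgamma(G, shape = alpha, rate = 1)
  g / sum(g)
}

spearman_dist <- function(r1, r2) sum((r1 - r2)^2)

# Separation threshold from Details: (2 / G) * choose(n + 1, 3).
sep_threshold <- function(G, n) (2 / G) * choose(n + 1, 3)

# TRUE if every pair of rows of rho is at least sep_threshold apart.
is_separated <- function(rho) {
  G <- nrow(rho)
  n <- ncol(rho)
  if (G < 2) return(TRUE)
  pairs <- combn(G, 2)
  d <- apply(pairs, 2, function(p) spearman_dist(rho[p[1], ], rho[p[2], ]))
  all(d >= sep_threshold(G, n))
}

# Assumption: rejection sampling of uniform rankings until separated.
draw_separated_rho <- function(G, n, max_tries = 1000) {
  for (i in seq_len(max_tries)) {
    rho <- t(replicate(G, sample(n)))  # G x n matrix of random rankings
    if (is_separated(rho)) return(rho)
  }
  stop("no separated set of consensus rankings found")
}

set.seed(1)
w <- rdirichlet_sym(5, alpha = 2 * 5)  # shape 2G with G = 5
rho <- draw_separated_rho(5, 25)
```

For instance, the consensus rankings of Example 3 (1:5 and c(4, 5, 2, 1, 3)) are at Spearman distance 32, above the threshold of 20 for G = 2 and n = 5.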

Value

A list of the following named components:

samples

Integer N \times n matrix (N = sample_size) with the sample_size simulated full rankings in each row.

rho

Integer G \times n matrix with the component-specific consensus rankings used for the simulation in each row.

theta

Numeric vector of the G component-specific precision parameters used for the simulation.

weights

Numeric vector of the G mixture weights used for the simulation.

classification

Integer vector of the sample_size component membership labels.
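The returned components fit together as in a standard two-stage mixture simulation: each unit first receives a component label drawn with probabilities proportional to weights, and then a ranking from that component. The base-R sketch below illustrates this relationship under that assumption. sim_mixture and draw_one are hypothetical names, draw_one stands for any single-component Mallows sampler, and the sketch returns normalized weights, which this page does not state for rMSmix.

```r
# Two-stage mixture simulation sketch (hypothetical names).
sim_mixture <- function(N, rho, theta, weights, draw_one) {
  G <- nrow(rho)
  w <- weights / sum(weights)                      # weights need not be normalized
  z <- sample.int(G, N, replace = TRUE, prob = w)  # component membership labels
  # one ranking per unit, drawn from the component it was assigned to
  samples <- t(vapply(z, function(g) as.integer(draw_one(rho[g, ], theta[g])),
                      integer(ncol(rho))))
  list(samples = samples, rho = rho, theta = theta, weights = w,
       classification = z)
}

rho <- rbind(1:5, c(4, 5, 2, 1, 3))
set.seed(12345)
# Degenerate sampler that returns the consensus ranking itself
sim <- sim_mixture(100, rho, theta = c(0.01, 0.02), weights = c(4, 6),
                   draw_one = function(r, th) r)
table(sim$classification)
```

With this degenerate draw_one, row i of samples equals rho[classification[i], ], which makes the label-to-row correspondence easy to check.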

References

Vitelli V, Sørensen Ø, Crispino M, Frigessi A and Arjas E (2018). Probabilistic Preference Learning with the Mallows Rank Model. Journal of Machine Learning Research, 18(158), pages 1–49, ISSN: 1532-4435, https://jmlr.org/papers/v18/15-481.html.

Sørensen Ø, Crispino M, Liu Q and Vitelli V (2020). BayesMallows: An R Package for the Bayesian Mallows Model. The R Journal, 12(1), pages 324–342, DOI: 10.32614/RJ-2020-026.

Zhong C (2021). Mallows permutation model with L1 and L2 distances I: hit and run algorithms and mixing times. arXiv: 2112.13456.

Examples


## Example 1. Drawing from a mixture with randomly generated parameters favoring separated clusters.
set.seed(12345)
rMSmix(sample_size = 50, n_items = 25, n_clust = 5)


## Example 2. Drawing from a mixture with uniformly generated parameters.
set.seed(12345)
rMSmix(sample_size = 100, n_items = 9, n_clust = 3, uniform = TRUE)


## Example 3. Drawing from a mixture with customized parameters.
r_par <- rbind(1:5, c(4, 5, 2, 1, 3))
t_par <- c(0.01, 0.02)
w_par <- c(0.4, 0.6)
set.seed(12345)
rMSmix(sample_size = 50, n_items = 5, n_clust = 2, theta = t_par, rho = r_par, weights = w_par)


[Package MSmix version 1.0.1 Index]