data_censoring {MSmix}    R Documentation

Censoring of full rankings

Description

Convert full rankings into either top-k or MAR (missing at random) partial rankings.

Usage

data_censoring(
  rankings,
  type = "topk",
  nranked = NULL,
  probcens = rep(1, ncol(rankings) - 1)
)

Arguments

rankings

Integer N×n matrix with full rankings in each row.

type

Character string indicating the censoring process to use. Options are: "topk" and "mar". Defaults to "topk".

nranked

Integer vector of length N with the desired number of positions to be retained in each partial sequence after censoring. If not supplied (NULL), the number of positions is randomly generated according to the probabilities in the probcens argument. Defaults to NULL.

probcens

Numeric vector of the (n-1) probabilities for the random generation of the number of positions to be retained in each partial sequence after censoring (normalization is not necessary). Used only if nranked argument is NULL (see Details). Default is equal probabilities.

Details

Both forms of partial rankings can be obtained in two ways: (i) by specifying, in the nranked argument, the number of positions to be retained in each partial ranking; (ii) by setting nranked = NULL (default) and specifying, in the probcens argument, the probabilities of retaining respectively 1, 2, ..., (n-1) positions in the partial rankings (recall that a partial sequence with (n-1) observed entries corresponds to a full ranking).

When full rankings are censored into MAR partial sequences, the positions to be retained are drawn uniformly at random.
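The two routes described above can be illustrated with a minimal base-R sketch. This is not the MSmix implementation: censor_sketch is a hypothetical helper written only for illustration. It assumes that each row stores the rank assigned to each item, and that "positions" in the MAR case refers to entries of the row chosen uniformly without replacement. It also leaves a row with (n-1) observed entries as it is, rather than filling in the implied last entry.

```r
# Hypothetical helper, NOT part of MSmix: a base-R sketch of the censoring
# logic described in the Details section.
censor_sketch <- function(rankings, type = c("topk", "mar"), nranked = NULL,
                          probcens = rep(1, ncol(rankings) - 1)) {
  type <- match.arg(type)
  N <- nrow(rankings)
  n <- ncol(rankings)
  if (is.null(nranked)) {
    # Route (ii): draw the number of retained positions from 1, ..., n - 1.
    # sample.int() rescales `prob` internally, so no normalization is needed.
    nranked <- sample.int(n - 1, size = N, replace = TRUE, prob = probcens)
  }
  out <- rankings
  for (i in seq_len(N)) {
    k <- nranked[i]
    if (type == "topk") {
      # Keep only the entries holding ranks 1, ..., k.
      out[i, rankings[i, ] > k] <- NA
    } else {
      # Assumption: retain k entries chosen uniformly at random.
      keep <- sample.int(n, size = k)
      out[i, setdiff(seq_len(n), keep)] <- NA
    }
  }
  list(part_rankings = out, nranked = nranked)
}

# Toy data: 5 full rankings of 6 items (one ranking per row).
set.seed(1)
toy <- t(replicate(5, sample.int(6)))

# Route (i): fixed number of retained positions (top-3 censoring).
censor_sketch(toy, type = "topk", nranked = rep(3, 5))$part_rankings

# Route (ii): random number of retained positions, unnormalized weights.
censor_sketch(toy, type = "mar", probcens = c(1, 1, 2, 2, 4))$nranked
```

In the second call the weights sum to 10 rather than 1, which mirrors the note that normalization of probcens is not necessary.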

Value

A list of two named objects:

part_rankings

Integer N×n matrix with partial (censored) rankings in each row. Missing positions are coded as NA.

nranked

Integer vector of length N with the actual number of items ranked in each partial sequence after censoring.

Examples


## Example 1. Censoring the Antifragility dataset into partial top rankings
# Top-3 censoring (assigned number of top positions to be retained)
data_censoring(ranks_antifragility, type = "topk",
               nranked = rep(3,nrow(ranks_antifragility)))
# Random top-k censoring with assigned probabilities
set.seed(12345)
data_censoring(ranks_antifragility, type = "topk",
               probcens = 1:(ncol(ranks_antifragility)-1))
## Example 2. Simulate full rankings from a basic Mallows model with Spearman distance
n_items <- 10
N <- 100
set.seed(12345)
rankings <- rMSmix(sample_size = N, n_items = n_items)$samples
# MAR censoring with assigned number of positions to be retained
set.seed(12345)
nranked <- round(runif(N,0.5,1)*n_items)
set.seed(12345)
mar_ranks1 <- data_censoring(rankings, type = "mar", nranked = nranked)
mar_ranks1
identical(mar_ranks1$nranked, nranked)
# MAR censoring with assigned probabilities
set.seed(12345)
probcens <- runif(n_items-1, 0, 0.5)
set.seed(12345)
mar_ranks2 <- data_censoring(rankings, type = "mar", probcens = probcens)
mar_ranks2
prop.table(table(mar_ranks2$nranked))
round(prop.table(probcens), 2)


[Package MSmix version 1.0.1 Index]