R: Censoring of full rankings

data_censoring {MSmix}

R Documentation

Censoring of full rankings

Description

Convert full rankings into either top-k or MAR (missing at random) partial rankings.

Usage

data_censoring(
  rankings,
  type = "topk",
  nranked = NULL,
  probs = rep(1, ncol(rankings) - 1)
)

Arguments

`rankings`	Integer `N × n` matrix or data frame with full rankings in each row.
`type`	Character string indicating which censoring process to use. Options are: `"topk"` and `"mar"`. Defaults to `"topk"`.
`nranked`	Integer vector of length `N` with the desired number of positions to be retained in each partial sequence after censoring. If not supplied (`NULL`), the number of positions is randomly generated according to the probabilities in the `probs` argument. Defaults to `NULL`.
`probs`	Numeric vector of the `(n-1)` probabilities for the random generation of the number of positions to be retained in each partial sequence after censoring (normalization is not necessary). Used only if `nranked` argument is `NULL` (see Details). Default is equal probabilities.

Details

Both forms of partial rankings can be obtained in two ways: (i) by specifying, in the nranked argument, the number of positions to be retained in each partial ranking; (ii) by setting nranked = NULL (default) and specifying, in the probs argument, the probabilities of retaining respectively 1, 2, ..., (n-1) positions in the partial rankings (recall that a partial sequence with (n-1) observed entries corresponds to a full ranking).
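
Option (ii) can be pictured with a small base R sketch. This is an illustration under stated assumptions, not the internal code of data_censoring(): the names n, N, probs and nranked_sim are hypothetical, and the draw uses sample(), which rescales its weights internally, so unnormalized values are acceptable.

```r
# Hypothetical illustration in base R; not the internals of data_censoring().
set.seed(1)
n <- 5                   # number of items
N <- 1000                # number of rankings to censor
probs <- c(1, 1, 2, 4)   # unnormalized weights for retaining 1, ..., n-1 positions

# Draw one number of retained positions per ranking.
# sample() rescales 'prob' internally, so normalization is not required.
nranked_sim <- sample(seq_len(n - 1), size = N, replace = TRUE, prob = probs)

# The empirical frequencies should be close to the normalized weights.
round(prop.table(table(nranked_sim)), 2)
round(probs / sum(probs), 2)
```

Passing such a vector through the nranked argument would correspond to option (i), where the number of retained positions is fixed by the user rather than generated.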

When full rankings are censored into MAR partial sequences, the positions to be retained are drawn uniformly at random.
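
The difference between the two censoring schemes can be sketched for a single ranking in base R. The helpers censor_topk() and censor_mar() below are hypothetical and are not part of MSmix. They assume that each entry of the vector holds the rank assigned to the corresponding item and that a censored entry is set to NA.

```r
# Hypothetical helpers for illustration only; not part of MSmix.

# Top-k censoring: keep the items placed in the first k positions.
censor_topk <- function(r, k) {
  r[r > k] <- NA
  r
}

# MAR censoring: keep k entries chosen uniformly at random.
censor_mar <- function(r, k) {
  keep <- sample.int(length(r), size = k)
  r[setdiff(seq_along(r), keep)] <- NA
  r
}

set.seed(1)
full <- c(3L, 1L, 5L, 2L, 4L)   # rank assigned to each of 5 items

top3 <- censor_topk(full, 3)    # items ranked 4th and 5th become NA
mar3 <- censor_mar(full, 3)     # 3 randomly chosen entries are retained
top3
mar3
```

Under top-k censoring the retained entries are always the best-ranked ones, whereas under this MAR sketch any subset of the given size is equally likely to be retained.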

Value

A list of two named objects:

part_rankings: Integer N × n matrix with partial (censored) rankings in each row. Missing positions are coded as NA.
nranked: Integer vector of length N with the actual number of items ranked in each partial sequence after censoring.

Examples


## Example 1. Censoring the Antifragility dataset into partial top rankings
# Top-3 censoring (assigned number of top positions to be retained)
n <- 7
r_antifrag <- ranks_antifragility[, 1:n]
data_censoring(r_antifrag, type = "topk", nranked = rep(3,nrow(r_antifrag)))
# Random top-k censoring with assigned probabilities
set.seed(12345)
data_censoring(r_antifrag, type = "topk", probs = 1:(n-1))

## Example 2. Simulate full rankings from a basic Mallows model with Spearman distance
n <- 10
N <- 100
set.seed(12345)
rankings <- rMSmix(sample_size = N, n_items = n)$samples
# MAR censoring with assigned number of positions to be retained
set.seed(12345)
nranked <- round(runif(N,0.5,1)*n)
set.seed(12345)
mar_ranks1 <- data_censoring(rankings, type = "mar", nranked = nranked)
mar_ranks1
identical(mar_ranks1$nranked, nranked)
# MAR censoring with assigned probabilities
set.seed(12345)
probs <- runif(n-1, 0, 0.5)
set.seed(12345)
mar_ranks2 <- data_censoring(rankings, type = "mar", probs = probs)
mar_ranks2
prop.table(table(mar_ranks2$nranked))
round(prop.table(probs), 2)

[Package MSmix version 1.0.2 Index]