R: RankData Class

RankData-class {rankdist}

R Documentation

RankData Class

Description

A S4 class to represent ranking data

It is well understood that the ranking representation and ordering representation of ranking data can easily be confused. I thus use a S4 class to store all the information about the ranking data. This can avoid unnecessary confusion.

Details

It is possible to store both complete and top-q rankings in the same RankData object. Three slots topq, subobs, and q_ind are introduced for this purpose. Note that there is generally no need to specify these slots if your data set only contains a single "q" level (for example all data are top-10 rankings). The "q" level for complete ranking should be nobj-1. Moreover, if the rankings are organized in chunks of increasing "q" levels (for example, top-2 rankings followed by top-3 rankings followed by top-5 rankings etc.), then slots subobs, and q_ind can also be inferred correctly by the initializer. Therefore it is highly recommender that you organise the ranking matrix in this way and utilize the initializer.

Slots

nobj: The number of ranked objects. If not provided, it will be inferred as the maximum ranking in the data set. As a result, it must be provided if the data is top-q ranking.
nobs: the number of observations. No need to be provided during initialization since it must be equal to the sum of slot count.
ndistinct: the number of distinct rankings. No need to be provided during initialization since it must be equal to the number of rows of slot ranking.
ranking: a matrix that stores the ranking representation of distinct rankings. Each row contains one ranking. For top-q ranking, all unobserved objects have ranking q+1.
count: the number of observations for each distinct ranking corresponding to each row of ranking.
topq: a numeric vector to store top-q ranking information. More information in details section.
subobs: a numeric vector to store number of observations for each chunk of top-q rankings.
q_ind: a numeric vector to store the beginning and ending of each chunk of top-q rankings. The last element has to be ndistinct+1.

References

Qian Z, Yu L. H. P (2019) "Weighted Distance-Based Models for Ranking Data Using the R Package rankdist." Journal of Statistical Software, 90(5), 1-31. doi: 10.18637/jss.v090.i05

Examples

# creating a data set with only complete rankings
rankmat <- replicate(10,sample(1:52,52), simplify = "array")
countvec <- sample(1:52,52,replace=TRUE)
rankdat <- new("RankData",ranking=rankmat,count=countvec)
# creating a data set with both complete and top-10 rankings
rankmat_in <- replicate(10,sample(1:52,52), simplify = "array")
rankmat_in[rankmat_in>11] <- 11
rankmat_total <- cbind(rankmat_in, rankmat)
countvec_total <- c(countvec,countvec)
rankdat2 <- new("RankData",ranking=rankmat_total,count=countvec_total, nobj=52, topq=c(10,51))