R: Transform items to preference binary data.

pick {mudfold}

R Documentation

Transform items to preference binary data.

Description

The function pick transforms quantitative or ordinal variables into binary (0/1) form. When byItem=FALSE, the underlying idea is that each individual selects the items they prefer most. This is done either through user-provided cut-off values or by assuming a pick k out of N response process, in which each response vector takes the value 1 at its k highest values. Dichotomization can be performed row-wise (default) or column-wise.

Usage

pick(data , k=NULL, cutoff=NULL, byItem=FALSE)

Arguments

`data`	: A matrix or data frame containing the continuous or discrete responses of `nrow(data)` persons/judges to `ncol(data)` items. Missing values in `data` are not allowed.
`k`	: An integer (`1 \le` `k` `\le` `ncol(data)`) that restricts the number of items a person can pick (default `k=NULL`). This argument is used to transform the data into pick `k` out of `N` form. If `k` is provided by the user, `cutoff` should be `NULL` and vice versa. By default, this process is applied to the matrix `data` row-wise. The user can restrict the number
`cutoff`	: The value(s) that will be used as thresholds. The length of this argument should be either 1 (the same threshold for all rows, or all columns, of `data`) or `K`, where `K=nrow(data)` when `byItem=FALSE` and `K=ncol(data)` when `byItem=TRUE`.
`byItem`	: Logical. If `byItem=TRUE`, the dichotomization is performed column-wise. With the default `byItem=FALSE`, the function determines the ones row-wise.

Details

Binary transformation of continuous or discrete variables with \rho\ge 3 levels. Two methods are available for the transformation.

The first method uses the argument k of the pick function and assumes a pick k out of N response process. Response processes of this type occur in surveys and questionnaires in which respondents are asked to pick exactly their k most preferred items. The value of k is an integer between 1 and ncol(data). Given an integer k, the function "picks" the k highest values in each row of data (if byItem=FALSE). The k highest values in each row become 1 and the remaining ncol(data)-k elements are set to 0. Consequently, if k=ncol(data), the resulting matrix consists only of 1's.
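The pick k out of N idea can be illustrated in base R. The following is only a minimal sketch of the row-wise case, not the package's implementation; the helper name `pick_k_rows` is hypothetical, and the example data contain no ties.

```r
# Hypothetical helper (not part of mudfold): set the k largest values
# of each row to 1 and all other entries to 0.
pick_k_rows <- function(x, k) {
  x <- as.matrix(x)
  stopifnot(k >= 1, k <= ncol(x))
  out <- matrix(0L, nrow(x), ncol(x))
  for (i in seq_len(nrow(x))) {
    # column positions of the k largest values in row i
    top <- order(x[i, ], decreasing = TRUE)[seq_len(k)]
    out[i, top] <- 1L
  }
  out
}

x <- rbind(c(0.9, 0.1, 0.5, 0.3),
           c(2,   7,   4,   9))
b <- pick_k_rows(x, k = 2)
b
```

Every row of `b` contains exactly `k` ones, which is the defining property of pick k out of N data.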

The second method binarizes the data by thresholding. For this method, the user provides the threshold(s) through the parameter cutoff of the pick function (default cutoff=NULL). If a single value is provided, i.e., cutoff=\alpha, then \alpha is used as the threshold in each row i (if byItem=FALSE) of the matrix data, such that any value in row i greater than or equal to cutoff becomes 1 and any other value becomes 0. Alternatively, the user can provide row (or column) specific cut-off values, i.e., cutoff=\alpha with \alpha=(\alpha_1,...,\alpha_K), where \alpha_i is the cut-off value for row or column i. In this case, x_{ij} is set to 1 if x_{ij}\ge \alpha_i and to 0 otherwise.
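The thresholding rule can be sketched in base R as follows. This is an illustration under the assumptions stated in the text (values greater than or equal to the cut-off become 1), not the package's own code; the helper name `threshold_sketch` is hypothetical.

```r
# Hypothetical helper (not part of mudfold): binarize x with one global
# cut-off or with one cut-off per row (byItem = FALSE) or per column
# (byItem = TRUE).
threshold_sketch <- function(x, cutoff, byItem = FALSE) {
  x <- as.matrix(x)
  K <- if (byItem) ncol(x) else nrow(x)
  stopifnot(length(cutoff) %in% c(1L, K))
  cutoff <- rep_len(cutoff, K)
  # Expand the cut-offs to a matrix shaped like x: filled by column for
  # row-specific values, filled by row for column-specific values.
  a <- matrix(cutoff, nrow(x), ncol(x), byrow = byItem)
  (x >= a) * 1L
}

x <- rbind(c(1, 2, 3),
           c(3, 1, 2))
t_global <- threshold_sketch(x, cutoff = 2)                        # same cut-off everywhere
t_row    <- threshold_sketch(x, cutoff = c(3, 1))                  # one cut-off per row
t_col    <- threshold_sketch(x, cutoff = c(1, 2, 3), byItem = TRUE) # one cut-off per column
```

Unlike the pick k out of N method, thresholding does not fix the number of ones per row or column; it depends on how many values reach the cut-off.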

The two methods cannot be used simultaneously: at most one of the parameters k and cutoff can be different from NULL. If both parameters are NULL (default), a row-specific cut-off is determined automatically for each row i of data, such that \alpha_i = \bar{data_i}, i.e., the mean of row i. The dichotomization is performed by row of data, unless byItem=TRUE.
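As a small worked illustration of the default behaviour, the base R lines below binarize each row at its own mean. This is a sketch that assumes the same "greater than or equal" rule as for user-supplied cut-offs; it is not taken from the package source.

```r
x <- rbind(c(1, 2, 3),
           c(4, 4, 1))
alpha <- rowMeans(x)        # row-specific cut-offs: 2 and 3
# alpha has length nrow(x), so it recycles down the columns and
# element i is compared with every entry of row i.
b_mean <- (x >= alpha) * 1L
b_mean
```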

When the argument k is used, ties can make more than k values eligible to be picked. In this case, the items to be picked are chosen after a small amount of noise is added to each observation of row or column i. This is done with the function jitter.
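The tie-breaking idea can be sketched for a single row with base R. The exact way the package applies jitter is not shown in this page, so the lines below are an assumption-laden illustration only; `set.seed` is used solely to make the sketch reproducible.

```r
set.seed(1)                 # reproducibility of this sketch only
row <- c(3, 3, 3, 1)        # three values tie for the k = 2 places
k <- 2
noisy  <- jitter(row)       # small random noise breaks the ties
picked <- order(noisy, decreasing = TRUE)[seq_len(k)]
binary <- as.integer(seq_along(row) %in% picked)
binary
```

Which two of the three tied items are picked depends on the random noise, but exactly `k` items are picked, and in this example the noise is too small for the lowest value to overtake the tied ones.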

Value

Binary valued (i.e., 0-1) data with the same dimensions as the input.

Warning

!!! This function should be used with care. Dichotomization may distort the data structure and lead to information loss. For polytomous items, the user is advised to consider polytomous unfolding models that take the different levels of measurement into account. !!!

Author(s)

Spyros E. Balafas (auth.), Wim P. Krijnen (auth.), Wendy J. Post (contr.), Ernst C. Wit (auth.)

Maintainer: Spyros E. Balafas (s.balafas@rug.nl)

Examples

## Not run:  
### simulate some data with 3 discrete variables with three levels
### and 1 variable with 4 levels
d1 <- cbind(sample(1:3,20,replace = TRUE),
            sample(1:3,20,replace = TRUE,prob = c(0.3,0.3,0.4)),
            sample(1:3,20,replace = TRUE,prob = c(0.2,0.4,0.4)),
            sample(1:4,20,replace = TRUE,prob = c(.1,.3,.4,.2)))


### apply pick on d1 ###  
# binarize at the mean of 
# each row and column
d1_rowmean <- pick(d1)
d1_colmean <- pick(d1,byItem = TRUE)

# binarize at the cutoff=2 
d1_cut <- pick(d1,cutoff = 2,byItem = TRUE)

# binarize at different cutoffs (per row) 
# for example at the median of each row
med_cuts <- apply(d1,1,median)
d1_cuts <- pick(d1,cutoff = med_cuts)

# binarize at different cutoffs (per column) 
# for example at the median of each column
med_cuts_col <- apply(d1,2,median)
d1_cuts_col <- pick(d1,cutoff = med_cuts_col,byItem = TRUE)


# binarize at the k=2 higher values
# per row and column
d1_krow <- pick(d1,k = 2)
d1_kcol <- pick(d1,k = 2,byItem = TRUE)

## End(Not run)

[Package mudfold version 1.1.21 Index]