pick {mudfold} | R Documentation |
Transform items to preference binary data.
Description
Function pick can be used to transform quantitative or ordinal variables into binary form (i.e., 0, 1). When byItem=FALSE, the underlying idea is that the individual selects the items with the highest preference. This is done through user-provided cut-off values, or by assuming a pick k out of N response process, in which each continuous response vector takes a 1 at its k highest values. Dichotomization can be performed row-wise (default) or column-wise.
Usage
pick(data , k=NULL, cutoff=NULL, byItem=FALSE)
Arguments
data |
: A matrix or data frame containing the continuous or discrete responses of |
k |
: An integer ( |
cutoff |
: The value(s) that will be used as thresholds. The length of this argument should be equal to 1 (the same threshold for all rows (or columns) of |
byItem |
: Logical argument. If byItem=TRUE, the dichotomization is performed column-wise. With the default byItem=FALSE, the function determines the ones row-wise. |
Details
Binary transformation of continuous or discrete variables with \rho\ge 3 levels. Two different methods are available for the transformation.
The first method uses the argument k of the pick function and assumes a pick k out of N response process. Response processes of this type occur in surveys and questionnaires in which respondents are asked to pick exactly the k most preferred items. The value of k is an integer between 1 and ncol(data). Given an integer k, the function "picks" the k highest values in each row of data (if byItem=FALSE). The k highest values in each row become 1 and the remaining ncol(data)-k elements are set to 0. If k=ncol(data), the resulting matrix consists only of 1's.
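The pick k out of N idea can be illustrated with a minimal base R sketch. The helper pick_k_rowwise is hypothetical and is not part of mudfold; it also breaks ties by position (an assumption made here for reproducibility), which differs from the noise-based tie handling that pick itself is documented to use.

```r
# Hypothetical helper (not part of mudfold): mark the k largest entries
# of each row with 1 and all other entries with 0.
pick_k_rowwise <- function(x, k) {
  x <- as.matrix(x)
  stopifnot(k >= 1, k <= ncol(x))
  out <- matrix(0L, nrow(x), ncol(x), dimnames = dimnames(x))
  for (i in seq_len(nrow(x))) {
    # rank(-row) assigns rank 1 to the largest value; ties are broken by
    # position, which is an assumption of this sketch
    out[i, ] <- as.integer(rank(-x[i, ], ties.method = "first") <= k)
  }
  out
}

x <- rbind(c(0.9, 0.1, 0.5, 0.3),
           c(2.0, 4.0, 1.0, 3.0))
res <- pick_k_rowwise(x, k = 2)
res
```

Every row of res contains exactly k ones, so rowSums(res) equals k for all rows.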
The second method binarizes the data by thresholding. For this method, the user provides the threshold(s) through the parameter cutoff of the pick function (default cutoff=NULL). If a single value is provided, i.e., cutoff=\alpha, then \alpha is used as the threshold in each row i of data (if byItem=FALSE), such that any value greater than or equal to cutoff in row i becomes 1 and any other value becomes 0. Alternatively, the user can provide row- (or column-) specific cut-off values, i.e., cutoff=\alpha with \alpha=(\alpha_1,...,\alpha_K), where \alpha_i is the cut-off value for row or column i. In this case, x_{ij}=1 if x_{ij}\ge \alpha_i and x_{ij}=0 otherwise.
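The thresholding rule can be sketched in base R without calling pick. The objects x, alpha, b_single and b_rows are illustrative names, and sweep is used here only as one convenient way to compare each row with its own cut-off; it is not claimed to be how pick is implemented.

```r
x <- rbind(c(1, 2, 3, 4),
           c(3, 1, 2, 2),
           c(2, 2, 4, 1))

# Single cut-off: every value greater than or equal to 2 becomes 1.
b_single <- (x >= 2) * 1

# Row-specific cut-offs: alpha[i] is compared with every entry of row i.
alpha <- c(3, 2, 4)
b_rows <- sweep(x, 1, alpha, FUN = ">=") * 1

b_single
b_rows
```

Using MARGIN = 2 in sweep, with one cut-off per column, gives the column-wise analogue.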
The two methods cannot be used simultaneously: only one of the parameters k and cutoff can be different from NULL at a time. If both parameters are NULL (default), a row-specific cut-off is determined automatically for each row i of data as the row mean, \alpha_i=\bar{data_i}. The dichotomization is performed by row of data unless byItem=TRUE.
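As a rough illustration of the default behaviour, the following base R sketch thresholds a small matrix at its row means and, for comparison, at its column means. It assumes the same greater-than-or-equal comparison described for cutoff; the object names are illustrative and the code does not call pick.

```r
x <- rbind(c(1, 2, 3, 4),   # row mean 2.5
           c(3, 1, 2, 2))   # row mean 2

# Row-wise analogue of the default (byItem=FALSE): cut each row at its mean.
b_rowmean <- sweep(x, 1, rowMeans(x), FUN = ">=") * 1

# Column-wise analogue (byItem=TRUE): cut each column at its mean.
b_colmean <- sweep(x, 2, colMeans(x), FUN = ">=") * 1

b_rowmean
b_colmean
```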
When the argument k is used, ties can make more than k values eligible to be picked. In this case, the choice of which items are picked is made after a small amount of noise is added to each observation of row or column i. This is done with the function jitter.
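A small sketch of noise-based tie-breaking, assuming a single response vector in which three values tie for the top while k = 2. The variable names are illustrative, and the code only mimics the idea with base R's jitter and rank; the exact steps inside pick may differ.

```r
set.seed(1)              # jitter is random; the seed makes the run repeatable
row <- c(3, 3, 3, 1)     # three tied maxima, but only k = 2 can be picked
k <- 2

noisy <- jitter(row)     # adds a small amount of uniform noise to each value
picked <- as.integer(rank(-noisy) <= k)
picked
```

Exactly two of the three tied entries receive a 1. Which two depends on the random noise, while the clearly smaller fourth value is not selected because the default noise is small relative to the gap between 1 and 3.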
Value
Binary valued (i.e., 0-1) data with the same dimensions as the input.
Warning
!!! This function should be used with care. Dichotomization may distort the data structure and lead to information loss. For polytomous items, users are advised to consider polytomous unfolding models that take the different levels of measurement into account. !!!
Author(s)
Spyros E. Balafas (auth.), Wim P. Krijnen (auth.), Wendy J. Post (contr.), Ernst C. Wit (auth.)
Maintainer: Spyros E. Balafas (s.balafas@rug.nl)
Examples
## Not run:
### simulate some data with 3 discrete variables with three levels
### and 1 variable with 4 levels
d1 <- cbind(sample(1:3, 20, replace = TRUE),
            sample(1:3, 20, replace = TRUE, prob = c(0.3, 0.3, 0.4)),
            sample(1:3, 20, replace = TRUE, prob = c(0.2, 0.4, 0.4)),
            sample(1:4, 20, replace = TRUE, prob = c(0.1, 0.3, 0.4, 0.2)))
### apply pick on d1 ###
# binarize at the mean of
# each row and column
d1_rowmean <- pick(d1)
d1_colmean <- pick(d1,byItem = TRUE)
# binarize at the cutoff=2
d1_cut <- pick(d1,cutoff = 2,byItem = TRUE)
# binarize at different cutoffs (per row)
# for example at the median of each row
med_cuts <- apply(d1,1,median)
d1_cuts <- pick(d1,cutoff = med_cuts)
# binarize at different cutoffs (per column)
# for example at the median of each column
med_cuts_col <- apply(d1,2,median)
d1_cuts_col <- pick(d1,cutoff = med_cuts_col,byItem = TRUE)
# binarize at the k=2 higher values
# per row and column
d1_krow <- pick(d1,k = 2)
d1_kcol <- pick(d1,k = 2,byItem = TRUE)
## End(Not run)