mChoice {Hmisc}    R Documentation
Methods for Storing and Analyzing Multiple Choice Variables
Description
mChoice is a function that is useful for grouping variables that represent individual choices on a multiple choice question. These choices are typically factor or character values but may be of any type. Levels of component factor variables need not be the same; all unique levels (or unique character values) are collected over all of the multiple variables. Then a new character vector is formed with integer choice numbers separated by semicolons. Ideally, a database system would have exported the semicolon-separated character strings with a levels attribute containing strings defining value labels corresponding to the integer choice numbers. mChoice is a function for creating a multiple-choice variable after the fact. mChoice variables are explicitly handled by the describe and summary.formula functions. NAs or blanks in input variables are ignored.
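The encoding described above can be sketched in base R. This is an illustrative sketch, not Hmisc's implementation. The helper `encode_choices()` is hypothetical. The sketch assumes that levels are numbered in order of first appearance and that the codes within one observation are de-duplicated and sorted.

```r
# Hypothetical helper illustrating the mChoice encoding; not part of Hmisc.
encode_choices <- function(...) {
  vars <- lapply(list(...), as.character)
  all_vals <- unlist(vars)
  # Assumption: levels are numbered in order of first appearance,
  # skipping NAs and blanks
  levs <- unique(all_vals[!is.na(all_vals) & all_vals != ""])
  n <- length(vars[[1]])
  codes <- character(n)
  for (i in seq_len(n)) {
    picked <- vapply(vars, function(v) v[i], character(1))
    picked <- picked[!is.na(picked) & picked != ""]
    # Map labels to integer codes, drop repeated choices, and sort
    idx <- sort(unique(match(picked, levs)))
    codes[i] <- paste(idx, collapse = ";")
  }
  structure(codes, levels = levs)
}

x <- encode_choices(c("Headache", "Hangnail", NA),
                    c("Hangnail", "", "Depressed"))
x
```

Each observation becomes a single string such as "1;2", and the value labels travel alongside it in the levels attribute.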
format.mChoice will convert the multiple choice representation to text form by substituting levels for integer codes.
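Substituting levels for integer codes amounts to splitting each semicolon-separated string and indexing into the levels. The following is a minimal base-R sketch of that idea. The helper `decode_choices()` is hypothetical and is not the code behind format.mChoice. It assumes codes are stored as strings like "1;3" and that an empty string means no choice was made.

```r
# Hypothetical helper; mimics the idea behind format.mChoice, not its code.
decode_choices <- function(codes, levs, sep = ";") {
  parts <- strsplit(codes, ";", fixed = TRUE)
  vapply(parts, function(k) {
    k <- k[k != ""]                      # an empty string means no choices
    paste(levs[as.integer(k)], collapse = sep)
  }, character(1))
}

levs <- c("Headache", "Stomach Ache", "Hangnail")
decode_choices(c("1;3", "2", ""), levs)
decode_choices("1;2", levs, sep = ", ")
```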
as.double.mChoice converts the mChoice object to a binary numeric matrix, one column per used level (or all levels if drop=FALSE). Users invoke it by calling as.numeric.
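The binary matrix can be pictured with a small base-R sketch. It has one row per observation, one column per level, and a 1 wherever that level was chosen. The helper `codes_to_matrix()` is hypothetical, not Hmisc's as.double.mChoice. Its `drop` argument only imitates the behavior described above: with `drop = TRUE`, columns for levels that were never chosen are removed.

```r
# Hypothetical helper; illustrates the 0/1 matrix idea, not Hmisc's code.
codes_to_matrix <- function(codes, levs, drop = FALSE) {
  parts <- strsplit(codes, ";", fixed = TRUE)
  m <- matrix(0, nrow = length(codes), ncol = length(levs),
              dimnames = list(NULL, levs))
  for (i in seq_along(parts)) {
    k <- as.integer(parts[[i]][parts[[i]] != ""])
    m[i, k] <- 1                          # mark each chosen level
  }
  # With drop = TRUE, keep only levels chosen at least once
  if (drop) m <- m[, colSums(m) > 0, drop = FALSE]
  m
}

levs <- c("Headache", "Stomach Ache", "Hangnail")
codes_to_matrix(c("1;3", "3", ""), levs)
codes_to_matrix(c("1;3", "3", ""), levs, drop = TRUE)
```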
There is a print method and a summary method, and a print method for the summary.mChoice object. The summary method computes frequencies of all two-way choice combinations, the frequencies of the top ncombos (default 5) combinations, information about which other choices are present when each given choice is present, and the frequency distribution of the number of choices per observation. This summary output is used in the describe function. When options(prType='html') is in effect, the print method for summary.mChoice objects returns an HTML character string if render=FALSE and renders the HTML otherwise. This is used by print.describe and is most effective when short=TRUE is specified to summary.
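Two of the quantities the summary method reports can be computed in base R from a 0/1 indicator matrix (one row per observation, one column per level). crossprod() yields all two-way co-occurrence counts, and tabulating the row sums gives the distribution of the number of choices per observation. This sketch shows only the arithmetic under that assumed matrix layout, with made-up data. It is not the code summary.mChoice uses.

```r
# Made-up 0/1 indicator matrix: rows are observations, columns are levels
m <- rbind(c(1, 0, 1),
           c(0, 0, 1),
           c(1, 1, 1),
           c(0, 0, 0))
colnames(m) <- c("Headache", "Hangnail", "Depressed")

# Entry [j, k] counts observations choosing both j and k;
# the diagonal holds the single-choice frequencies
pairs <- crossprod(m)
pairs

# Frequency distribution of the number of choices per observation
nchoices <- table(rowSums(m))
nchoices
```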
inmChoice creates a logical vector the same length as x whose elements are TRUE when the observation in x contains at least one of the codes or value labels in the second argument (or all of them, when condition='all').
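The 'any'/'all' membership test can be sketched in base R on semicolon-coded strings. The helper `has_choice()` is hypothetical and is not Hmisc's inmChoice. It follows the convention described above: character values are treated as value labels, and other values are treated as integer choice codes.

```r
# Hypothetical helper illustrating the 'any'/'all' test; not Hmisc's inmChoice.
has_choice <- function(codes, levs, values, condition = c("any", "all")) {
  condition <- match.arg(condition)
  # Character values are treated as value labels, others as integer codes
  if (is.character(values)) values <- match(values, levs)
  vapply(strsplit(codes, ";", fixed = TRUE), function(k) {
    present <- values %in% as.integer(k[k != ""])
    if (condition == "any") any(present) else all(present)
  }, logical(1))
}

levs  <- c("Headache", "Hangnail", "Depressed")
codes <- c("1;3", "2", "1;2;3", "")
has_choice(codes, levs, c("Headache", "Hangnail"))
has_choice(codes, levs, c("Headache", "Hangnail"), condition = "all")
has_choice(codes, levs, 3)
```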
match.mChoice creates an integer vector of the indexes of all elements in table which contain any of the specified levels.
nmChoice returns an integer vector of the number of choices that were made.
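Counting choices per observation reduces to counting the non-empty codes in each semicolon-separated string. The helper `count_choices()` below is a hypothetical base-R sketch, not Hmisc's nmChoice. It assumes an empty string means no choices were made.

```r
# Hypothetical helper; counts codes per observation, not Hmisc's nmChoice.
count_choices <- function(codes) {
  parts <- strsplit(codes, ";", fixed = TRUE)
  # Count non-empty codes; an empty string means no choices were made
  vapply(parts, function(k) sum(k != ""), integer(1))
}

count_choices(c("1;3", "2", "", "1;2;4"))
```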
is.mChoice returns TRUE if the argument is a multiple choice variable.
Usage
mChoice(..., label='',
sort.levels=c('original','alphabetic'),
add.none=FALSE, drop=TRUE, ignoreNA=TRUE)
## S3 method for class 'mChoice'
format(x, minlength=NULL, sep=";", ...)
## S3 method for class 'mChoice'
as.double(x, drop=FALSE, ...)
## S3 method for class 'mChoice'
print(x, quote=FALSE, max.levels=NULL,
width=getOption("width"), ...)
## S3 method for class 'mChoice'
as.character(x, ...)
## S3 method for class 'mChoice'
summary(object, ncombos=5, minlength=NULL,
drop=TRUE, short=FALSE, ...)
## S3 method for class 'summary.mChoice'
print(x, prlabel=TRUE, render=TRUE, ...)
## S3 method for class 'mChoice'
x[..., drop=FALSE]
match.mChoice(x, table, nomatch=NA, incomparables=FALSE)
inmChoice(x, values, condition=c('any', 'all'))
inmChoicelike(x, values, condition=c('any', 'all'),
ignore.case=FALSE, fixed=FALSE)
nmChoice(object)
is.mChoice(x)
## S3 method for class 'mChoice'
Summary(..., na.rm)
Arguments
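na.rm | Logical: remove NAs |
table | a vector (mChoice) of values to be matched against. |
nomatch | value to return if a value for x is not matched in table. |
incomparables | logical: whether incomparable values should be compared. |
... | a series of vectors |
label | a character string |
sort.levels | set to 'alphabetic' to sort levels alphabetically instead of keeping their original order |
add.none | Set |
drop | set |
ignoreNA | set to FALSE to keep NAs in the input variables rather than ignoring them |
x | an object of class "mChoice" (any object for is.mChoice) |
object | an object of class "mChoice" |
ncombos | maximum number of choice combinations to report |
width | width of a line of text to be formatted |
quote | quote the output |
max.levels | maximum number of levels to be displayed |
minlength | By default no abbreviation of levels is done in format and summary. |
short | set to |
sep | character to use to separate levels when formatting |
prlabel | set to |
render | applies if options(prType='html') is in effect; set to FALSE to return the HTML character string instead of rendering it |
values | a scalar or vector. If integer, its elements are choice codes; if character, they are value labels. |
condition | set to 'all' to require that all of values be present; the default 'any' requires at least one |
ignore.case | set to TRUE to ignore case when matching |
fixed | see |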
Value
mChoice returns a character vector of class "mChoice" plus attributes "levels" and "label". summary.mChoice returns an object of class "summary.mChoice". inmChoice and inmChoicelike return a logical vector. format.mChoice returns a character vector, and as.double.mChoice returns a binary numeric matrix. nmChoice returns an integer vector. print.summary.mChoice returns an HTML character string if options(prType='html') is in effect and render=FALSE.
Author(s)
Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com
Examples
library(Hmisc)
options(digits=3)
set.seed(3)
n <- 20
sex <- factor(sample(c("m","f"), n, replace=TRUE))
age <- rnorm(n, 50, 5)
treatment <- factor(sample(c("Drug","Placebo"), n, replace=TRUE))
# Generate a 3-choice variable; each of 3 variables has 5 possible levels
symp <- c('Headache','Stomach Ache','Hangnail',
'Muscle Ache','Depressed')
symptom1 <- sample(symp, n, TRUE)
symptom2 <- sample(symp, n, TRUE)
symptom3 <- sample(symp, n, TRUE)
cbind(symptom1, symptom2, symptom3)[1:5,]
Symptoms <- mChoice(symptom1, symptom2, symptom3, label='Primary Symptoms')
Symptoms
print(Symptoms, long=TRUE)
format(Symptoms[1:5])
inmChoice(Symptoms,'Headache')
inmChoicelike(Symptoms, 'head', ignore.case=TRUE)
levels(Symptoms)
inmChoice(Symptoms, 3)
# Find all subjects with either of two symptoms
inmChoice(Symptoms, c('Headache','Hangnail'))
# Note: in this example, some subjects have the same symptom checked
# multiple times; in practice these redundant selections would be NAs.
# mChoice ignores these redundant selections.
# Find all subjects with both symptoms
inmChoice(Symptoms, c('Headache', 'Hangnail'), condition='all')
meanage <- N <- numeric(5)
for(j in 1:5) {
meanage[j] <- mean(age[inmChoice(Symptoms,j)])
N[j] <- sum(inmChoice(Symptoms,j))
}
names(meanage) <- names(N) <- levels(Symptoms)
meanage
N
# Manually compute mean age for 2 symptoms
mean(age[symptom1=='Headache' | symptom2=='Headache' | symptom3=='Headache'])
mean(age[symptom1=='Hangnail' | symptom2=='Hangnail' | symptom3=='Hangnail'])
summary(Symptoms)
#Frequency table sex*treatment, sex*Symptoms
summary(sex ~ treatment + Symptoms, fun=table)
# Check:
ma <- inmChoice(Symptoms, 'Muscle Ache')
table(sex[ma])
# could also do:
# summary(sex ~ treatment + mChoice(symptom1,symptom2,symptom3), fun=table)
#Compute mean age, separately by 3 variables
summary(age ~ sex + treatment + Symptoms)
summary(age ~ sex + treatment + Symptoms, method="cross")
f <- summary(treatment ~ age + sex + Symptoms, method="reverse", test=TRUE)
f
# Each trio of numbers gives the 25th, 50th, and 75th percentiles
print(f, long=TRUE)