| mChoice {Hmisc} | R Documentation |
Methods for Storing and Analyzing Multiple Choice Variables
Description
mChoice is a function for grouping variables that represent
individual choices on a multiple choice question. These choices are
typically factor or character values but may be of any type. Levels
of component factor variables need not be the same; all unique levels
(or unique character values) are collected over all of the multiple
variables. Then a new character vector is formed with integer choice
numbers separated by semicolons. Ideally, a database system would
have exported the semicolon-separated character strings with a
levels attribute containing strings defining value labels
corresponding to the integer choice numbers. mChoice is a
function for creating a multiple-choice variable after the fact.
mChoice variables are explicitly handled by the describe
and summary.formula functions. NAs or blanks in input
variables are ignored.
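The encoding described above can be sketched in base R. This is only an illustration of the idea (collect the unique values, then store semicolon-separated integer codes); encode_choices is a hypothetical helper, not part of Hmisc, and the ordering it uses for levels and codes is an assumption.

```r
# Illustrative sketch of the encoding; encode_choices() is hypothetical,
# not the Hmisc implementation.
encode_choices <- function(...) {
  vars <- lapply(list(...), as.character)
  # Collect unique non-missing, non-blank values over all input variables,
  # in order of first appearance
  all_vals <- unlist(vars)
  lev <- unique(all_vals[!is.na(all_vals) & all_vals != ""])
  # One row per observation, one column per input variable
  codes <- matrix(sapply(vars, function(v) match(v, lev)),
                  ncol = length(vars))
  # Join the distinct integer codes in each row with semicolons
  out <- apply(codes, 1, function(r) {
    r <- sort(unique(r[!is.na(r)]))
    paste(r, collapse = ";")
  })
  structure(out, levels = lev)
}

s1 <- c("Headache", "Hangnail", NA)
s2 <- c("Hangnail", "Hangnail", "Depressed")
enc <- encode_choices(s1, s2)
as.vector(enc)        # "1;2" "2" "3"
attr(enc, "levels")   # "Headache" "Hangnail" "Depressed"
```

The second observation repeats the same choice and the third has an NA; both collapse to a single code, mirroring the statement that NAs and blanks are ignored.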
format.mChoice will convert the multiple choice representation
to text form by substituting levels for integer codes.
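As a rough illustration of substituting levels for integer codes, the following base R sketch decodes a small vector. decode_choices is a hypothetical helper written for this example and is not the code used by format.mChoice.

```r
# Hypothetical helper (not Hmisc code): replace integer codes by their labels
decode_choices <- function(x, lev, sep = ";") {
  parts <- strsplit(x, ";", fixed = TRUE)
  vapply(parts,
         function(p) paste(lev[as.integer(p)], collapse = sep),
         character(1))
}

lev <- c("Headache", "Hangnail", "Depressed")
txt <- decode_choices(c("1;2", "2", "3"), lev)
txt  # "Headache;Hangnail" "Hangnail" "Depressed"
```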
as.double.mChoice converts the mChoice object to a
binary numeric matrix, one column per used level (or all levels if
drop=FALSE). It is called when
the user invokes as.numeric. There is a
print method and a summary method, and a print
method for the summary.mChoice object. The summary
method computes frequencies of all two-way choice combinations, the
frequencies of the top 5 combinations, information about which other
choices are present when each given choice is present, and the
frequency distribution of the number of choices per observation. This
summary output is used in the describe function. When
options(prType='html') is in effect, the print method for the
summary returns an html character string if render=FALSE and
renders the html otherwise. This is used by print.describe and
is most effective when short=TRUE is specified to summary.
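The binary matrix and the combination frequencies mentioned above can be illustrated with base R on a toy vector of codes. This is a sketch under the assumption that codes are stored as semicolon-separated integers; it does not reproduce the output layout of summary.mChoice.

```r
# Toy data: semicolon-separated integer codes and their value labels
codes <- c("1;2", "2", "2;3", "1;2;3")
lev   <- c("Headache", "Hangnail", "Depressed")
parts <- lapply(strsplit(codes, ";", fixed = TRUE), as.integer)

# Binary indicator matrix: one row per observation, one column per level
ind <- t(vapply(parts,
                function(p) as.numeric(seq_along(lev) %in% p),
                numeric(length(lev))))
colnames(ind) <- lev

# Diagonal: frequency of each choice; off-diagonal: two-way co-occurrence
pair <- crossprod(ind)
pair

# Frequency distribution of the number of choices per observation
nchoices <- table(rowSums(ind))
nchoices
```

Here crossprod(ind) computes t(ind) %*% ind, so each off-diagonal cell counts the observations in which both choices are present.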
inmChoice creates a logical vector the same length as x
whose elements are TRUE when the observation in x
contains at least one of the codes or value labels in the second
argument.
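The 'any' versus 'all' logic of the condition argument can be sketched in base R as follows. has_choice is a hypothetical stand-in that works on raw integer codes only; it is not the inmChoice implementation and does not handle value labels.

```r
# Hypothetical stand-in for the membership test (integer codes only)
has_choice <- function(codes, values, condition = c("any", "all")) {
  condition <- match.arg(condition)
  parts <- lapply(strsplit(codes, ";", fixed = TRUE), as.integer)
  test <- if (condition == "any") any else all
  # For each observation, test whether any/all requested codes are present
  vapply(parts, function(p) test(values %in% p), logical(1))
}

codes <- c("1;2", "2", "2;3", "")
r_any <- has_choice(codes, c(1, 3))                     # TRUE FALSE TRUE FALSE
r_all <- has_choice(codes, c(1, 2), condition = "all")  # TRUE FALSE FALSE FALSE
```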
match.mChoice creates an integer vector of the indexes of all
elements in table which contain any of the specified levels.
nmChoice returns an integer vector of the number of choices
that were made.
is.mChoice returns TRUE if the argument is a multiple
choice variable.
Usage
mChoice(..., label='',
sort.levels=c('original','alphabetic'),
add.none=FALSE, drop=TRUE, ignoreNA=TRUE)
## S3 method for class 'mChoice'
format(x, minlength=NULL, sep=";", ...)
## S3 method for class 'mChoice'
as.double(x, drop=FALSE, ...)
## S3 method for class 'mChoice'
print(x, quote=FALSE, max.levels=NULL,
width=getOption("width"), ...)
## S3 method for class 'mChoice'
as.character(x, ...)
## S3 method for class 'mChoice'
summary(object, ncombos=5, minlength=NULL,
drop=TRUE, short=FALSE, ...)
## S3 method for class 'summary.mChoice'
print(x, prlabel=TRUE, render=TRUE, ...)
## S3 method for class 'mChoice'
x[..., drop=FALSE]
match.mChoice(x, table, nomatch=NA, incomparables=FALSE)
inmChoice(x, values, condition=c('any', 'all'))
inmChoicelike(x, values, condition=c('any', 'all'),
ignore.case=FALSE, fixed=FALSE)
nmChoice(object)
is.mChoice(x)
## S3 method for class 'mChoice'
Summary(..., na.rm)
Arguments
na.rm |
Logical: remove |
table |
a vector (mChoice) of values to be matched against. |
nomatch |
value to return if a value for |
incomparables |
logical whether incomparable values should be compared. |
... |
a series of vectors |
label |
a character string |
sort.levels |
set |
add.none |
Set |
drop |
set |
ignoreNA |
set to |
x |
an object of class "mChoice" |
object |
an object of class "mChoice" |
ncombos |
maximum number of combos. |
width |
Width of a line of text to be formatted |
quote |
quote the output |
max.levels |
max levels to be displayed |
minlength |
By default no abbreviation of levels is done in
|
short |
set to |
sep |
character to use to separate levels when formatting |
prlabel |
set to |
render |
applies of |
values |
a scalar or vector. If |
condition |
set to |
ignore.case |
set to |
fixed |
see |
Value
mChoice returns a character vector of class "mChoice"
plus attributes "levels" and "label".
summary.mChoice returns an object of class
"summary.mChoice". inmChoice and inmChoicelike
return a logical vector.
format.mChoice returns a character vector, and
as.double.mChoice returns a binary numeric matrix.
nmChoice returns an integer vector.
print.summary.mChoice returns an html character string if
options(prType='html') is in effect.
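The return types listed above can be inspected with a short session such as the one below. It is a sketch that assumes the Hmisc package is installed (the guard skips it otherwise); the variable names s1, s2, m, has_head, n_made and ind are made up for this illustration.

```r
# Sketch only: runs when the Hmisc package is installed
if (requireNamespace("Hmisc", quietly = TRUE)) {
  s1 <- c("Headache", "Hangnail")
  s2 <- c("Hangnail", "Depressed")
  m  <- Hmisc::mChoice(s1, s2, label = "Symptoms")

  print(Hmisc::is.mChoice(m))                  # is m a multiple choice variable?
  str(format(m))                               # text form of the choices
  has_head <- Hmisc::inmChoice(m, "Headache")  # one logical per observation
  n_made   <- Hmisc::nmChoice(m)               # number of choices made
  ind      <- as.numeric(m)                    # dispatches to as.double.mChoice
  str(ind)
}
```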
Author(s)
Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com
See Also
Examples
options(digits=3)
set.seed(3)
n <- 20
sex <- factor(sample(c("m","f"), n, replace=TRUE))
age <- rnorm(n, 50, 5)
treatment <- factor(sample(c("Drug","Placebo"), n, replace=TRUE))
# Generate a 3-choice variable; each of 3 variables has 5 possible levels
symp <- c('Headache','Stomach Ache','Hangnail',
'Muscle Ache','Depressed')
symptom1 <- sample(symp, n, TRUE)
symptom2 <- sample(symp, n, TRUE)
symptom3 <- sample(symp, n, TRUE)
cbind(symptom1, symptom2, symptom3)[1:5,]
Symptoms <- mChoice(symptom1, symptom2, symptom3, label='Primary Symptoms')
Symptoms
print(Symptoms, long=TRUE)
format(Symptoms[1:5])
inmChoice(Symptoms,'Headache')
inmChoicelike(Symptoms, 'head', ignore.case=TRUE)
levels(Symptoms)
inmChoice(Symptoms, 3)
# Find all subjects with either of two symptoms
inmChoice(Symptoms, c('Headache','Hangnail'))
# Note: In this example, some subjects have the same symptom checked
# multiple times; in practice these redundant selections would be NAs
# mChoice will ignore these redundant selections
# Find all subjects with both symptoms
inmChoice(Symptoms, c('Headache', 'Hangnail'), condition='all')
meanage <- N <- numeric(5)
for(j in 1:5) {
  meanage[j] <- mean(age[inmChoice(Symptoms,j)])
  N[j] <- sum(inmChoice(Symptoms,j))
}
names(meanage) <- names(N) <- levels(Symptoms)
meanage
N
# Manually compute mean age for 2 symptoms
mean(age[symptom1=='Headache' | symptom2=='Headache' | symptom3=='Headache'])
mean(age[symptom1=='Hangnail' | symptom2=='Hangnail' | symptom3=='Hangnail'])
summary(Symptoms)
# Frequency table sex*treatment, sex*Symptoms
summary(sex ~ treatment + Symptoms, fun=table)
# Check:
ma <- inmChoice(Symptoms, 'Muscle Ache')
table(sex[ma])
# could also do:
# summary(sex ~ treatment + mChoice(symptom1,symptom2,symptom3), fun=table)
# Compute mean age, separately by 3 variables
summary(age ~ sex + treatment + Symptoms)
summary(age ~ sex + treatment + Symptoms, method="cross")
f <- summary(treatment ~ age + sex + Symptoms, method="reverse", test=TRUE)
f
# each trio of numbers represents the 25th, 50th, 75th percentiles
print(f, long=TRUE)