
bws2.dataset {support.BWS2}

R Documentation

Creating a dataset suitable for Case 2 best–worst scaling analysis using counting and modeling approaches

Description

This function creates a dataset for use with bws2.count() in support.BWS2 and with functions for discrete choice models, such as clogit() in survival.

Usage

bws2.dataset(data, id, response, choice.sets, attribute.levels, 
  base.attribute = NULL, base.level = NULL, 
  reverse = TRUE, model = "paired",
  attribute.variables = NULL, effect = NULL, delete.best = FALSE, 
  type = c("paired", "marginal", "sequential"), 
   ...)

Arguments

`data`	A data frame containing a respondent dataset.
`id`	A character showing the name of the respondent identification number variable used in the respondent dataset.
`response`	A vector containing the names of response variables in the respondent dataset, showing the best and worst attribute levels selected in each Case 2 BWS question.
`choice.sets`	A data frame or matrix containing an orthogonal main-effect design.
`attribute.levels`	A list containing the names of the attributes and their levels.
`base.attribute`	A character showing the base attribute. The argument is used when attribute variables are effect coded; `NULL` is assigned when attribute variables are dummy coded.
`base.level`	A list containing the base level in each attribute. The argument is used when attribute-level variables are effect coded; `NULL` is assigned when attribute-level variables are dummy coded.
`reverse`	A logical value denoted by `TRUE` when the signs of the attribute variables are reversed for the possible worst, or otherwise `FALSE`.
`model`	A character showing a type of dataset created by this function: `"paired"` for a paired model, `"marginal"` for a marginal model, and `"sequential"` for a marginal sequential model.
`attribute.variables`	A character showing a type of attribute variables, denoted by `"reverse"` when the attribute variables take the value of `1` for a possible best, `-1` for a possible worst, and `0` otherwise, or `"constant"` when the attribute variables are created as attribute-specific constants. The argument is deprecated. Please use the argument `reverse` instead.
`effect`	A list containing the base level in each attribute. The argument is used when attribute-level variables are effect coded; `NULL` is assigned when attribute-level variables are dummy coded. The argument is deprecated. Please use the argument `base.level` instead.
`delete.best`	A logical value denoted by `TRUE` when deleting an attribute level selected as the best in the worst choice set (that is, using a marginal sequential model) or `FALSE` when not doing so. The argument is deprecated. Please use the argument `model` instead.
`type`	A character showing a type of dataset created by this function: `"paired"` for a paired model, `"marginal"` for a marginal model, and `"sequential"` for a marginal sequential model. The argument is deprecated. Please use the argument `model` instead.
`...`	Optional arguments; currently not in use.

Details

The respondent dataset, in which each row corresponds to a respondent, must be prepared by the user and then assigned to the argument data. The dataset must include the respondent's identification number (id) variable in the first column and the response variables in the subsequent columns, each indicating which attribute levels are selected as the best and worst for each question. Other variables in the respondent dataset are treated as the respondents' characteristics such as gender and age. Respondents' characteristic variables are also stored in the resultant dataset created by the function bws2.dataset(). The names of the id and response variables are left to the discretion of the user, but those names must be assigned to the arguments id and response.

The response variables must be constructed such that the best attribute levels alternate with the worst by question. For example, when there are nine BWS questions, the variables are B1, W1, B2, W2, ..., B9, and W9. Here, Bi and Wi show the attribute levels selected as the best and worst in the i-th question. The row numbers of the attribute levels selected as the best and worst are stored in the response variables. For example, suppose that a respondent was asked to answer the following BWS question, which is the same as that shown on the help page of this package, and then selected A1 (attribute level in the first row) as the best and C2 (attribute level in the third row) as the worst.

Please select your best and worst attribute levels from the following four:

Best	Attribute	Worst
[_]	A1	[_]
[_]	B3	[_]
[_]	C2	[_]
[_]	D3	[_]

The response variables B1 and W1, corresponding to the respondent's answer to this question, take the values of 1 (= the attribute level in the first row) and 3 (= the attribute level in the third row), respectively.
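As a minimal sketch of this coding in base R, the answer above could be stored as follows. The object names one.resp, profile, and nine.names are hypothetical and are not part of the package.

```r
# Rows of the BWS question shown above, from top to bottom
profile <- c("A1", "B3", "C2", "D3")

# Hypothetical one-respondent dataset: the response variables store row numbers
one.resp <- data.frame(
  id = 1,
  B1 = match("A1", profile),  # best: attribute level in the first row
  W1 = match("C2", profile))  # worst: attribute level in the third row
one.resp

# Names of the response variables, alternating best and worst by question
response.vars <- names(one.resp)[-1]
response.vars

# The same naming pattern for nine BWS questions: B1, W1, B2, W2, ..., B9, W9
nine.names <- paste0(c("B", "W"), rep(1:9, each = 2))
nine.names
```

A vector such as response.vars is what would be passed to the argument response.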

The arguments choice.sets and attribute.levels are the same as those in bws2.questionnaire(). The order of questions in the respondent dataset has to be the same as that in choice.sets.

The arguments model, reverse, base.attribute, and base.level are set according to the model you will use. The argument model is set as "paired" for the paired model, "marginal" for the marginal model, or "sequential" for the marginal sequential model. The argument reverse is set as TRUE for a model in which the signs of the attribute variables are reversed for the possible worst (Flynn et al. 2007 and 2008), or FALSE when not doing so (Hensher et al. 2015, Appendix 6B). The argument base.attribute is set as a character showing the base attribute for a marginal (sequential) model with effect-coded attribute variables. The argument base.level is set as a list containing the base level in each attribute for a model with effect-coded attribute-level variables (Flynn et al. 2007 and 2008), while it is set as NULL for a model with dummy-coded attribute-level variables (Hensher et al. 2015, Appendix 6B).
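The difference between effect coding and dummy coding can be illustrated in base R. The sketch below is not the package's internal code: the helper code.levels() is a hypothetical name, and it only shows what level variables for an attribute A could look like when A3 is the base level.

```r
# Hypothetical helper: code the levels of one attribute, omitting the base level
code.levels <- function(x, levels, base, effect = TRUE) {
  keep <- setdiff(levels, base)
  out <- sapply(keep, function(l) as.numeric(x == l))
  out <- matrix(out, nrow = length(x), dimnames = list(NULL, keep))
  # Effect coding: rows showing the base level take -1 in every column;
  # dummy coding leaves them at 0
  if (effect) out[x == base, ] <- -1
  out
}

x <- c("A1", "A2", "A3")
eff <- code.levels(x, levels = x, base = "A3", effect = TRUE)
dum <- code.levels(x, levels = x, base = "A3", effect = FALSE)
eff  # row for A3 is (-1, -1)
dum  # row for A3 is (0, 0)
```

Under effect coding the coefficient of a base level is not estimated directly; it is recovered as the negative sum of the coefficients of the other levels of the same attribute, as done in the Examples.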

Note that the arguments attribute.variables, effect, delete.best, and type are deprecated and will be removed in the future.

Value

The function returns a dataset in data frame format for either the paired model or the marginal (sequential) model. In addition to the attribute and/or attribute-level variables explained above, the dataset for the paired model contains the following variables:

`id`	A respondent's identification number; the actual name and values of this variable are set according to the id variable in the respondent dataset.
`Q`	A serial number of BWS questions.
`PAIR`	A serial number for the possible pairs of the best and worst attribute levels for each question.
`BEST`	An attribute-level number treated as the best in the possible pairs of the best and worst attribute levels for each question.
`WORST`	An attribute-level number treated as the worst in the possible pairs of the best and worst attribute levels for each question.
`BEST.AT`	A character showing the attribute corresponding to the attribute level treated as the best in the possible pairs of the best and worst attribute levels for each question.
`WORST.AT`	A character showing the attribute corresponding to the attribute level treated as the worst in the possible pairs of the best and worst attribute levels for each question.
`BEST.LV`	A character showing the attribute level treated as the best in the possible pairs of the best and worst attribute levels for each question.
`WORST.LV`	A character showing the attribute level treated as the worst in the possible pairs of the best and worst attribute levels for each question.
`RES.B`	A row number in the profile corresponding to the attribute level selected as the best by respondents.
`RES.W`	A row number in the profile corresponding to the attribute level selected as the worst by respondents.
`RES`	A response variable that takes the value of `1` if a possible pair of the best and worst attribute levels is selected by respondents and `0` otherwise: this variable is used as a dependent variable in the model formula of the function for discrete choice analysis (e.g., `clogit()` in the package survival).
`STR`	A stratification variable identifying each combination of respondent and question; the variable is also used in the model formula of `clogit()`.
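
To make the paired layout concrete, the following base R sketch enumerates the possible pairs for a single question with four attribute levels. It is an illustration only: the column names BEST.row and WORST.row are hypothetical, and the ordering of pairs in the dataset returned by bws2.dataset() may differ.

```r
# Attribute levels shown in one question (rows 1 to 4 of the profile)
profile <- c("A1", "B3", "C2", "D3")

# All ordered (best, worst) combinations, dropping those with best == worst
pairs <- expand.grid(BEST.row = seq_along(profile),
                     WORST.row = seq_along(profile))
pairs <- pairs[pairs$BEST.row != pairs$WORST.row, ]
pairs$BEST.LV  <- profile[pairs$BEST.row]
pairs$WORST.LV <- profile[pairs$WORST.row]

# Mark the pair selected by a respondent (best = row 1, worst = row 3)
pairs$RES <- as.integer(pairs$BEST.row == 1 & pairs$WORST.row == 3)

nrow(pairs)     # 4 * 3 = 12 possible pairs per question
sum(pairs$RES)  # exactly one pair is selected
```

This is why the Examples inspect the first 12 rows of the paired dataset: they correspond to one respondent's first question.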

The dataset for the marginal (sequential) model contains the variables id, Q, RES.B, RES.W, and STR mentioned above and the following variables:

`ALT`	A serial number of alternatives (attribute levels) for each question.
`BW`	A state variable that takes the value of `1` for the possible best attribute levels and `-1` for the possible worst attribute levels.
`ATT.cha`	A character showing the attribute corresponding to the attribute level treated as the possible best or worst for each question.
`ATT`	An attribute number showing the attribute corresponding to the attribute level treated as the possible best or worst for each question.
`LEV.cha`	A character showing the attribute levels treated as the possible best or worst for each question.
`LEV`	An attribute level number showing the attribute level treated as the possible best or worst for each question.
`RES`	A response variable that takes the value of `1` if the possible best or worst attribute level is selected by respondents and `0` otherwise.
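
The marginal layout can be sketched in base R in the same spirit. The data frame below is a hand-built illustration for one question, not output of bws2.dataset(); the column name row is hypothetical, and the actual ordering and numbering of alternatives may differ.

```r
# Attribute levels shown in one question (rows 1 to 4 of the profile)
profile <- c("A1", "B3", "C2", "D3")

# Each level appears once as a possible best and once as a possible worst
marg <- data.frame(
  BW      = rep(c(1, -1), each = length(profile)),  # 1 = best, -1 = worst
  row     = rep(seq_along(profile), times = 2),
  LEV.cha = rep(profile, times = 2))

# The respondent selected row 1 as the best and row 3 as the worst
marg$RES <- as.integer((marg$BW == 1 & marg$row == 1) |
                       (marg$BW == -1 & marg$row == 3))
marg

# Marginal sequential layout: the level selected as the best is dropped
# from the possible worst rows
seq.marg <- marg[!(marg$BW == -1 & marg$row == 1), ]
nrow(marg)      # 8 rows: 4 possible best + 4 possible worst
nrow(seq.marg)  # 7 rows: 4 possible best + 3 possible worst
```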

The output also carries attributes that consist of the arguments assigned to this function (i.e., id, response, choice.sets, attribute.levels, reverse, base.attribute, base.level, attribute.variables, effect, delete.best, and type) and the following:

`design.matrix`	Design matrix.
`lev.var.wo.ref`	Names of attribute-level variables excluding base levels.
`freq.levels`	Frequency of attribute levels shown in all the questions.
`respondent.characteristics`	Names of variables corresponding to the respondents' characteristics: variables, except for the respondents' id and response variables, are considered the respondents' characteristics.

Author(s)

Hideo Aizaki

Examples

# Load package support.BWS2 and package survival; the latter is used
# for a conditional logit model analysis of the responses
require(support.BWS2)
require(survival)

# Set a three-level orthogonal main-effect design (OMED) with
# four columns
omed <- matrix(
  c(1,3,2,3,
    3,1,2,2,
    3,3,3,1,
    2,3,1,2,
    2,2,2,1,
    1,1,1,1,
    1,2,3,2,
    3,2,1,3,
    2,1,3,3),
  nrow = 9, ncol = 4, byrow = TRUE)
omed
## The OMED is generated by executing the following lines of code:
## require(DoE.base)
## set.seed(123)
## omed <- data.matrix(oa.design(nl = c(3, 3, 3, 3)))

# Set the names of the attributes and attribute levels
attr.lev <- list(
  A = c("A1","A2","A3"), B = c("B1","B2","B3"),
  C = c("C1","C2","C3"), D = c("D1","D2","D3"))

# Convert the OMED into Case 2 BWS questions using three formats:
## Attribute column is located on the left-hand side
bws2.questionnaire(omed, attribute.levels = attr.lev,
  position = "left") 
## Attribute column is located in the center
bws2.questionnaire(omed, attribute.levels = attr.lev,
  position = "center")
## Attribute column is located on the right-hand side
bws2.questionnaire(omed, attribute.levels = attr.lev,
  position = "right") 

# Set respondent dataset containing 20 respondents who answered 
# nine BWS questions
resp.data <- data.frame(
  id = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20),
  B1 = c(2,2,2,1,2,4,2,2,2,2,1,2,2,4,2,3,2,3,2,2),
  W1 = c(1,1,1,4,1,3,3,1,4,1,4,4,1,1,1,4,1,1,4,4),
  B2 = c(1,1,2,1,1,3,1,1,1,1,2,1,1,2,1,3,1,3,1,1),
  W2 = c(2,4,4,4,4,2,4,2,4,2,4,4,4,4,2,4,4,1,4,4),
  B3 = c(1,1,2,1,2,1,1,1,1,2,1,1,1,2,1,1,1,1,3,1),
  W3 = c(4,4,4,2,4,4,4,3,4,3,4,4,3,1,4,4,3,4,4,4),
  B4 = c(1,2,2,1,2,1,2,2,2,1,2,4,2,2,2,4,2,2,1,2),
  W4 = c(3,4,3,2,3,3,3,1,4,3,3,3,4,3,3,1,4,3,4,4),
  B5 = c(1,2,2,1,2,1,2,1,3,1,1,1,3,1,1,1,3,1,1,1),
  W5 = c(4,1,3,4,4,4,3,4,4,4,2,4,4,2,4,2,1,4,3,4),
  B6 = c(2,4,2,1,2,1,4,3,1,1,1,1,3,2,1,2,3,4,1,4),
  W6 = c(4,1,4,4,4,3,3,4,4,2,4,2,4,4,3,4,4,1,4,1),
  B7 = c(3,3,2,3,4,1,2,3,3,3,2,1,3,2,1,2,3,1,3,2),
  W7 = c(1,4,1,4,1,4,4,4,4,2,4,4,4,4,4,4,4,4,4,4),
  B8 = c(1,1,2,1,2,2,1,1,1,2,1,2,1,1,1,3,1,1,1,1),
  W8 = c(3,3,3,3,3,3,3,3,4,3,3,3,4,3,3,4,4,3,4,3),
  B9 = c(3,3,3,1,3,1,1,3,1,1,1,1,3,1,1,1,3,1,1,1),
  W9 = c(2,1,2,2,2,2,4,2,4,2,4,2,2,2,2,4,1,2,2,2))

# Create a dataset and conduct a conditional logit model analysis
## Set response variables
response.vars <- names(resp.data)[2:19]
## Set a base level in each attribute
base.lev <- list(
  A = c("A3"), B = c("B3"), C = c("C3"), D = c("D3"))
## Paired model with attribute and attribute-level variables
pr.data <- bws2.dataset(
  data = resp.data,
  id = "id",
  response = response.vars,  
  choice.sets = omed,        
  attribute.levels = attr.lev,
  reverse = TRUE,
  base.level = base.lev,
  model = "paired")
attributes(pr.data)$design.matrix
head(pr.data, 12)
### Attribute variable D is omitted from the model
pr <- clogit(RES ~ A + B + C + 
  A1 + A2 + B1 + B2 + C1 + C2 + D1 + D2 + strata(STR), 
  data = pr.data)
pr
### Calculate coefficients of base level variables
b.pr <- coef(pr)
-sum(b.pr[4:5]) # attribute level A3
-sum(b.pr[6:7]) # attribute level B3
-sum(b.pr[8:9]) # attribute level C3
-sum(b.pr[10:11]) # attribute level D3
## Marginal model with attribute and attribute-level variables
mr.data <- bws2.dataset(
  data = resp.data,
  id = "id",
  response = response.vars,
  choice.sets = omed,
  attribute.levels = attr.lev,
  reverse = TRUE,
  base.level = base.lev,
  model = "marginal")
attributes(mr.data)$design.matrix
head(mr.data, 8)
### Attribute variable D is omitted from the model
mr <- clogit(RES ~ A + B + C + 
  A1 + A2 + B1 + B2 + C1 + C2 + D1 + D2 + strata(STR), 
  data = mr.data)
mr
### Calculate coefficients of base level variables
b.mr <- coef(mr)
-sum(b.mr[4:5]) # attribute level A3
-sum(b.mr[6:7]) # attribute level B3
-sum(b.mr[8:9]) # attribute level C3
-sum(b.mr[10:11]) # attribute level D3

# Calculate BWS scores
bwscores <- bws2.count(mr.data)
sum(bwscores, "level")
barplot(bwscores, "bw", "level")

[Package support.BWS2 version 0.4-0 Index]