CRT_carryovereffect {CRTConjoint} | R Documentation |
Testing carryover effect in Conjoint Experiments
Description
This function takes a conjoint dataset and returns the p-value from a CRT test for a carryover effect, using a HierNet test statistic. The user must specify the outcome, all factors used in the conjoint experiment, and the evaluation task number. By default, the function assumes uniform randomization of factor levels. It also assumes a forced-choice conjoint experiment, so the supplied dataset must contain the left and right profile factors in separate columns.
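As a rough illustration of the data layout described above, the sketch below builds a toy forced-choice dataset with the left and right profile factors in separate columns. All column names (Country_left, Country_right, Education_left, Education_right, Y, task), the factor levels, and the binary coding of Y are hypothetical; they only mirror the naming used in the argument descriptions and are not data shipped with the package.

```r
# Toy forced-choice conjoint data: 4 hypothetical respondents, J = 3 tasks each.
set.seed(1)
J = 3
n_resp = 4
n = J * n_resp
countries = c("Germany", "India", "Mexico")   # made-up levels
educations = c("High school", "College")      # made-up levels

toy = data.frame(
  Country_left    = factor(sample(countries, n, replace = TRUE), levels = countries),
  Education_left  = factor(sample(educations, n, replace = TRUE), levels = educations),
  Country_right   = factor(sample(countries, n, replace = TRUE), levels = countries),
  Education_right = factor(sample(educations, n, replace = TRUE), levels = educations),
  Y    = rbinom(n, size = 1, prob = 0.5),  # outcome; the 0/1 coding is illustrative
  task = rep(1:J, times = n_resp)          # evaluation task number within respondent
)

# left and right have the same length and list the same factors in the same order
left  = c("Country_left", "Education_left")
right = c("Country_right", "Education_right")

# Only one side of each factor appears in the formula
form = Y ~ Country_left + Education_left
```

Objects shaped like these (toy, left, right, form, and the column name "task") correspond to the data, left, right, formula, and task inputs described under Arguments.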
Usage
CRT_carryovereffect(
formula,
data,
left,
right,
task,
design = "Uniform",
supplyown_resamples = NULL,
profileorder_constraint = TRUE,
non_factor = NULL,
B = 200,
parallel = TRUE,
num_cores = 2,
nfolds = 3,
lambda = c(20, 30, 40),
tol = 0.001,
seed = sample(c(1:1000), size = 1),
verbose = TRUE
)
Arguments
formula |
A formula object specifying the outcome variable on the left-hand side and the factors (X,Z) and respondent characteristics (V) on the right-hand side. RHS variables should be separated by + signs and should contain only the left or only the right version of each (X,Z). For example, Y ~ Country_left + Education_left is sufficient, as opposed to Y ~ Country_left + Country_right + Education_left + Education_right. |
data |
A dataframe containing the outcome variable and all factors (X,Z,V), including both left and right profile factors. All (X,Z,V) listed in the formula above are expected to be of class factor unless explicitly listed in the non_factor input. |
left |
Vector of column names of data that correspond to the left profile factors |
right |
Vector of column names of data that correspond to the right profile factors. NOTE: left and right are assumed to have the same length and to list the same variables in the same order. For example, left = c("Country_left", "Education_left") and right = c("Country_right", "Education_right") |
task |
A character string naming the column of data that contains the evaluation task number. IMPORTANT: The task variable is assumed to have no missing tasks, i.e., each respondent should have tasks 1:J. Please drop respondents with missing tasks. |
design |
A character string, one of "Uniform" or "Manual". "Uniform" refers to a completely uniform design where all (X,Z) are sampled uniformly. "Manual" refers to more complex conjoint designs, where the user supplies their own resamples through the supplyown_resamples input. |
supplyown_resamples |
List of length B that contains the user's own resamples of X when design="Manual". Each element of the list should be a dataframe with the same number of rows as data and two columns for the left and right profile values of X. |
profileorder_constraint |
Boolean indicating whether to enforce profile order constraint (default = TRUE) |
non_factor |
A vector of strings indicating columns of data that are not factors. This should only be used for respondent characteristics (V) that are not factors. For example non_factor = "Respondent_Age". |
B |
Numeric integer value indicating the number of resamples for the CRT procedure. Default value is B=200. |
parallel |
Boolean indicating whether parallel computing should be used. Default value is TRUE. |
num_cores |
Numeric integer indicating number of cores to use when parallel=TRUE. num_cores should not exceed the number of cores the user's machine can handle. Default is 2. |
nfolds |
Numeric integer indicating number of cross-validation folds. Default is 3. |
lambda |
Numeric vector of candidate lambda values used in cross-validation for the HierNet fit. Default is lambda=c(20,30,40). |
tol |
Numeric value indicating the acceptable tolerance for terminating the HierNet optimization. Default is tol=1e-3. WARNING: Do not tighten (decrease) the tolerance, as this greatly increases computation time. |
seed |
Seed used for CRT procedure |
verbose |
Boolean indicating verbose output. Default verbose=TRUE |
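The task argument assumes that every respondent has a complete set of tasks 1:J. A minimal base R sketch of how incomplete respondents might be dropped before calling the function is given below. The respondent_id column and the toy data are hypothetical; the function itself documents no respondent identifier argument, so this check would be run on the user's own data beforehand.

```r
# Hypothetical long-format data: one row per respondent-task pair.
J = 3
df = data.frame(
  respondent_id = c(1, 1, 1, 2, 2, 3, 3, 3),  # respondent 2 is missing task 3
  task          = c(1, 2, 3, 1, 2, 1, 2, 3)
)

# TRUE for respondents whose tasks are exactly 1, 2, ..., J
complete = tapply(df$task, df$respondent_id,
                  function(tk) length(tk) == J && all(sort(tk) == seq_len(J)))

# Keep only respondents with a complete set of tasks
keep_ids = names(complete)[complete]
df_complete = df[as.character(df$respondent_id) %in% keep_ids, ]
```

Here df_complete retains only the respondents whose task column covers 1:J, which is the form the task argument expects.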
Value
A list containing:
p_val |
A numeric value for the p-value testing carryover effect. |
obs_test_stat |
A numeric value for the observed test statistic. |
resampled_test_stat |
Matrix containing all the B resampled test statistics |
tol |
Tolerance used for HierNet |
lam |
Best cross-validated lambda |
hiernet_fit |
An object of class hiernet that contains the hiernet fit for the observed test statistic |
seed |
Seed used |
elapsed_time |
Elapsed time |
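The relationship between obs_test_stat, resampled_test_stat, and p_val can be sketched as follows, assuming the usual CRT p-value, which counts the resampled statistics that are at least as large as the observed one and adds one to both numerator and denominator. Whether the package computes p_val in exactly this way is an assumption, and the numbers below are made up.

```r
# Made-up values standing in for the returned obs_test_stat and resampled_test_stat
obs_test_stat = 2.5
resampled_test_stat = c(0.4, 3.1, 1.2, 2.5, 0.9, 1.7, 0.2, 2.9, 1.1)
B = length(resampled_test_stat)

# Assumed CRT p-value: (1 + #{resampled >= observed}) / (B + 1)
p_val = (1 + sum(resampled_test_stat >= obs_test_stat)) / (B + 1)
p_val
```

Under this formula the smallest attainable p-value is 1/(B + 1), so the choice of B bounds how small p_val can be.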
References
Ham, D., Janson, L., and Imai, K. (2022) Using Machine Learning to Test Causal Hypotheses in Conjoint Analysis
Examples
# Subset of Immigration Choice Conjoint Experiment Data from Hainmueller et al. (2014).
data("immigrationdata")
form = formula("Y ~ FeatEd + FeatGender + FeatCountry + FeatReason + FeatJob +
FeatExp + FeatPlans + FeatTrips + FeatLang + ppage + ppeducat + ppethm + ppgender")
left = colnames(immigrationdata)[1:9]
right = colnames(immigrationdata)[10:18]
# Each respondent evaluated 5 tasks
J = 5
carryover_df = immigrationdata
carryover_df$task = rep(1:J, nrow(carryover_df)/J)
# Since the immigration conjoint experiment used dependent randomization for several factors,
# we supply our own resamples
resample_func_immigration = function(x, seed = sample(0:1000, size = 1), left_idx, right_idx) {
  set.seed(seed)
  df = x[, c(left_idx, right_idx)]
  variable = colnames(x)[c(left_idx, right_idx)]
  len = length(variable)
  resampled = list()
  n = nrow(df)
  # first resample every factor uniformly over its levels
  for (i in 1:len) {
    var = df[, variable[i]]
    lev = levels(var)
    resampled[[i]] = factor(sample(lev, size = n, replace = TRUE))
  }
  resampled_df = data.frame(resampled[[1]])
  for (i in 2:len) {
    resampled_df = cbind(resampled_df, resampled[[i]])
  }
  colnames(resampled_df) = colnames(df)
  # escape persecution was dependently randomized
  country_1 = resampled_df[, "FeatCountry"]
  country_2 = resampled_df[, "FeatCountry_2"]
  i_1 = which((country_1 == "Iraq" | country_1 == "Sudan" | country_1 == "Somalia"))
  i_2 = which((country_2 == "Iraq" | country_2 == "Sudan" | country_2 == "Somalia"))
  reason_1 = resampled_df[, "FeatReason"]
  reason_2 = resampled_df[, "FeatReason_2"]
  levs = levels(reason_1)
  r_levs = levs[c(2,3)]
  reason_1 = sample(r_levs, size = n, replace = TRUE)
  reason_1[i_1] = sample(levs, size = length(i_1), replace = TRUE)
  reason_2 = sample(r_levs, size = n, replace = TRUE)
  reason_2[i_2] = sample(levs, size = length(i_2), replace = TRUE)
  resampled_df[, "FeatReason"] = reason_1
  resampled_df[, "FeatReason_2"] = reason_2
  # profession high skill fix
  educ_1 = resampled_df[, "FeatEd"]
  educ_2 = resampled_df[, "FeatEd_2"]
  i_1 = which((educ_1 == "Equivalent to completing two years of college in the US" |
               educ_1 == "Equivalent to completing a college degree in the US" |
               educ_1 == "Equivalent to completing a graduate degree in the US"))
  i_2 = which((educ_2 == "Equivalent to completing two years of college in the US" |
               educ_2 == "Equivalent to completing a college degree in the US" |
               educ_2 == "Equivalent to completing a graduate degree in the US"))
  job_1 = resampled_df[, "FeatJob"]
  job_2 = resampled_df[, "FeatJob_2"]
  levs = levels(job_1)
  # take out computer programmer, doctor, financial analyst, and research scientist
  r_levs = levs[-c(2,4,5, 9)]
  job_1 = sample(r_levs, size = n, replace = TRUE)
  job_1[i_1] = sample(levs, size = length(i_1), replace = TRUE)
  job_2 = sample(r_levs, size = n, replace = TRUE)
  job_2[i_2] = sample(levs, size = length(i_2), replace = TRUE)
  resampled_df[, "FeatJob"] = job_1
  resampled_df[, "FeatJob_2"] = job_2
  # convert every column back to a factor
  resampled_df[colnames(resampled_df)] = lapply(resampled_df[colnames(resampled_df)], factor )
  return(resampled_df)
}
own_resamples = list()
B = 50
for (i in 1:B) {
  newdf = resample_func_immigration(carryover_df, left_idx = 1:9, right_idx = 10:18, seed = i)
  own_resamples[[i]] = newdf
}
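# design = "Manual" is set because, per the design argument above, own resamples are supplied under that option
carryover_test = CRT_carryovereffect(formula = form, data = carryover_df, left = left,
right = right, task = "task", design = "Manual", supplyown_resamples = own_resamples, B = B)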
carryover_test$p_val
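Before passing a list to supplyown_resamples, it may help to confirm that it has length B and that every element is a dataframe with as many rows as data, as the argument description requires. The helper below, check_resamples, is hypothetical and not part of CRTConjoint; it is a minimal sketch run on toy stand-ins rather than on immigrationdata.

```r
# Hypothetical helper (not part of CRTConjoint): basic checks on a resample list
check_resamples = function(resamples, data, B) {
  # the list must contain exactly B elements
  stopifnot(is.list(resamples), length(resamples) == B)
  # every element must be a dataframe with the same number of rows as data
  ok_rows = vapply(resamples,
                   function(d) is.data.frame(d) && nrow(d) == nrow(data),
                   logical(1))
  all(ok_rows)
}

# Toy stand-ins for the data and the resamples
toy_data = data.frame(x_left  = factor(c("a", "b", "a")),
                      x_right = factor(c("b", "b", "a")))
toy_resamples = lapply(1:5, function(i) {
  set.seed(i)
  data.frame(x_left  = factor(sample(c("a", "b"), 3, replace = TRUE)),
             x_right = factor(sample(c("a", "b"), 3, replace = TRUE)))
})

check_resamples(toy_resamples, toy_data, B = 5)
```

In the example above, the analogous call would be check_resamples(own_resamples, carryover_df, B = B).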