| ddm {fddm} | R Documentation |
Estimation of 5-Parameter DDM
Description
Fit the 5-parameter DDM (Diffusion Decision Model) via maximum likelihood estimation. The model for each DDM parameter can be specified symbolically using R's formula interface. With the exception of the drift rate (which is always estimated) all parameters can be either fixed or estimates.
Usage
ddm(
drift,
boundary = ~1,
ndt = ~1,
bias = 0.5,
sv = 0,
data,
optim = "nlminb",
args_optim = list(init = NULL, lo_bds = NULL, up_bds = NULL, control = list()),
args_ddm = list(err_tol = 1e-06),
use_gradient = TRUE,
compiled_model = TRUE,
model = TRUE,
mmatrix = TRUE,
response = TRUE,
na.action,
subset,
contrasts = NULL
)
Arguments
drift |
Two-sided formula. The left-hand side describes the response,
the right-hand side provides a symbolic description of the regression model
underlying the drift rate (v). The left-hand side needs to specify the
column in the data containing response time and corresponding binary
decision, concatenated by |
boundary, ndt, bias, sv |
Either a one-sided formula providing a symbolic description of the regression model or a scalar number given the value this parameter should be fixed to. Boundary separation (a), non-decision time (t0), relative initial bias (w), or inter-trial variability in the drift rate (sv) |
data, na.action, subset |
arguments controlling formula processing via
|
optim |
character string or fitting function indicating which numerical
optimization method should be used. The default |
args_optim |
named list of additional arguments for the optimization function. The available options are:
|
args_ddm |
named list of additional arguments passed to density function
calculation. Currently, the only option for this is |
use_gradient |
logical. Should gradient be used during numerical
optimization? Default is |
compiled_model, model, mmatrix, response |
logicals. If |
contrasts |
optional list. See the contrasts.arg of
|
Details
ddm uses model.matrix for transforming the
symbolic description of the regression model underlying each parameter into
estimated coefficients. The following provides a few examples:
-
~ 1estimates a single coefficient, named(Intercept) -
~ conditionestimates the intercept plus k - 1 coefficients for a factor with k levels (e.g., intercept plus one coefficient if condition has two levels). The interpretation of the coefficients depend on the factor contrasts employed, which are usually based on the contrasts specified inoptions("contrasts"). For the defaulttreatmentcontrasts (contr.treatment), the intercept corresponds to the first factor level and the additional coefficients correspond to the difference from the intercept (i.e., first factor level). When usingcontr.sumthe intercept corresponds to the grand mean and the additional coefficients correspond to the differences from the grand mean. -
~ 0 + conditionestimates no intercept but one coefficient per factor level. This specification can also be used to get one coefficient per cell for a multi-factorial design, e.g.,~ 0 + condition1:condition2. -
~ 0 + condition1 + condition1:condition2estimates one "intercept" per level ofcondition1factor (which is not called intercept) plus k - 1 difference parameters from the condition-specific intercept for the k-levels ofcondition2. The interpretation of the difference parameters again depends on the contrasts used (e.g., treatment vs. sum-to-zero contrasts, see examples). This formula specification can often make sense for the drift rate whencondition1is the factor (such as item type) mapped to upper and lower response boundary of the DDM andcondition2is another factor by which we want the drift rate to differ. This essentially gives one overall drift rate per response boundary plus the differences from this overall one (note again that with treatment contrasts this overall drift rate is the first factor level ofcondition2).
To get meaningful results it is necessary to estimate separate drift rates for the different condition/item-types that are mapped onto the upper and lower boundary of the diffusion model.
If a non-default fitting function is used, it needs to minimize the
negative log-likelihood, accept the following arguments,
init, objective, gradient, lower, upper, control , and return a list
with the following arguments coefficients, loglik, converged, optim
(where converged is boolean and optim can be an arbitrary
list with additional information).
Value
Object of class ddm (i.e., a list with components as listed
below) for which a number of common methods such as print,
coef, and logLik are implemented, see
ddm-methods.
-
coefficientsa named list whose elements are the values of the estimated model parameters -
dpara character vector containing the names of the estimated model parameters -
fixed_dpara named list whose elements are the values of the fixed model parameters -
loglikthe value of the log-likelihood function at the optimized parameter values -
hessiansa named list whose elements are the individual Hessians for each of the model parameters -
vcova named list whose elements are the individual variance-covariance matrices for each of the model parameters -
nobsthe number of observations in the data used for fitting -
nparthe number of parameters used to fit the model (i.e., the estimated model parameters plus any hyperparameters) -
df.residualthe residual degrees of freedom (the number of observations - the number of parameters) -
callthe original function call toddm -
formulathe formulas used in the model (1indicates that the model parameter was estimated with a single coefficient;0indicates that the model parameter was fixed) -
dpar_formulasa named list whose elements are the formulas for the model parameters (1indicates that the model parameter was estimated with a single coefficient;0indicates that the model parameter was fixed) -
na.actionna.action -
termsa named list whose elements are the model parameters, except the last element is namedfulland shows the breakdown of the model with all model parameters -
levelsa named list whose elements are the levels associated with any parameters that are factors (the elements areNULLif the parameter is not a factor), and whose last element is namedFULLand shows all of the levels used in the model -
contrastsa named list whose elements are the type of contrasts used in the model -
args_ddma named list whose elements are the optional arguments used in the calculation of the DDM log-likelihood function -
linka named list whose elements show information about the link function used for each model parameter (currently the only link function is the identity function) -
convergeda logical indicating whether the optimization converged (TRUE) or not (FALSE) -
optim_infoa named list whose elements are information about the optimization process (e.g., the name of the algorithm used, the final value of the objective function, the number of evaluations of the gradient function, etc.) -
modelthe data used in the model (might need to check this) -
responsethe response data used in the model -
mmatrixa named list whose elements are the model matrices for each of the estimated parameters -
compiled_modelC++ object that contains the compiled model (see list below for more details)
The C++ object accessible via the compiled_model component of the
above R object of class ddm contains the following components:
-
rta numeric vector of the response time data used in the model -
responsean integer vector of the response data used in the model (coded such that1corresponds to the "lower" boundary and2corresponds to the "upper" boundary) -
err_tolthe error tolerance used in the calculations for fitting the DDM -
coefficientsa numeric vector containing the current set of coefficients for the formulas provided to theddm()function call; the coefficients correspond to the DDM parameters in the following order:v,a,t0,w,sv -
likelihooda double containing the log-likelihood for the current set ofcoefficients(note this can be changed by calling the functioncalculate_loglik()below) -
modmat_va numeric matrix containing the model matrix forv, the drift rate, determined by the formula input to the argumentdriftin theddm()function call -
modmat_aa numeric matrix containing the model matrix fora, the boundary separation, determined by the formula input to the argumentboundaryin theddm()function call -
modmat_t0a numeric matrix containing the model matrix fort0, the non-decision time, determined by the formula input to the argumentndtin theddm()function call -
modmat_wa numeric matrix containing the model matrix forw, the inital bias, determined by the formula input to the argumentbiasin theddm()function call -
modmat_sva numeric matrix containing the model matrix forsv, the inter-trial variability in the drift rate, determined by the formula input to the argumentsvin theddm()function call -
hess_va numeric matrix containing the Hessian forv, the drift rate, whose dimensions are determined by the formula input to the argumentdriftin theddm()function call -
hess_aa numeric matrix containing the Hessian fora, the boundary separation, whose dimensions are determined by the formula input to the argumentdriftin theddm()function call -
hess_t0a numeric matrix containing the Hessian fort0, the non-decision time, whose dimensions are determined by the formula input to the argumentdriftin theddm()function call -
hess_wa numeric matrix containing the Hessian forw, the initial bias, whose dimensions are determined by the formula input to the argumentdriftin theddm()function call -
hess_sva numeric matrix containing the Hessian forsv, the inter-trial variability in the drift rate, whose dimensions are determined by the formula input to the argumentdriftin theddm()function call -
vcov_va numeric matrix containing the variance-covariance matrix forv, the drift rate, whose dimensions are determined by the formula input to the argumentdriftin theddm()function call -
vcov_aa numeric matrix containing the variance-covariance matrix fora, the boundary separation, whose dimensions are determined by the formula input to the argumentboundaryin theddm()function call -
vcov_t0a numeric matrix containing the variance-covariance matrix fort0, the non-decision time, whose dimensions are determined by the formula input to the argumentndtin theddm()function call -
vcov_wa numeric matrix containing the variance-covariance matrix forw, the inital bias, whose dimensions are determined by the formula input to the argumentbiasin theddm()function call -
vcov_sva numeric matrix containing the variance-covariance matrix forv, the inter-trial variability in the drift rate, whose dimensions are determined by the formula input to the argumentsvin theddm()function call -
calculate_loglikcalculates and returns a double containing the negated log-likelihood (note that this will overwrite thelikelihoodcomponent of the C++ object) -
calculate_gradientcalculates and returns a numeric vector of the negated gradients for the provided coefficient values; the gradients are stored in the same manner as their correspondingcoefficents(note that this will overwrite thelikelihood) component of the C++ object) -
calculate_hessianscalculates and returns a named list of the negated Hessians for each model parameter for the provided coefficient values (note that this will overwrite thelikelihood,hess_v,hess_a,hess_t0,hess_w, andhess_svcomponents of the C++ object) -
calculate_vcovcalculates and returns a named list of the variance-covariance matrices for each model parameter for the storedcoefficients(note that this will overwrite thelikelihood,hess_v,hess_a,hess_t0,hess_w,hess_sv,vcov_v,vcov_a,vcov_t0,vcov_w, andvcov_svcomponents of the C++ object) -
calculate_standard_errorcalculates and returns a numeric vector of the standard errors of the storedcoefficients; the standard errors are stored in the same manner as their correspondingcoefficients(note that this will overwrite thelikelihood,hess_v,hess_a,hess_t0,hess_w,hess_sv,vcov_v,vcov_a,vcov_t0,vcov_w, andvcov_svcomponents of the C++ object)
Examples
# prepare data
data(med_dec, package = "fddm")
med_dec <- med_dec[which(med_dec[["rt"]] >= 0), ] ## only use valid RTs
## select data from one participant
p1 <- med_dec[med_dec[["id"]] == 2 & med_dec[["group"]] == "experienced", ]
head(p1)
##----------------------------------------------
## Easiest: Fitting using emmeans -
##----------------------------------------------
## Because we use an ANOVA approach, we set orthogonal sum-to-zero contrasts
op <- options(contrasts = c('contr.sum', 'contr.poly'))
fit0 <- ddm(rt + response ~ classification*difficulty, data = p1)
summary(fit0)
## for more tests, we use emmeans:
if (requireNamespace("emmeans")) {
# for ANOVA table:
emmeans::joint_tests(fit0)
# get conditional main effects of difficulty (for each level of classification):
emmeans::joint_tests(fit0, by = "classification")
# get mean drift rates per condition:
em1 <- emmeans::emmeans(fit0, "difficulty", by = "classification")
em1
# compare mean drift rates per condition
pairs(em1)
update(pairs(em1), by = NULL, adjust = "holm")
}
options(op) # reset contrasts
##----------------------------------------------------------------
## Fitting with custom parametrisation -
##----------------------------------------------------------------
## one drift rate per classification by difficulty design cell
fit1 <- ddm(rt + response ~ 0 + classification:difficulty, data = p1)
summary(fit1)
## set default contrasts (just in case contrasts have been changed)
op <- options(contrasts = c('contr.treatment', 'contr.poly'))
## one drift rate "intercept" per classification condition (blast vs. non-blast)
## corresponding to first level of difficulty factor (easy)
## plus one further coefficient per classification condition corresponding to
## difference from "intercept" (hard - easy)
fit1b <- ddm(rt + response ~ 0 + classification + classification:difficulty,
data = p1, args_optim = list(control = list(iter.max = 1000,
eval.max = 1000)))
summary(fit1b)
options(op) # reset contrasts
## set orthogonal sum-to-zero contrasts
op <- options(contrasts = c('contr.sum', 'contr.poly'))
## one drift rate "intercept" per classification condition (blast vs. non-blast)
## corresponding to mean drift rate for the classification condition
## plus one further coefficient per classification condition corresponding to
## difference from "intercept" (hard/easy - mean drift rate)
fit1c <- ddm(rt + response ~ 0 + classification + classification:difficulty,
data = p1)
summary(fit1c)
options(op) ## reset contrasts
## all variants produce same fit, only meaning of parameters differs
logLik(fit1)
logLik(fit1b)
logLik(fit1c)
logLik(fit0) ## also model above
## all models estimate same drift rates, but in different parametrisation:
coef(fit1) ## drift rates per design cell
## same drift rates based on fit1b:
c(coef(fit1b)[1:2],
coef(fit1b)[1] + coef(fit1b)[3], coef(fit1b)[2] + coef(fit1b)[4])
## same drift rates based on fit1c:
c(coef(fit1c)[1] + coef(fit1c)[3], coef(fit1c)[2] + coef(fit1c)[4],
coef(fit1c)[1] - coef(fit1c)[3], coef(fit1c)[2] - coef(fit1c)[4])
# we can estimate a model that freely estimates response bias
# (instead of fixing it at 0.5)
fit2 <- ddm(rt + response ~ 0 + classification:difficulty, bias = ~1, data = p1)
summary(fit2)
## Note: estimating bias only makes sense in situations like here where the
## response boundaries do not correspond to correct/incorrect but to the
## actual responses participants gave (here: blast vs. non-blast classification)
## now let's perform a likelihood ratio test to check if estimating response
## bias freely leads to a significant increase in model fit?
if (requireNamespace("lmtest")) { ## requires package lmtest
lmtest::lrtest(fit1, fit2)
## does not look like it (p = 0.1691)
}
# we can also make a DDM parameter, such as boundary, depend on a numeric
# variable, such as block number
fit3 <- ddm(rt + response ~ 0 + classification:difficulty,
boundary = ~ block, data = p1)
summary(fit3)
## does making boundary depend on block leads to a significant increase in model
## fit?
if (requireNamespace("lmtest")) { ## requires package lmtest
lmtest::lrtest(fit1, fit3)
## does not look like it (p = 0.198)
}
##----------------------------------------------------------------
## Fitting with optimization arguments -
##----------------------------------------------------------------
## example of how to use your own initial values and bounds for the optimization
## of the coefficients (determined by the model matrices/formulas)
options(op) # reset contrasts
# this uses the default generated initial values and bounds
fitex0 <- ddm(rt + response ~ 0 + classification:difficulty, data = p1)
# we can see the number of coefficients (and thus the number of initial values
# and bounds) required if we wish to use our own
fitex0$optim_info$args_optim$init
fitex0$optim_info$args_optim$lo_bds
fitex0$optim_info$args_optim$up_bds
# -1.9270279 2.5408864 -1.9270279 0.3181987 1.7907438 0.1264035
# -Inf -Inf -Inf -Inf 0 0
# Inf Inf Inf Inf Inf 0.461
# the first four coefficients are for the drift rate (given by the formula we
# provided), and the last two are for the boundary separation and non-decision
# time, respectively (note that the default is to estimate these two model
# parameters with a single coefficient).
# to use our own initial values, we can include them in the `args_optim` list,
# but note that they must be within the generated bounds
fitex1 <- ddm(rt + response ~ 0 + classification:difficulty, data = p1,
args_optim = list(init = c(-1.5, 1, -1, 1, 1.5, 0.3)))
# to use our own bounds, we again can include them in the `args_optim` list,
# but note that they must contain the generated initial values
fitex2 <- ddm(rt + response ~ 0 + classification:difficulty, data = p1,
args_optim = list(lo_bds = c(-20, -20, -20, -20, 0, 0),
up_bds = c(20, 20, 20, 20, 10, 0.5)))
# to use both our own initial values and bounds, we include them in the
# `args_optim` list, and the bounds must contain the initial values
fitex3 <- ddm(rt + response ~ 0 + classification:difficulty, data = p1,
args_optim = list(init = c(-1.5, 1, -1, 1, 1.5, 0.3),
lo_bds = c(-20, -20, -20, -20, 0, 0),
up_bds = c(20, 20, 20, 20, 10, 0.5)))
# the only other option in the `args_optim` list is the control parameters
# directly used by the optimization function (e.g., the maximum number of
# iterations); here we'll set the maximum number of iterations and function
# evaluations
fitex4 <- ddm(rt + response ~ 0 + classification:difficulty, data = p1,
args_optim = list(control = list(iter.max = 1000,
eval.max = 1000)))