ddm {fddm} | R Documentation |
Estimation of 5-Parameter DDM
Description
Fit the 5-parameter DDM (Diffusion Decision Model) via maximum likelihood estimation. The model for each DDM parameter can be specified symbolically using R's formula interface. With the exception of the drift rate (which is always estimated) all parameters can be either fixed or estimates.
Usage
ddm(
drift,
boundary = ~1,
ndt = ~1,
bias = 0.5,
sv = 0,
data,
optim = "nlminb",
args_optim = list(init = NULL, lo_bds = NULL, up_bds = NULL, control = list()),
args_ddm = list(err_tol = 1e-06),
use_gradient = TRUE,
compiled_model = TRUE,
model = TRUE,
mmatrix = TRUE,
response = TRUE,
na.action,
subset,
contrasts = NULL
)
Arguments
drift |
Two-sided formula. The left-hand side describes the response,
the right-hand side provides a symbolic description of the regression model
underlying the drift rate (v). The left-hand side needs to specify the
column in the data containing response time and corresponding binary
decision, concatenated by |
boundary , ndt , bias , sv |
Either a one-sided formula providing a symbolic description of the regression model or a scalar number given the value this parameter should be fixed to. Boundary separation (a), non-decision time (t0), relative initial bias (w), or inter-trial variability in the drift rate (sv) |
data , na.action , subset |
arguments controlling formula processing via
|
optim |
character string or fitting function indicating which numerical
optimization method should be used. The default |
args_optim |
named list of additional arguments for the optimization function. The available options are:
|
args_ddm |
named list of additional arguments passed to density function
calculation. Currently, the only option for this is |
use_gradient |
logical. Should gradient be used during numerical
optimization? Default is |
compiled_model , model , mmatrix , response |
logicals. If |
contrasts |
optional list. See the contrasts.arg of
|
Details
ddm
uses model.matrix
for transforming the
symbolic description of the regression model underlying each parameter into
estimated coefficients. The following provides a few examples:
-
~ 1
estimates a single coefficient, named(Intercept)
-
~ condition
estimates the intercept plus k - 1 coefficients for a factor with k levels (e.g., intercept plus one coefficient if condition has two levels). The interpretation of the coefficients depend on the factor contrasts employed, which are usually based on the contrasts specified inoptions("contrasts")
. For the defaulttreatment
contrasts (contr.treatment
), the intercept corresponds to the first factor level and the additional coefficients correspond to the difference from the intercept (i.e., first factor level). When usingcontr.sum
the intercept corresponds to the grand mean and the additional coefficients correspond to the differences from the grand mean. -
~ 0 + condition
estimates no intercept but one coefficient per factor level. This specification can also be used to get one coefficient per cell for a multi-factorial design, e.g.,~ 0 + condition1:condition2
. -
~ 0 + condition1 + condition1:condition2
estimates one "intercept" per level ofcondition1
factor (which is not called intercept) plus k - 1 difference parameters from the condition-specific intercept for the k-levels ofcondition2
. The interpretation of the difference parameters again depends on the contrasts used (e.g., treatment vs. sum-to-zero contrasts, see examples). This formula specification can often make sense for the drift rate whencondition1
is the factor (such as item type) mapped to upper and lower response boundary of the DDM andcondition2
is another factor by which we want the drift rate to differ. This essentially gives one overall drift rate per response boundary plus the differences from this overall one (note again that with treatment contrasts this overall drift rate is the first factor level ofcondition2
).
To get meaningful results it is necessary to estimate separate drift rates for the different condition/item-types that are mapped onto the upper and lower boundary of the diffusion model.
If a non-default fitting function is used, it needs to minimize the
negative log-likelihood, accept the following arguments,
init, objective, gradient, lower, upper, control
, and return a list
with the following arguments coefficients, loglik, converged, optim
(where converged
is boolean and optim
can be an arbitrary
list with additional information).
Value
Object of class ddm
(i.e., a list with components as listed
below) for which a number of common methods such as print
,
coef
, and logLik
are implemented, see
ddm-methods
.
-
coefficients
a named list whose elements are the values of the estimated model parameters -
dpar
a character vector containing the names of the estimated model parameters -
fixed_dpar
a named list whose elements are the values of the fixed model parameters -
loglik
the value of the log-likelihood function at the optimized parameter values -
hessians
a named list whose elements are the individual Hessians for each of the model parameters -
vcov
a named list whose elements are the individual variance-covariance matrices for each of the model parameters -
nobs
the number of observations in the data used for fitting -
npar
the number of parameters used to fit the model (i.e., the estimated model parameters plus any hyperparameters) -
df.residual
the residual degrees of freedom (the number of observations - the number of parameters) -
call
the original function call toddm
-
formula
the formulas used in the model (1
indicates that the model parameter was estimated with a single coefficient;0
indicates that the model parameter was fixed) -
dpar_formulas
a named list whose elements are the formulas for the model parameters (1
indicates that the model parameter was estimated with a single coefficient;0
indicates that the model parameter was fixed) -
na.action
na.action -
terms
a named list whose elements are the model parameters, except the last element is namedfull
and shows the breakdown of the model with all model parameters -
levels
a named list whose elements are the levels associated with any parameters that are factors (the elements areNULL
if the parameter is not a factor), and whose last element is namedFULL
and shows all of the levels used in the model -
contrasts
a named list whose elements are the type of contrasts used in the model -
args_ddm
a named list whose elements are the optional arguments used in the calculation of the DDM log-likelihood function -
link
a named list whose elements show information about the link function used for each model parameter (currently the only link function is the identity function) -
converged
a logical indicating whether the optimization converged (TRUE
) or not (FALSE
) -
optim_info
a named list whose elements are information about the optimization process (e.g., the name of the algorithm used, the final value of the objective function, the number of evaluations of the gradient function, etc.) -
model
the data used in the model (might need to check this) -
response
the response data used in the model -
mmatrix
a named list whose elements are the model matrices for each of the estimated parameters -
compiled_model
C++ object that contains the compiled model (see list below for more details)
The C++ object accessible via the compiled_model
component of the
above R object of class ddm
contains the following components:
-
rt
a numeric vector of the response time data used in the model -
response
an integer vector of the response data used in the model (coded such that1
corresponds to the "lower" boundary and2
corresponds to the "upper" boundary) -
err_tol
the error tolerance used in the calculations for fitting the DDM -
coefficients
a numeric vector containing the current set of coefficients for the formulas provided to theddm()
function call; the coefficients correspond to the DDM parameters in the following order:v
,a
,t0
,w
,sv
-
likelihood
a double containing the log-likelihood for the current set ofcoefficients
(note this can be changed by calling the functioncalculate_loglik()
below) -
modmat_v
a numeric matrix containing the model matrix forv
, the drift rate, determined by the formula input to the argumentdrift
in theddm()
function call -
modmat_a
a numeric matrix containing the model matrix fora
, the boundary separation, determined by the formula input to the argumentboundary
in theddm()
function call -
modmat_t0
a numeric matrix containing the model matrix fort0
, the non-decision time, determined by the formula input to the argumentndt
in theddm()
function call -
modmat_w
a numeric matrix containing the model matrix forw
, the inital bias, determined by the formula input to the argumentbias
in theddm()
function call -
modmat_sv
a numeric matrix containing the model matrix forsv
, the inter-trial variability in the drift rate, determined by the formula input to the argumentsv
in theddm()
function call -
hess_v
a numeric matrix containing the Hessian forv
, the drift rate, whose dimensions are determined by the formula input to the argumentdrift
in theddm()
function call -
hess_a
a numeric matrix containing the Hessian fora
, the boundary separation, whose dimensions are determined by the formula input to the argumentdrift
in theddm()
function call -
hess_t0
a numeric matrix containing the Hessian fort0
, the non-decision time, whose dimensions are determined by the formula input to the argumentdrift
in theddm()
function call -
hess_w
a numeric matrix containing the Hessian forw
, the initial bias, whose dimensions are determined by the formula input to the argumentdrift
in theddm()
function call -
hess_sv
a numeric matrix containing the Hessian forsv
, the inter-trial variability in the drift rate, whose dimensions are determined by the formula input to the argumentdrift
in theddm()
function call -
vcov_v
a numeric matrix containing the variance-covariance matrix forv
, the drift rate, whose dimensions are determined by the formula input to the argumentdrift
in theddm()
function call -
vcov_a
a numeric matrix containing the variance-covariance matrix fora
, the boundary separation, whose dimensions are determined by the formula input to the argumentboundary
in theddm()
function call -
vcov_t0
a numeric matrix containing the variance-covariance matrix fort0
, the non-decision time, whose dimensions are determined by the formula input to the argumentndt
in theddm()
function call -
vcov_w
a numeric matrix containing the variance-covariance matrix forw
, the inital bias, whose dimensions are determined by the formula input to the argumentbias
in theddm()
function call -
vcov_sv
a numeric matrix containing the variance-covariance matrix forv
, the inter-trial variability in the drift rate, whose dimensions are determined by the formula input to the argumentsv
in theddm()
function call -
calculate_loglik
calculates and returns a double containing the negated log-likelihood (note that this will overwrite thelikelihood
component of the C++ object) -
calculate_gradient
calculates and returns a numeric vector of the negated gradients for the provided coefficient values; the gradients are stored in the same manner as their correspondingcoefficents
(note that this will overwrite thelikelihood
) component of the C++ object) -
calculate_hessians
calculates and returns a named list of the negated Hessians for each model parameter for the provided coefficient values (note that this will overwrite thelikelihood
,hess_v
,hess_a
,hess_t0
,hess_w
, andhess_sv
components of the C++ object) -
calculate_vcov
calculates and returns a named list of the variance-covariance matrices for each model parameter for the storedcoefficients
(note that this will overwrite thelikelihood
,hess_v
,hess_a
,hess_t0
,hess_w
,hess_sv
,vcov_v
,vcov_a
,vcov_t0
,vcov_w
, andvcov_sv
components of the C++ object) -
calculate_standard_error
calculates and returns a numeric vector of the standard errors of the storedcoefficients
; the standard errors are stored in the same manner as their correspondingcoefficients
(note that this will overwrite thelikelihood
,hess_v
,hess_a
,hess_t0
,hess_w
,hess_sv
,vcov_v
,vcov_a
,vcov_t0
,vcov_w
, andvcov_sv
components of the C++ object)
Examples
# prepare data
data(med_dec, package = "fddm")
med_dec <- med_dec[which(med_dec[["rt"]] >= 0), ] ## only use valid RTs
## select data from one participant
p1 <- med_dec[med_dec[["id"]] == 2 & med_dec[["group"]] == "experienced", ]
head(p1)
##----------------------------------------------
## Easiest: Fitting using emmeans -
##----------------------------------------------
## Because we use an ANOVA approach, we set orthogonal sum-to-zero contrasts
op <- options(contrasts = c('contr.sum', 'contr.poly'))
fit0 <- ddm(rt + response ~ classification*difficulty, data = p1)
summary(fit0)
## for more tests, we use emmeans:
if (requireNamespace("emmeans")) {
# for ANOVA table:
emmeans::joint_tests(fit0)
# get conditional main effects of difficulty (for each level of classification):
emmeans::joint_tests(fit0, by = "classification")
# get mean drift rates per condition:
em1 <- emmeans::emmeans(fit0, "difficulty", by = "classification")
em1
# compare mean drift rates per condition
pairs(em1)
update(pairs(em1), by = NULL, adjust = "holm")
}
options(op) # reset contrasts
##----------------------------------------------------------------
## Fitting with custom parametrisation -
##----------------------------------------------------------------
## one drift rate per classification by difficulty design cell
fit1 <- ddm(rt + response ~ 0 + classification:difficulty, data = p1)
summary(fit1)
## set default contrasts (just in case contrasts have been changed)
op <- options(contrasts = c('contr.treatment', 'contr.poly'))
## one drift rate "intercept" per classification condition (blast vs. non-blast)
## corresponding to first level of difficulty factor (easy)
## plus one further coefficient per classification condition corresponding to
## difference from "intercept" (hard - easy)
fit1b <- ddm(rt + response ~ 0 + classification + classification:difficulty,
data = p1, args_optim = list(control = list(iter.max = 1000,
eval.max = 1000)))
summary(fit1b)
options(op) # reset contrasts
## set orthogonal sum-to-zero contrasts
op <- options(contrasts = c('contr.sum', 'contr.poly'))
## one drift rate "intercept" per classification condition (blast vs. non-blast)
## corresponding to mean drift rate for the classification condition
## plus one further coefficient per classification condition corresponding to
## difference from "intercept" (hard/easy - mean drift rate)
fit1c <- ddm(rt + response ~ 0 + classification + classification:difficulty,
data = p1)
summary(fit1c)
options(op) ## reset contrasts
## all variants produce same fit, only meaning of parameters differs
logLik(fit1)
logLik(fit1b)
logLik(fit1c)
logLik(fit0) ## also model above
## all models estimate same drift rates, but in different parametrisation:
coef(fit1) ## drift rates per design cell
## same drift rates based on fit1b:
c(coef(fit1b)[1:2],
coef(fit1b)[1] + coef(fit1b)[3], coef(fit1b)[2] + coef(fit1b)[4])
## same drift rates based on fit1c:
c(coef(fit1c)[1] + coef(fit1c)[3], coef(fit1c)[2] + coef(fit1c)[4],
coef(fit1c)[1] - coef(fit1c)[3], coef(fit1c)[2] - coef(fit1c)[4])
# we can estimate a model that freely estimates response bias
# (instead of fixing it at 0.5)
fit2 <- ddm(rt + response ~ 0 + classification:difficulty, bias = ~1, data = p1)
summary(fit2)
## Note: estimating bias only makes sense in situations like here where the
## response boundaries do not correspond to correct/incorrect but to the
## actual responses participants gave (here: blast vs. non-blast classification)
## now let's perform a likelihood ratio test to check if estimating response
## bias freely leads to a significant increase in model fit?
if (requireNamespace("lmtest")) { ## requires package lmtest
lmtest::lrtest(fit1, fit2)
## does not look like it (p = 0.1691)
}
# we can also make a DDM parameter, such as boundary, depend on a numeric
# variable, such as block number
fit3 <- ddm(rt + response ~ 0 + classification:difficulty,
boundary = ~ block, data = p1)
summary(fit3)
## does making boundary depend on block leads to a significant increase in model
## fit?
if (requireNamespace("lmtest")) { ## requires package lmtest
lmtest::lrtest(fit1, fit3)
## does not look like it (p = 0.198)
}
##----------------------------------------------------------------
## Fitting with optimization arguments -
##----------------------------------------------------------------
## example of how to use your own initial values and bounds for the optimization
## of the coefficients (determined by the model matrices/formulas)
options(op) # reset contrasts
# this uses the default generated initial values and bounds
fitex0 <- ddm(rt + response ~ 0 + classification:difficulty, data = p1)
# we can see the number of coefficients (and thus the number of initial values
# and bounds) required if we wish to use our own
fitex0$optim_info$args_optim$init
fitex0$optim_info$args_optim$lo_bds
fitex0$optim_info$args_optim$up_bds
# -1.9270279 2.5408864 -1.9270279 0.3181987 1.7907438 0.1264035
# -Inf -Inf -Inf -Inf 0 0
# Inf Inf Inf Inf Inf 0.461
# the first four coefficients are for the drift rate (given by the formula we
# provided), and the last two are for the boundary separation and non-decision
# time, respectively (note that the default is to estimate these two model
# parameters with a single coefficient).
# to use our own initial values, we can include them in the `args_optim` list,
# but note that they must be within the generated bounds
fitex1 <- ddm(rt + response ~ 0 + classification:difficulty, data = p1,
args_optim = list(init = c(-1.5, 1, -1, 1, 1.5, 0.3)))
# to use our own bounds, we again can include them in the `args_optim` list,
# but note that they must contain the generated initial values
fitex2 <- ddm(rt + response ~ 0 + classification:difficulty, data = p1,
args_optim = list(lo_bds = c(-20, -20, -20, -20, 0, 0),
up_bds = c(20, 20, 20, 20, 10, 0.5)))
# to use both our own initial values and bounds, we include them in the
# `args_optim` list, and the bounds must contain the initial values
fitex3 <- ddm(rt + response ~ 0 + classification:difficulty, data = p1,
args_optim = list(init = c(-1.5, 1, -1, 1, 1.5, 0.3),
lo_bds = c(-20, -20, -20, -20, 0, 0),
up_bds = c(20, 20, 20, 20, 10, 0.5)))
# the only other option in the `args_optim` list is the control parameters
# directly used by the optimization function (e.g., the maximum number of
# iterations); here we'll set the maximum number of iterations and function
# evaluations
fitex4 <- ddm(rt + response ~ 0 + classification:difficulty, data = p1,
args_optim = list(control = list(iter.max = 1000,
eval.max = 1000)))