dynamiteformula {dynamite} | R Documentation |
Model formula for dynamite
Description
Defines a new observational or a new auxiliary channel for the model using
standard R formula syntax. Formulas of individual response variables can be
joined together via +
. See 'Details' and the package vignettes for more
information. The function obs
is a shorthand alias for dynamiteformula
,
and aux
is a shorthand alias for
dynamiteformula(formula, family = "deterministic")
.
Usage
dynamiteformula(formula, family, link = NULL)
obs(formula, family, link = NULL)
aux(formula)
## S3 method for class 'dynamiteformula'
e1 + e2
## S3 method for class 'dynamiteformula'
print(x, ...)
Arguments
formula |
[ |
family |
[ |
link |
[ |
e1 |
[ |
e2 |
[ |
x |
[ |
... |
Ignored. |
Details
Currently the dynamite package supports the following distributions for the observations:
Categorical:
categorical
(with a softmax link using the first category as reference). See the documentation of thecategorical_logit_glm
in the Stan function reference manual (https://mc-stan.org/users/documentation/).Multinomial:
multinomial
(softmax link, first category is reference).Gaussian:
gaussian
(identity link, parameterized using mean and standard deviation).Multivariate Gaussian:
mvgaussian
(identity link, parameterized using mean vector, standard deviation vector and the Cholesky decomposition of the correlation matrix).Poisson:
poisson
(log-link, with an optional known offset variable).Negative-binomial:
negbin
(log-link, using mean and dispersion parameterization, with an optional known offset variable). See the documentation onNegBinomial2
in the Stan function reference manual.Bernoulli:
bernoulli
(logit-link).Binomial:
binomial
(logit-link).Exponential:
exponential
(log-link).Gamma:
gamma
(log-link, using mean and shape parameterization).Beta:
beta
(logit-link, using mean and precision parameterization).Student t:
student
(identity link, parameterized using degrees of freedom, location and scale)
The models in the dynamite package are defined by combining the
channel-specific formulas defined via R formula syntax.
Each channel is defined via the obs
function, and the channels are
combined with +
. For example a formula
obs(y ~ lag(x), family = "gaussian") + obs(x ~ z, family = "poisson")
defines a model with two channels;
first we declare that y
is a gaussian variable depending on a previous
value of x
(lag(x)
), and then we add a second channel declaring x
as
Poisson distributed depending on some exogenous variable z
(for which we do not define any distribution).
Number of trials for binomial channels should be defined via a trials
model component, e.g., obs(y ~ x + trials(n), family = "binomial")
,
where n
is a data variable defining the number of trials. For multinomial
channels, the number of trials is automatically defined to be the sum
of the observations over the categories, but can also be defined using
the trials
component, for example for prediction.
Multivariate channels are defined by providing a single formula for all
components or by providing component-specific formulas separated by a |
.
The response variables that correspond to the components should be joined by
c()
. For instance, the following would define c(y1, y2)
as multivariate
gaussian with x
as a predictor for the mean of the first component and
x
and z
as predictors for the mean of the second component:
obs(c(y1, y2) ~ x | x + z, family = "mvgaussian")
. A multinomial channel
should only have a single formula.
In addition to declaring response variables via obs
, we can also use
the function aux
to define auxiliary channels which are deterministic
functions of other variables. The values of auxiliary variables are computed
dynamically during prediction, making the use of lagged values and other
transformations possible. The function aux
also does not use the
family
argument, which is automatically set to deterministic
and is a
special channel type of obs
. Note that lagged values of deterministic
aux
channels do not imply fixed time points. Instead they must be given
starting values using a special function init
that directly initializes
the lags to specified values, or by past
which computes the initial values
based on an R expression. Both init
and past
should appear on the
right hand side of the model formula, separated from the primary defining
expression via |
.
The formula within obs
can also contain an additional special
function varying
, which defines the time-varying part of the model
equation, in which case we could write for example
obs(x ~ z + varying(~ -1 + w), family = "poisson")
, which defines a model
equation with a constant intercept and time-invariant effect of z
, and a
time-varying effect of w
. We also remove the duplicate intercept with -1
in order to avoid identifiability issues in the model estimation
(we could also define a time varying intercept, in which case we would write
obs(x ~ -1 + z + varying(~ w), family = "poisson)
). The part of the formula
not wrapped with varying
is assumed to correspond to the fixed part of the
model, so obs(x ~ z + varying(~ -1 + w), family = "poisson")
is actually
identical to
obs(x ~ -1 + fixed(~ z) + varying(~ -1 + w), family = "poisson")
and
obs(x ~ fixed(~ z) + varying(~ -1 + w), family = "poisson")
.
When defining varying effects, we also need to define how the these
time-varying regression coefficient behave. For this, a splines
component
should be added to the model, e.g.,
obs(x ~ varying(~ -1 + w), family = "poisson) + splines(df = 10)
defines a
cubic B-spline with 10 degrees of freedom for the time-varying coefficient
corresponding to the w
. If the model contains multiple time-varying
coefficients, same spline basis is used for all coefficients, with unique
spline coefficients and their standard deviation.
If the desired model contains lagged predictors of each response in each
channel, these can be quickly added to the model as either time-invariant
or time-varying predictors via lags()
instead of writing them manually
for each channel.
It is also possible to define group-specific (random) effects term
using the special syntax random()
similarly as varying()
. For example,
random(~1)
leads to a model where in addition to the common intercept,
each individual/group has their own intercept with zero-mean normal prior and
unknown standard deviation analogously with the typical mixed models. An
additional model component random_spec()
can be used to define
whether the random effects are allowed to correlate within and across
channels and whether to use centered or noncentered parameterization for
the random effects.
Value
A dynamiteformula
object.
See Also
Model formula construction
dynamite()
,
lags()
,
lfactor()
,
random_spec()
,
splines()
Examples
data.table::setDTthreads(1) # For CRAN
# A single gaussian response channel with a time-varying effect of 'x',
# and a time-varying effect of the lag of 'y' using B-splines with
# 20 degrees of freedom for the coefficients of the time-varying terms.
obs(y ~ -1 + varying(~x), family = "gaussian") +
lags(type = "varying") +
splines(df = 20)
# A two-channel categorical model with time-invariant predictors
# here, lag terms are specified manually
obs(x ~ z + lag(x) + lag(y), family = "categorical") +
obs(y ~ z + lag(x) + lag(y), family = "categorical")
# The same categorical model as above, but with the lag terms
# added using 'lags'
obs(x ~ z, family = "categorical") +
obs(y ~ z, family = "categorical") +
lags(type = "fixed")
# A multichannel model with a gaussian, Poisson and a Bernoulli response and
# an auxiliary channel for the logarithm of 'p' plus one
obs(g ~ lag(g) + lag(logp), family = "gaussian") +
obs(p ~ lag(g) + lag(logp) + lag(b), family = "poisson") +
obs(b ~ lag(b) * lag(logp) + lag(b) * lag(g), family = "bernoulli") +
aux(numeric(logp) ~ log(p + 1))
data.table::setDTthreads(1) # For CRAN
obs(y ~ x, family = "gaussian") + obs(z ~ w, family = "exponential")
data.table::setDTthreads(1) # For CRAN
x <- obs(y ~ x + random(~ 1 + lag(d)), family = "gaussian") +
obs(z ~ varying(~w), family = "exponential") +
aux(numeric(d) ~ log(y) | init(c(0, 1))) +
lags(k = 2) +
splines(df = 5) +
random_spec(correlated = FALSE)
print(x)