trim {rtrim} | R Documentation |
Estimate TRIM model parameters.
Description
Given some count observations, estimate a TRIM model and use it to impute the data set if necessary.
Usage
trim(object, ...)
## S3 method for class 'data.frame'
trim(
object,
count_col = "count",
site_col = "site",
year_col = "year",
month_col = NULL,
weights_col = NULL,
covar_cols = NULL,
model = 2,
changepoints = ifelse(model == 2, 1L, integer(0)),
overdisp = FALSE,
serialcor = FALSE,
autodelete = TRUE,
stepwise = FALSE,
covin = list(),
...
)
## S3 method for class 'formula'
trim(object, data = NULL, weights = NULL, ...)
## S3 method for class 'trimcommand'
trim(object, ...)
Arguments
object |
Either a |
... |
More parameters, see below in the details |
count_col |
|
site_col |
|
year_col |
|
month_col |
|
weights_col |
|
covar_cols |
|
model |
|
changepoints |
|
overdisp |
|
serialcor |
|
autodelete |
|
stepwise |
|
covin |
a list of variance-covariance matrices; one per pseudo-site. |
data |
|
weights |
|
Details
All versions of trim
support additional 'experts only' arguments:
verbose
Logical switch to temporarily enable verbose output (use option(trim_verbose=TRUE) for permanent verbosity).
constrain_overdisp
Numerical value to control overdispersion. A value in the range 0..1 uses a Chi-squared outlier detection method. A value >1 uses Tukey's Fence. A value of 1.0 (which is the default) results in unconstrained overdispersion. See vignette ‘Taming overdispersion’ for more information.
conv_crit
Convergence criterion, used within the iterative model estimation algorithm. The default value is 1e-5. May be set to higher values in case models don't converge.
max_iter
Maximum number of iterations. Default value is 200. May be set to higher values in case models don't converge.
alpha_method
Choose between a more precise (method 1) or a more robust (method 2) method to estimate the site parameters alpha. The default is the more precise method, but consider setting it to the more robust method 2 if method 1 results in warnings.
premove
Probability of removal of changepoints (default value: 0.2). Parameter used in stepwise refinement of models. See the vignette 'Models and statistical methods in rtrim'.
penter
Probability of re-entering of changepoints (default value: 0.15). Used in the same way as premove.
Models
The purpose of trim() is to estimate population totals over time, based on a set of counts f_{ij} at sites i=1,2,\ldots,I and times j=1,2,\ldots,J. If no count data is available at site and time (i,j), a value \mu_{ij} will be imputed.
In Model 1, the imputed values are modeled as
\ln\mu_{ij} = \alpha_i,
where \alpha_i is the site effect. This model implies that the counts vary across sites, not over time. The model-based time totals are equal for each time point and the model-based indices are all equal to one.
In Model 2, the imputed values are modeled as
\ln\mu_{ij} = \alpha_i + \beta\times(j-1).
Here, \alpha_i is the log-count of site i averaged over time and \beta is the mean growth factor that is shared by all sites over all of time. The assumption of a constant growth rate may be relaxed by passing a number of changepoints that indicate at what times the growth rate is allowed to change. Using a wald test one can investigate whether the changes in slope at the changepoints are significant. Setting stepwise=TRUE makes trim automatically remove changepoints where the slope does not change significantly.
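As an illustration of the Model 2 formula, the following base R sketch computes the expected counts \mu_{ij} and the resulting time totals and indices. It does not call rtrim; the values of alpha, beta and J are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Hypothetical parameter values (not estimated from any data set)
alpha <- log(c(10, 20))  # site effects for I = 2 sites
beta  <- 0.1             # common slope on the log scale
J     <- 4               # number of time points

# ln(mu_ij) = alpha_i + beta * (j - 1), as an I x J matrix
log_mu <- outer(alpha, beta * (seq_len(J) - 1), "+")
mu <- exp(log_mu)

# Model-based time totals, and indices relative to the first time point
time_totals <- colSums(mu)
indices <- time_totals / time_totals[1]
print(round(indices, 3))
```

Because every site shares the same \beta in this sketch, the indices depend only on \beta and j, not on the site effects.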
In Model 3, the imputed values are modeled as
\ln\mu_{ij}=\alpha_i + \beta_j,
where \beta_j is the deviation of log-counts at time j, averaged over all sites. To make this model identifiable, \beta_1=0 by definition. Model 3 can be shown to be equivalent to Model 2 with a changepoint at every time point. Using a wald test, one can estimate whether the collection of deviations \beta_j makes the model differ significantly from an overall linear trend (Model 2 without changepoints).
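The equivalence between Model 3 and Model 2 with a changepoint at every time point can be checked numerically: a piecewise-linear trend with one slope per interval reproduces any sequence of time effects that starts at zero. The sketch below uses base R only, and the \beta_j values are hypothetical.

```r
# Hypothetical Model 3 time effects, with beta_1 = 0 for identifiability
beta_j <- c(0, 0.3, 0.1, 0.4)

# One slope per interval between consecutive time points
slopes <- diff(beta_j)

# Accumulating the per-interval slopes recovers the time effects
beta_piecewise <- c(0, cumsum(slopes))

stopifnot(isTRUE(all.equal(beta_j, beta_piecewise)))
```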
The parameters \alpha_i and \beta_j are referred to as the additive representation of the coefficients. Once computed, they can be extracted in several representations, using the coefficients function. (See also the examples below).
Other model parameters can be extracted using functions such as gof (for goodness of fit), summary or totals. Refer to the ‘See also’ section for an overview.
Using yearly and monthly counts
Many data sets contain only yearly count data, in which case the time j will reflect the year number. An extension of trim is to use monthly (or any other sub-yearly) count data, in combination with index computations on the yearly time scale. In this case, counts are given as f_{i,j,m} with m=1,2,\ldots,M the month number. As before, \mu_{i,j,m} will be imputed in case of missing counts.
The contribution of month factors to the model is always similar to the way year factors are used in Model 3, that is,
\ln\mu_{i,j,m} = \alpha_i + \beta\times(j-1) + \gamma_m
for Model 2, and
\ln\mu_{i,j,m} = \alpha_i + \beta_j + \gamma_m
for Model 3. For the same reason that \beta_1=0 for Model 3, \gamma_1=0 in case of monthly parameters.
Using covariates
In the basic case of Models 2 and 3, the growth parameter \beta does not vary across sites. If auxiliary information is available (for instance a classification of the type of soil or vegetation), the effect of these variables on the per-site growth rate can be taken into account.
For Model 2 with covariates the growth factor \beta is replaced with a factor
\beta_0 + \sum_{k=1}^K z_{ijk}\beta_k.
Here, \beta_0 is referred to as the baseline and z_{ijk} is a dummy variable that combines dummy variables for all covariates. Since a covariate with L classes is modeled by L-1 dummy variables, the value of K is equal to the sum of the numbers of categories for all covariates minus the number of covariates. Observe that this model allows a covariate to change over time at a given site. It is therefore possible to include situations where, for example, a site turns from farmland to rural area. The coefficients function will report every individual value of \beta. With a wald test, the significance of contributions of covariates can be tested.
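The count of dummy variables K described above can be reproduced in base R. The sketch below uses two hypothetical covariates (habitat and soil are illustrative names, not columns of any rtrim data set) and cross-checks the formula against the treatment-contrast dummy columns that model.matrix generates. Whether rtrim builds its dummies in exactly this way is an assumption.

```r
# Hypothetical covariates: habitat has 2 classes, soil has 3 classes
covars <- data.frame(
  habitat = factor(c("dunes", "heath", "dunes", "heath")),
  soil    = factor(c("sand", "clay", "peat", "sand"))
)

# K = (sum of the numbers of classes) - (number of covariates)
n_classes <- vapply(covars, nlevels, integer(1))
K <- sum(n_classes) - length(n_classes)

# Cross-check: treatment contrasts give L - 1 dummy columns per covariate;
# the intercept column is dropped because it plays the role of the baseline
Z <- model.matrix(~ habitat + soil, data = covars)[, -1, drop = FALSE]
stopifnot(ncol(Z) == K)
```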
For Model 3 with covariates the parameter \beta_j is replaced by
\beta_{j0} + \sum_{k=1}^Kz_{ijk}\beta_{jk}.
Again, the \beta_{j0} are referred to as baseline parameters and the \beta_{jk} record mean differences in log-counts within a set of sites with equal values for the covariates. All coefficients can be extracted with coefficients and the significance of covariates can be investigated with the wald test.
Estimation options
In the simplest case, the counts at different times and sites are considered
independently Poisson distributed. The (often too strict) assumption that
counts are independent over time may be dropped, so correlation between time
points at a certain site can be taken into account. The assumption of being
Poisson distributed can be relaxed as well. In general, the variance-covariance structure of counts f_{ij} at site i for time j is modeled as
\textrm{var}(f_{ij}) = \sigma^2\mu_{ij},
\textrm{cor}(f_{ij},f_{i,j+1}) = \rho,
where \sigma is called the overdispersion, \mu_{ij} is the estimated count for site i, time j and \rho is called the serial correlation.
If \sigma=1, a pure Poisson distribution is assumed to model the counts. Setting overdisp=TRUE makes trim relax this condition. Setting serialcor=TRUE allows trim to assume a non-zero correlation between adjacent time points, thus relaxing the assumption of independence over time.
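To make the role of \sigma^2 in the variance relation concrete, the sketch below computes a common Pearson-type estimate of \sigma^2 from observed and fitted counts. This is a generic textbook estimator shown for illustration; it is an assumption that it matches what trim computes internally, and the counts, fitted values and parameter count p are all hypothetical.

```r
# Hypothetical observed counts and fitted values for five site/time cells
f  <- c(12, 7, 15, 9, 20)
mu <- c(10, 8, 13, 11, 18)
p  <- 2  # hypothetical number of estimated model parameters

# Pearson chi-squared statistic: squared residuals scaled by the Poisson variance
pearson <- sum((f - mu)^2 / mu)

# Under var(f) = sigma^2 * mu, dividing by the residual degrees of freedom
# gives an estimate of sigma^2; values near 1 are consistent with Poisson counts
sigma2_hat <- pearson / (length(f) - p)
print(sigma2_hat)
```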
Demands on data
The data set must contain sufficient counts to be able to estimate the model. In particular:
For model 2 without covariates there must be at least one observation for each time segment defined by the change points.
For model 2 with covariates there must be at least one observation for every value of each covariate, at each time segment defined by the change points.
For model 3 without covariates there must be at least one observation for each time point.
For model 3 with covariates there must be at least one observation for every value of each covariate, at each time point.
For monthly data, there must be at least one observation for every month.
The function check_observations identifies cases where too few observations are present to compute a model. Setting the option autodelete=TRUE (Model 2 only) makes trim remove changepoints such that each time segment contains sufficient counts to estimate the model.
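The requirement for Model 2 without covariates can be illustrated with a small base R check that counts observations per time segment. This is not the implementation of check_observations; the data frame, the changepoints, the treatment of any non-missing count as an observation, and the choice to start each segment at its changepoint are all assumptions made for this sketch.

```r
# Hypothetical counts for two sites over six time points (NA = not observed)
counts <- data.frame(
  time  = rep(1:6, times = 2),
  site  = rep(c("a", "b"), each = 6),
  count = c(3, NA, 4, NA, NA, NA,  NA, 2, 5, NA, NA, NA)
)
changepoints <- c(1, 4)  # hypothetical changepoints

# Assign each time point to the segment that starts at the latest changepoint
# not exceeding it: times 1-3 form segment 1, times 4-6 form segment 2
segment <- findInterval(counts$time, changepoints)

# Number of non-missing counts per segment; a zero marks a segment
# for which the slope cannot be estimated
n_obs <- tapply(!is.na(counts$count), segment, sum)
print(n_obs)
```

In this example the second segment contains no observations, which is the kind of situation in which removing the changepoint at time 4 would be needed.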
See Also
rtrim by example for a gentle introduction, rtrim for TRIM users for users of the classic Delphi-based TRIM implementation, and rtrim 2 extensions for the major changes from rtrim v.1 to rtrim v.2
Other analyses: coef.trim(), confint.trim(), gof(), index(), now_what(), overall(), overdispersion(), plot.trim.index(), plot.trim.overall(), plot.trim.smooth(), results(), serial_correlation(), summary.trim(), totals(), trendlines(), vcov.trim(), wald()
Other modelspec: check_observations(), read_tcf(), read_tdf(), set_trim_verbose(), trimcommand()
Examples
data(skylark)
m <- trim(count ~ site + time, data=skylark, model=2)
summary(m)
coefficients(m)
# An example using weights
# set up some random weights (one for each site)
w <- runif(55, 0.1, 0.9)
# match weights to sites
skylark$weights <- w[skylark$site]
# run model
m <- trim(count ~ site + time, data=skylark, weights="weights", model=3)
# An example using change points, a covariate, and overdispersion
# changepoint 1 is added automatically
cp <- c(2,6)
m <- trim(count ~ site + time + Habitat, data=skylark, model=2, changepoints=cp, overdisp=TRUE)
coefficients(m)
# check significance of changes in slope
wald(m)
plot(overall(m))