bild {bild} | R Documentation |
Fit of Parametric Models for Binary Longitudinal Data via Likelihood Method
Description
Fits parametric models by the likelihood method. Serial dependence and a random intercept are allowed, according to the stochastic model chosen. Missing values and unbalanced data are automatically accounted for when computing the likelihood function.
Usage
bild(formula = formula(data), data, time, id, subSET,
aggregate = FALSE, start = NULL, trace = FALSE,
dependence="ind", method = "BFGS",
control = bildControl(), integrate = bildIntegrate())
Arguments
formula |
a description of the model to be fitted of the form response~predictors |
data |
a |
time |
a string that matches the name of the |
id |
a string that matches the name of the |
subSET |
an optional expression indicating the subset of the rows of |
aggregate |
a string that permits the user to identify the factor to be used in |
start |
a vector of initial values for the nuisance parameters of the likelihood. The dimension of the vector is according to the structure of the dependence model. |
trace |
logical flag: if TRUE, details of the nonlinear optimization are printed. By default the flag is set to FALSE. |
dependence |
expression stating which |
method |
The |
control |
a list of algorithmic constants for the optimizer |
integrate |
a list of algorithmic constants for the computation of a definite integral using a Fortran-77 subroutine. See "Details". |
Details
The data are contained in a data.frame. Each element of the data argument must be identifiable by a name.
The simplest situation occurs when all subjects are observed at the same time points. The response variable represents the individual profiles of each subject. The data.frame is expected to contain a variable that identifies the subject to which each component of the response variable belongs; by default this variable is named id. A variable named time is also expected to be present in the data.frame.
If the time component has been given a different name, this should be declared. The time variable should identify the time points at which each individual profile has been observed. When all subjects in an experiment are expected to be observed at the same time points, but in practice some subjects were not observed on some of the scheduled occasions, NA values can be inserted in the response variable.
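The layout described above can be sketched in base R as follows. The data frame toy, the response name y and all values are hypothetical and serve only to illustrate one row per subject and scheduled occasion, with NA marking a missed occasion.

```r
# Hypothetical toy data: 3 subjects, all scheduled at the same 4 time points.
toy <- data.frame(
  id   = rep(1:3, each = 4),       # subject identifier
  time = rep(1:4, times = 3),      # scheduled time points
  y    = c(0, 1,  1, 0,            # subject 1: fully observed
           1, NA, 1, 1,            # subject 2: missed occasion 2
           0, 0, NA, 1)            # subject 3: missed occasion 3
)

# Every subject keeps one row per scheduled occasion, observed or not.
rows_per_subject <- table(toy$id)
n_missing <- sum(is.na(toy$y))
print(toy)
```

Keeping the missed occasions as rows with NA, rather than dropping them, preserves the common time grid across subjects.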
If a response profile is replicated several times, a variable called counts must be created accordingly. This vector weights the response profiles: for each individual profile it gives the number of times that profile is replicated. The vector counts must repeat the number of observed replications of each individual profile as many times as the number of observed time points for the corresponding profile. The program expects this vector to be named counts. If each profile has been observed only once, the vector counts is not required.
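A minimal sketch of how such a counts vector might be built in base R is given here. The profiles, the replication numbers and the object names (profiles, replications, agg) are hypothetical; only the variable name counts comes from the text above.

```r
# Hypothetical aggregated data: 2 distinct response profiles, 3 time points each.
profiles     <- list(c(0, 0, 1), c(1, 1, 1))
replications <- c(5, 2)            # profile 1 seen 5 times, profile 2 seen twice
n_times      <- lengths(profiles)  # number of observed time points per profile

agg <- data.frame(
  id     = rep(seq_along(profiles), times = n_times),
  time   = unlist(lapply(n_times, seq_len)),
  y      = unlist(profiles),
  # Repeat each replication number once per time point of its profile.
  counts = rep(replications, times = n_times)
)
print(agg)
```

Each profile therefore carries the same counts value on all of its rows.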
subSET is an optional expression indicating the subset of data that should be used in the fit. This is a logical statement of the type variable 1 == "a" & variable 2 > x, which identifies the observations to be selected. All observations are included by default.
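The kind of logical statement meant here can be illustrated in base R. The data frame and the variables sex and age are hypothetical, and the sketch only shows which rows such a statement selects; it makes no claim about how bild evaluates subSET internally.

```r
# Hypothetical data: 4 subjects, 2 time points each.
toy <- data.frame(
  id   = rep(1:4, each = 2),
  time = rep(1:2, times = 4),
  y    = c(0, 1, 1, 1, 0, 0, 1, 0),
  sex  = rep(c("a", "b", "a", "b"), each = 2),
  age  = rep(c(8, 9, 11, 12), each = 2)
)

# A statement of the type: variable 1 == "a" & variable 2 > x
keep     <- toy$sex == "a" & toy$age > 9
selected <- toy[keep, ]
print(selected)
```

Only subject 3 satisfies both conditions, so both of that subject's rows are retained.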
For the models with random intercept, indR, MC1R and MC2R, bild computes integrals using the Fortran-77 subroutine package QUADPACK. For some data sets, when the dependence structure has a random intercept term, the user may need to specify the integrate argument list, changing the integration limits in the bildIntegrate function. bildIntegrate is an auxiliary function for controlling bild fitting. See the example of the locust data.
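Why the integration limits can matter may be illustrated with base R's integrate(), which is itself based on QUADPACK routines. This is only a sketch under stated assumptions: the integrand (a logistic probability averaged over a normal random intercept), the helper marginal_prob and the values of eta and sigma are hypothetical, and the code is not the integrand or the Fortran routine used inside bild.

```r
# Marginal probability of a response equal to 1 when the linear predictor
# eta is shifted by a random intercept b ~ N(0, sigma^2), integrated over
# the finite or infinite range [lower, upper].
marginal_prob <- function(eta, sigma, lower, upper) {
  integrand <- function(b) plogis(eta + b) * dnorm(b, mean = 0, sd = sigma)
  integrate(integrand, lower = lower, upper = upper)$value
}

eta   <- 0.5   # hypothetical linear predictor
sigma <- 1     # hypothetical random-intercept standard deviation

full   <- marginal_prob(eta, sigma, -Inf, Inf)  # reference value
wide   <- marginal_prob(eta, sigma, -4, 4)      # limits covering most of the mass
narrow <- marginal_prob(eta, sigma, -1, 1)      # limits that truncate the tails

print(c(full = full, wide = wide, narrow = narrow))
```

With limits that cover nearly all of the random-effect distribution the result is close to the reference value, whereas limits that are too narrow discard a visible share of the probability mass.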
Value
An object of class bild.
Background
Assume that each subject of a given set has been observed at a number of successive time points. For each subject and for each time point, a binary response variable, taking values 0 or 1, and a set of covariates are recorded. The underlying methodology builds a logistic regression model for the probability that the response variable takes value 1 as a function of the covariates, taking into account that successive observations from the same individual cannot be assumed to be independent.
The basic model for serial dependence is of Markovian type of the first order (denoted MC1 here), suitably constructed so that the logistic regression parameters maintain the same meaning as in ordinary logistic regression for independent observations. The serial dependence parameter is the logarithm of the odds-ratio between probabilities of adjacent observations, which is assumed to be constant for all adjacent pairs, and it is denoted here log.psi1.
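To make the meaning of the odds-ratio between adjacent observations concrete, the sketch below computes a crude empirical log odds-ratio from the adjacent pairs of a single hypothetical binary series. It is a descriptive analogue only, not the maximum likelihood estimate of log.psi1 that bild computes; the series y and the object names are assumptions.

```r
# Hypothetical binary series observed at successive time points.
y <- c(0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0)

prev <- y[-length(y)]   # observation at time t - 1
curr <- y[-1]           # observation at time t

# 2 x 2 table of adjacent pairs (rows: previous value, columns: current value).
tab <- table(factor(prev, levels = 0:1), factor(curr, levels = 0:1))

# Odds-ratio: concordant pairs (0,0) and (1,1) against discordant (0,1) and (1,0).
odds_ratio <- as.numeric(tab["0", "0"] * tab["1", "1"]) /
              as.numeric(tab["0", "1"] * tab["1", "0"])
log_psi1_hat <- log(odds_ratio)
print(log_psi1_hat)
```

A positive value indicates that adjacent observations tend to agree more often than they would under independence.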
An extension of this formulation allows a Markovian dependence of the second order, denoted MC2 here. In this case there are two parameters which regulate serial dependence: log.psi1 as before and log.psi2, which is the analogous quantity for observations which are two time units apart, conditionally on the intermediate value.
Individual random effects can be incorporated in the form of a random intercept term of the linear predictor of the logistic regression, assuming a normal distribution of mean 0 and variance \sigma^2, parameterized as \omega=\log(\sigma^2). The combination of serial Markov dependence with a random intercept corresponds here to the dependence structures MC1R and MC2R. The combination of an independence structure with a random intercept is also allowed, by setting the dependence structure to indR.
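Because the random-intercept variance is parameterized on the log scale, a value of \omega has to be back-transformed to be read as a variance or a standard deviation. A minimal sketch, with a hypothetical value of omega:

```r
# Hypothetical value of omega = log(sigma^2).
omega <- log(2.25)

sigma2 <- exp(omega)    # random-intercept variance
sigma  <- sqrt(sigma2)  # random-intercept standard deviation

print(c(omega = omega, sigma2 = sigma2, sigma = sigma))
```

The log parameterization leaves \omega unconstrained on the real line while keeping \sigma^2 positive.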
The original sources of the above formulation are Azzalini (1994) for the first order Markov dependence, and Goncalves (2002) and Goncalves and Azzalini (2008) for its extensions.
Author(s)
M. Helena Goncalves, M. Salome Cabral and Adelchi Azzalini
References
Azzalini, A. (1994). Logistic regression for autocorrelated data with application to repeated measures. Biometrika, 81, 767-775. Amendment: (1997) vol. 84, 989.
Goncalves, M. Helena (2002). Likelihood methods for discrete longitudinal data. PhD thesis, Faculty of Sciences, University of Lisbon.
Goncalves, M. Helena and Azzalini, A. (2008). Using Markov chains for marginal modelling of binary longitudinal data in an exact likelihood approach. Metron, vol LXVI, 2, 157-181.
Goncalves, M. Helena and Cabral, M. Salome and Azzalini, Adelchi (2012). The R Package bild for the Analysis of Binary Longitudinal Data. Journal of Statistical Software, 46(9), 1-17.
See Also
bild-class, bildControl, bildIntegrate, optim
Examples
## These are the examples used in the respective dataset files
##### data= airpollution, dependence="MC2R"
str(airpollution)
air2r <- bild(wheeze~age+smoking, data=airpollution, trace=TRUE,
time="age", aggregate=smoking, dependence="MC2R")
summary(air2r)
getAIC(air2r)
getLogLik(air2r)
plot(air2r)
#### data=muscatine, dependence="MC2"
str(muscatine)
# we decompose the time effect in orthogonal components
muscatine$time1 <- c(-1, 0, 1)
muscatine$time2 <- c(1, -2, 1)
musc2 <- bild(obese~(time1+time2)*sex, data=muscatine,
time="time1", aggregate=sex, trace=TRUE, dependence="MC2")
summary(musc2)
getAIC(musc2)
getLogLik(musc2)