PSFormula {PStrata} | R Documentation |
Set up a model formula for use in PStrata
Description
Set up a model formula for use in the PStrata package, allowing users to specify the treatment indicator, the post-randomization confounding variables, the outcome variable, and optionally the covariates. For a survival outcome, a censoring indicator is also specified. Users can also define (potentially non-linear) transformations of the covariates and include random effects for clusters.
Usage
PSFormula(formula, data)
Arguments
formula |
an object of class |
data |
a data frame containing the variables named in |
Details
Two models are required for the principal stratification analysis: the principal stratum model and the outcome model.
General formula structure
For the principal stratum model, the formula argument accepts formulas of the following syntax:
treatment + postrand ~ terms
The treatment variable refers to the name of the binary treatment indicator.
The postrand variable refers to the name of the binary post-randomization confounding variable.
The terms part includes all of the predictors used for the principal stratum model.
For the outcome model, the formula argument accepts formulas of a similar syntax:
response [+ observed] ~ terms
The response variable refers to the name of the outcome variable.
The terms part includes all of the predictors used for the outcome model.
The observed variable should be omitted when the response is not subject to censoring.
When the true response is subject to right censoring (often called a survival outcome in the literature), the response variable should refer to the observed or censored response, and the observed variable should be an indicator of whether the true response is observed.
For example, suppose the true time of an event is T and the time of censoring is C. Then the response variable should refer to min(T, C), the time of the event or of censoring, whichever comes earlier, and the indicator observed is 1 if T < C and 0 otherwise.
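As a minimal sketch of the construction above, the following base R code simulates hypothetical event and censoring times and derives the two left-hand-side variables. The column names time and observed, the covariate X, and the exponential distributions are assumptions made for illustration only.

```r
set.seed(1)
n <- 8
true_time   <- rexp(n, rate = 0.5)  # T: true event time (hypothetical)
censor_time <- rexp(n, rate = 0.3)  # C: censoring time (hypothetical)

surv_df <- data.frame(
  time     = pmin(true_time, censor_time),         # min(T, C)
  observed = as.integer(true_time < censor_time),  # 1 if T < C, 0 otherwise
  X        = rnorm(n)                              # hypothetical covariate
)

# Outcome-model formula following the response + observed ~ terms syntax
outcome_formula <- time + observed ~ X
```

Rows with observed equal to 1 carry the true event time; the remaining rows carry the censoring time.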
The terms specified in the principal stratum model and the outcome model can be different.
Multiple post-randomization confounding variables
If multiple post-randomization confounding variables exist, one can specify all of them using the following syntax:
treatment + postrand_1 + postrand_2 + ... + postrand_n ~ terms
The post-randomization confounding variables are provided in place of postrand_1 to postrand_n. In the current version, all of these variables must be binary indicators.
Note that the order of these post-randomization confounding variables does not affect the estimated parameters, but it matters when specifying other arguments, such as strata and ER (see PStrata).
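To make the role of the ordering concrete, the sketch below inspects the left-hand side of such a formula with base R. The names D1 and D2 are hypothetical post-randomization variables, and the code only illustrates how the written order is preserved; it does not reproduce how PStrata parses the formula internally.

```r
# Z is the treatment; D1 and D2 are hypothetical binary
# post-randomization confounding variables.
f <- Z + D1 + D2 ~ X

# Variables on the left-hand side, in the order they were written
lhs_vars  <- all.vars(f[[2]])
treatment <- lhs_vars[1]    # first variable: the treatment indicator
postrand  <- lhs_vars[-1]   # remaining variables: postrand_1, ..., postrand_n

print(treatment)
print(postrand)
```

Swapping D1 and D2 in the formula would swap their positions in postrand, which is why arguments that refer to these variables by position depend on the order.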
Non-linear transformation of the predictors
The syntax for the predictors follows the conventions used in formula.
The terms part consists of a series of terms concatenated by +, each term being the name of a variable or the interaction of several variables separated by :.
Apart from + and :, a number of other operators are also useful.
The * operator is a shorthand for factor crossing: a*b is interpreted as a + b + a:b.
The ^ operator means factor crossing to a specific degree. For example, (a + b + c)^2 is interpreted as (a + b + c) * (a + b + c), which is identical to a + b + c + a:b + a:c + b:c.
The - operator removes specified terms, so that (a + b + c)^2 - a:b is identical to a + b + c + a:c + b:c.
The - operator can also be used to remove the intercept term, as in x - 1. One can also use x + 0 to remove the intercept term.
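These expansions can be checked with base R's terms(), which follows the same formula conventions. The helper expand_terms is a hypothetical convenience defined here only for illustration.

```r
# Hypothetical helper: list the expanded term labels of a formula
expand_terms <- function(f) attr(terms(f), "term.labels")

crossed   <- expand_terms(~ a * b)               # "a" "b" "a:b"
squared   <- expand_terms(~ (a + b + c)^2)       # main effects and all two-way interactions
removed   <- expand_terms(~ (a + b + c)^2 - a:b) # same, without a:b

# Both forms drop the intercept (the "intercept" attribute becomes 0)
no_int_minus <- attr(terms(~ x - 1), "intercept")
no_int_zero  <- attr(terms(~ x + 0), "intercept")

print(crossed); print(squared); print(removed)
```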
Arithmetic expressions such as a + log(b) are also legal.
However, arithmetic expressions may contain symbols that have a special meaning in formulas, such as +, *, ^ and -.
To avoid confusion, the function I() can be used to bracket portions where the operators should be interpreted in the arithmetic sense.
For example, in x + I(y + z), the term y + z is interpreted as the sum of y and z.
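The effect of I() can be seen by comparing design matrices built with base R's model.matrix(). The data frame d below is a small made-up example.

```r
d <- data.frame(x = 1:4, y = c(2, 4, 6, 8), z = c(1, 1, 2, 2))

# Without I(): y and z enter as two separate predictors
mm_plain <- model.matrix(~ x + y + z, d)

# With I(): y + z is evaluated arithmetically and enters as one predictor
mm_sum <- model.matrix(~ x + I(y + z), d)

print(colnames(mm_plain))
print(colnames(mm_sum))
print(mm_sum[, "I(y + z)"])  # element-wise sum of y and z
```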
Group level random effect
When effects are assumed to vary across the levels of a grouping variable, one can specify such effects by adding terms of the form gterms | group, where group refers to the group indicator (usually a factor), and gterms specifies the terms whose coefficients are group-specific, drawn from a population normal distribution.
The most common use of group-level random effects is to include group-specific intercepts to account for unmeasured confounding.
For example, x + y + (1 | g) specifies a model with population-level predictors x and y, as well as a random intercept for each level of g.
For more complex random effect structures, refer to lme4::lmer.
However, structures other than simple random intercepts and slopes may lead to unexpected behavior.
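To illustrate how a gterms | group term is laid out inside a formula, the sketch below walks the right-hand side with base R and pulls out the two parts. The function find_bars is a hypothetical helper written for this example; it is not the parser used by PStrata or lme4.

```r
# Hypothetical helper: collect every `gterms | group` call in an expression
find_bars <- function(expr) {
  if (!is.call(expr)) return(NULL)                          # names and constants
  if (identical(expr[[1]], as.name("|"))) return(list(expr))
  do.call(c, lapply(as.list(expr)[-1], find_bars))          # recurse into arguments
}

f    <- Y ~ x + y + (1 | g)
bars <- find_bars(f[[3]])            # search the right-hand side only

gterms <- deparse(bars[[1]][[2]])    # terms with group-specific coefficients
group  <- deparse(bars[[1]][[3]])    # grouping variable

print(gterms)
print(group)
```

Here gterms is "1", meaning only the intercept varies across the levels of g.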
Value
PSFormula returns an object of class PSFormula, which is a list containing the following components.
full_formula
input formula as is
data
input data frame
fixed_eff_formula
input formula with only fixed effects
response_names
character vector with names of variables that appear on the left hand side of input formula
has_random_effect
logical indicating whether random effects are specified in the input formula
has_intercept
logical indicating whether the input formula has an intercept
fixed_eff_names
character vector with names of all variables included as fixed effects
fixed_eff_count
integer indicating the number of variables (factors are converted to and counted as dummy variables)
fixed_eff_matrix
fixed-effect design matrix
random_eff_list
a list containing information for each random effect. Such information is a list with the corresponding design matrix, the term names and the factor levels.
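As a rough sketch of how these components might be inspected, the code below builds a small data frame and, only if the PStrata package is installed, looks at a few of the documented components. The component names are taken from the list above; the data are made up, and no particular output is claimed.

```r
df <- data.frame(
  X = 1:10,
  Z = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  D = c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1),
  R = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3)
)

# Guarded so the sketch still runs when PStrata is not available
if (requireNamespace("PStrata", quietly = TRUE)) {
  obj <- PStrata::PSFormula(Z + D ~ X + I(X^2) + (1 | R), df)
  print(obj$response_names)         # names on the left-hand side
  print(obj$has_random_effect)      # whether a random effect was specified
  print(obj$has_intercept)          # whether an intercept is included
  print(dim(obj$fixed_eff_matrix))  # dimensions of the fixed-effect design matrix
}
```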
Examples
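# Toy data: covariate X, treatment Z, post-randomization variable D, cluster R
df <- data.frame(
  X = 1:10,
  Z = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  D = c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1),
  R = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3)
)
# Principal stratum model with a quadratic term in X and a random intercept per cluster
PSFormula(Z + D ~ X + I(X^2) + (1 | R), df)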